CONSTRAINT-BASED PROTEIN TOPOLOGY PATTERN SEARCHING
Dr David Gilbert, Department of Computer Science, City University and European Bioinformatics Institute, Hinxton, Cambridge
Dr David Westhead, EMBL Outstation, European Bioinformatics Institute, Hinxton, Cambridge
Abstract: We present a formal language, incorporating constraints over finite domains, for describing TOPS protein topology cartoons (TOPS is a simple language that captures the complex 3D structures of proteins). We also present an efficient algorithm for matching a pattern against a set of TOPS diagrams. An implementation has been tested on a database derived from all current entries in the Protein Data Bank (14,000 domains). The system can be used to define a library of motifs against which a new structure can be matched; alternatively, users can define their own search patterns. In future we plan to use the system as a basis for similarity searches, and to learn a common pattern for a given set of diagrams. The entire system has been written in clp(fd) and is accessible over the Web.
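To illustrate the general idea of matching a pattern against a diagram as a finite-domain constraint problem, here is a minimal sketch. It is not the authors' clp(fd) implementation. It assumes a hypothetical, simplified encoding of a TOPS diagram: a sequence of secondary structure elements ("E" for strand, "H" for helix) plus a set of hydrogen-bond ladders (i, j, kind), where i < j and kind is "P" (parallel) or "A" (antiparallel). Each pattern element is a variable whose finite domain is the set of diagram positions holding the same element type. Plain backtracking then enforces two constraints: chain order is preserved, and every required bond exists. The function name `match` and the encoding are illustrative assumptions.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical encoding (an assumption, not the TOPS file format):
# an SSE is "E" (strand) or "H" (helix); a bond is (i, j, kind) with i < j
# and kind "P" (parallel) or "A" (antiparallel).
Bond = Tuple[int, int, str]


def match(pattern_sses: List[str], pattern_bonds: List[Bond],
          diagram_sses: List[str], diagram_bonds: Set[Bond]) -> Optional[List[int]]:
    """Return the first order-preserving assignment of pattern SSEs to diagram
    positions that satisfies every required bond, or None if none exists."""
    n, m = len(pattern_sses), len(diagram_sses)
    # Finite domain of each pattern variable: diagram positions of the same type.
    domains = [[j for j in range(m) if diagram_sses[j] == t] for t in pattern_sses]
    assignment: List[int] = []

    def consistent(k: int, pos: int) -> bool:
        # Order constraint: matched positions strictly increase along the chain.
        if assignment and pos <= assignment[-1]:
            return False
        # Bond constraints that become fully assigned once variable k takes pos.
        # Pattern bonds have a < b, so assignment[a] < pos and the tuple order
        # matches the diagram's (i < j) convention.
        for a, b, kind in pattern_bonds:
            if b == k and (assignment[a], pos, kind) not in diagram_bonds:
                return False
        return True

    def search(k: int) -> bool:
        # All pattern variables assigned: a complete match has been found.
        if k == n:
            return True
        for pos in domains[k]:
            if consistent(k, pos):
                assignment.append(pos)
                if search(k + 1):
                    return True
                assignment.pop()  # undo and try the next value in the domain
        return False

    return list(assignment) if search(0) else None
```

For example, on a diagram `["E", "H", "E", "E"]` with ladders `{(0, 2, "P"), (2, 3, "A")}`, the pattern `["E", "E"]` with bond `(0, 1, "A")` matches positions `[2, 3]`. A real constraint solver such as clp(fd) would additionally propagate constraints to prune domains before and during search, rather than only checking them after each choice as this sketch does.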
This seminar was held at the Department of Computer Science, Royal Holloway, University of London on 7 May 1998