GENEFINDING AND PROTEIN STRUCTURE PREDICTION WITH HIDDEN MARKOV MODELS
Professor Haussler, University of California at Santa Cruz
and Isaac Newton Institute for Mathematical Sciences, Cambridge
Abstract: With the Human Genome Project and the genome sequencing projects for model organisms now in full swing, databases of DNA, RNA and protein sequences are growing at an explosive rate, and the need for statistical/computational methods for biosequence analysis has become acute. In particular, we need effective methods for locating genes in DNA sequences, along with their splice sites and regulatory binding sites, and for classifying new proteins to detect weak homologies to known proteins and predict their possible functions. Tools available for this analysis range from simple and general text search methods to detailed protein folding models of the type used in protein threading and ab initio protein structure prediction. Hidden Markov Models (HMMs) lie somewhere in the middle of this spectrum. They are computationally efficient enough for use with large databases, yet flexible enough to be used in constructing specific, detailed statistical models of the sequence variation within a particular protein family, or within a family of related DNA binding sites. We will describe what HMMs are and how they are used in biosequence analysis. Then we will report how they performed in comparison to other methods in the CASP2 international test of protein structure prediction methods.
This seminar was held at the Department of Computer Science, Royal Holloway, University of London on 10 December 1997.