Structural analysis based on state‐space modeling
Open Access
- 1 March 1993
- journal article
- research article
- Published by Wiley in Protein Science
- Vol. 2 (3) , 305-314
- https://doi.org/10.1002/pro.5560020302
Abstract
A new method has been developed to compute the probability that each amino acid in a protein sequence is in a particular secondary structural element. Each of these probabilities is computed using the entire sequence and a set of predefined structural class models. This set of structural classes is patterned after Jane Richardson's taxonomy for the domains of globular proteins. For each structural class considered, a mathematical model is constructed to represent constraints on the pattern of secondary structural elements characteristic of that class. These are stochastic models having discrete state spaces (referred to as hidden Markov models by researchers in signal processing and automatic speech recognition). Each model is a mathematical generator of amino acid sequences; the sequence under consideration is modeled as having been generated by one model in the set of candidates. The probability that each model generated the given sequence is computed using a filtering algorithm. The protein is then classified as belonging to the structural class having the most probable model. The secondary structure of the sequence is then analyzed using a “smoothing” algorithm that is optimal for that structural class model. For each residue position in the sequence, the smoother computes the probability that the residue is contained within each of the defined secondary structural elements of the model. This method has two important advantages: (1) the probability of each residue being in each of the modeled secondary structural elements is computed using the totality of the amino acid sequence, and (2) these probabilities are consistent with prior knowledge of realizable domain folds as encoded in each model. As an example of the method's utility, we present its application to flavodoxin, a prototypical α/β protein having a central β-sheet, and to thioredoxin, which belongs to a similar structural class but shares no significant sequence similarity.
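The filter-then-smooth procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The two toy models, their state names, all probability values, the two-letter residue alphabet ("H" for hydrophobic, "P" for polar), the equal prior over models, and the function names `forward_backward` and `model_posteriors` are assumptions made here for illustration only. The sketch uses the standard scaled forward-backward recursions for hidden Markov models: the forward (filtering) pass yields the log-likelihood of the sequence under each class model, and the backward pass turns the filtered estimates into smoothed per-residue state probabilities.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy class models. State names, probabilities, and the reduced
# alphabet ("H" hydrophobic, "P" polar) are illustrative assumptions, not
# values taken from the paper.
MODELS = {
    "alpha_beta": {
        "states": ["strand", "helix", "loop"],
        "init": [0.3, 0.2, 0.5],
        "trans": [[0.80, 0.00, 0.20],
                  [0.00, 0.90, 0.10],
                  [0.25, 0.25, 0.50]],
        "emit": [{"H": 0.7, "P": 0.3},
                 {"H": 0.5, "P": 0.5},
                 {"H": 0.2, "P": 0.8}],
    },
    "all_alpha": {
        "states": ["helix", "loop"],
        "init": [0.5, 0.5],
        "trans": [[0.9, 0.1], [0.3, 0.7]],
        "emit": [{"H": 0.5, "P": 0.5}, {"H": 0.2, "P": 0.8}],
    },
}


def forward_backward(
    obs: Sequence[str],
    init: List[float],
    trans: List[List[float]],
    emit: List[Dict[str, float]],
) -> Tuple[float, List[List[float]]]:
    """Return (log-likelihood, smoothed state probabilities per position).

    Assumes every symbol in obs is a key of each emission table and that
    the sequence has nonzero probability under the model.
    """
    n, length = len(init), len(obs)
    alpha: List[List[float]] = []   # filtered (normalized) state probabilities
    scales: List[float] = []        # per-step normalizers; their product is P(obs)
    for t, symbol in enumerate(obs):
        if t == 0:
            a = [init[s] * emit[s][symbol] for s in range(n)]
        else:
            prev = alpha[-1]
            a = [emit[s][symbol] * sum(prev[r] * trans[r][s] for r in range(n))
                 for s in range(n)]
        c = sum(a)
        alpha.append([x / c for x in a])
        scales.append(c)
    log_likelihood = sum(math.log(c) for c in scales)

    # Backward pass, scaled with the same normalizers as the forward pass.
    beta = [[1.0] * n for _ in range(length)]
    for t in range(length - 2, -1, -1):
        nxt = obs[t + 1]
        for s in range(n):
            beta[t][s] = sum(trans[s][r] * emit[r][nxt] * beta[t + 1][r]
                             for r in range(n)) / scales[t + 1]

    # Smoothed probabilities: P(state at t | entire sequence).
    smoothed = []
    for t in range(length):
        g = [alpha[t][s] * beta[t][s] for s in range(n)]
        z = sum(g)
        smoothed.append([x / z for x in g])
    return log_likelihood, smoothed


def model_posteriors(obs: Sequence[str], models: dict) -> Dict[str, float]:
    """Posterior probability of each class model, assuming an equal prior."""
    logliks = {name: forward_backward(obs, m["init"], m["trans"], m["emit"])[0]
               for name, m in models.items()}
    top = max(logliks.values())     # subtract the max for numerical stability
    weights = {name: math.exp(v - top) for name, v in logliks.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    seq = "HHPHHPPPHPHHHPP"         # made-up reduced-alphabet sequence
    post = model_posteriors(seq, MODELS)
    best = max(post, key=post.get)  # classify by the most probable model
    m = MODELS[best]
    _, smoothed = forward_backward(seq, m["init"], m["trans"], m["emit"])
    print("model posteriors:", post)
    for pos, probs in enumerate(smoothed, start=1):
        print(pos, dict(zip(m["states"], (round(p, 3) for p in probs))))
```

In this sketch the class is chosen first, and the smoothed probabilities are then read from the chosen model only, mirroring the order of steps in the abstract. A realistic model would use the full 20-letter amino acid alphabet and a state topology reflecting the allowed ordering of secondary structural elements in each class.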
Funding Information
- TASC, NSF (DIR-8715633)