Structural analysis based on state‐space modeling

Open Access

1 March 1993

journal article
research article
Published by Wiley in Protein Science

Vol. 2 (3), 305-314
https://doi.org/10.1002/pro.5560020302

Abstract

A new method has been developed to compute the probability that each amino acid in a protein sequence is in a particular secondary structural element. Each of these probabilities is computed using the entire sequence and a set of predefined structural class models. This set of structural classes is patterned after Jane Richardson's taxonomy for the domains of globular proteins. For each structural class considered, a mathematical model is constructed to represent constraints on the pattern of secondary structural elements characteristic of that class. These are stochastic models having discrete state spaces (referred to as hidden Markov models by researchers in signal processing and automatic speech recognition). Each model is a mathematical generator of amino acid sequences; the sequence under consideration is modeled as having been generated by one model in the set of candidates. The probability that each model generated the given sequence is computed using a filtering algorithm. The protein is then classified as belonging to the structural class having the most probable model. The secondary structure of the sequence is then analyzed using a “smoothing” algorithm that is optimal for that structural class model. For each residue position in the sequence, the smoother computes the probability that the residue is contained within each of the defined secondary structural elements of the model. This method has two important advantages: (1) the probability of each residue being in each of the modeled secondary structural elements is computed using the totality of the amino acid sequence, and (2) these probabilities are consistent with prior knowledge of realizable domain folds as encoded in each model. As an example of the method's utility, we present its application to flavodoxin, a prototypical α/β protein having a central β-sheet, and to thioredoxin, which belongs to a similar structural class but shares no significant sequence similarity.
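The filter-classify-smooth procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The two toy models, their state names, all probabilities, and the two-letter alphabet ('h' for hydrophobic, 'p' for polar) are hypothetical values chosen for illustration. A realistic version would use the 20 amino acids and far larger models reflecting the structural classes. The sketch assumes a non-empty sequence and strictly positive emission probabilities.

```python
from math import exp, log

# Hypothetical toy models (all numbers are illustrative assumptions, not from the paper).
MODELS = {
    "alpha_like": {
        "states": ["helix", "loop"],
        "init": [0.5, 0.5],
        "trans": [[0.9, 0.1], [0.2, 0.8]],
        "emit": [{"h": 0.6, "p": 0.4}, {"h": 0.2, "p": 0.8}],
    },
    "beta_like": {
        "states": ["strand", "loop"],
        "init": [0.5, 0.5],
        "trans": [[0.8, 0.2], [0.3, 0.7]],
        "emit": [{"h": 0.9, "p": 0.1}, {"h": 0.2, "p": 0.8}],
    },
}


def forward_filter(seq, model):
    """Scaled forward pass: returns log P(seq | model), the filtered state
    distributions P(state_t | seq[:t+1]), and the per-position scale factors."""
    init, trans, emit = model["init"], model["trans"], model["emit"]
    n = len(init)
    filtered, scales, loglik = [], [], 0.0
    for t, sym in enumerate(seq):
        if t == 0:
            pred = init
        else:
            prev = filtered[-1]
            pred = [sum(prev[i] * trans[i][j] for i in range(n)) for j in range(n)]
        joint = [pred[j] * emit[j][sym] for j in range(n)]
        c = sum(joint)  # P(symbol_t | earlier symbols)
        loglik += log(c)
        scales.append(c)
        filtered.append([x / c for x in joint])
    return loglik, filtered, scales


def classify(seq, models, priors=None):
    """Posterior probability that each candidate model generated seq."""
    names = list(models)
    priors = priors or {m: 1.0 / len(names) for m in names}
    logs = {m: forward_filter(seq, models[m])[0] + log(priors[m]) for m in names}
    top = max(logs.values())  # subtract the maximum to avoid underflow
    weights = {m: exp(v - top) for m, v in logs.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def smooth(seq, model):
    """Forward-backward smoothing: P(state_t | entire seq) for every position."""
    trans, emit = model["trans"], model["emit"]
    n = len(model["init"])
    _, filtered, scales = forward_filter(seq, model)
    beta = [1.0] * n
    posteriors = [filtered[-1]]  # at the last position, filtered == smoothed
    for t in range(len(seq) - 2, -1, -1):
        sym = seq[t + 1]
        beta = [
            sum(trans[i][j] * emit[j][sym] * beta[j] for j in range(n)) / scales[t + 1]
            for i in range(n)
        ]
        gamma = [filtered[t][i] * beta[i] for i in range(n)]
        z = sum(gamma)  # equals 1 up to rounding; renormalize for safety
        posteriors.append([g / z for g in gamma])
    posteriors.reverse()
    return posteriors


if __name__ == "__main__":
    seq = "hhhhpphhhhpp"  # hypothetical reduced-alphabet sequence
    probs = classify(seq, MODELS)
    best = max(probs, key=probs.get)
    print("model probabilities:", probs)
    for residue, dist in zip(seq, smooth(seq, MODELS[best])):
        print(residue, dict(zip(MODELS[best]["states"], dist)))
```

In this sketch, `classify` corresponds to the filtering step that selects the most probable structural class, and `smooth` produces the per-residue probabilities of each secondary structural element under the selected model. Because smoothing uses both the forward and backward passes, every per-residue probability depends on the whole sequence, which is the first advantage the abstract names.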

Funding Information

TASC, NSF (DIR-8715633)
