Detecting Subtle Sequence Signals: a Gibbs Sampling Strategy for Multiple Alignment
- 8 October 1993
- journal article
- research article
- Published by American Association for the Advancement of Science (AAAS) in Science
- Vol. 262 (5131), 208-214
- https://doi.org/10.1126/science.8211139
Abstract
A wealth of protein and DNA sequence data is being generated by genome projects and other sequencing efforts. A crucial barrier to deciphering these sequences and understanding the relations among them is the difficulty of detecting subtle local residue patterns common to multiple sequences. Such patterns frequently reflect similar molecular structures and biological properties. A mathematical definition of this "local multiple alignment" problem suitable for full computer automation has been used to develop a new and sensitive algorithm, based on the statistical method of iterative sampling. This algorithm finds an optimized local alignment model for N sequences in N-linear time, requiring only seconds on current workstations, and allows the simultaneous detection and optimization of multiple patterns and pattern repeats. The method is illustrated as applied to helix-turn-helix proteins, lipocalins, and prenyltransferases.
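The iterative sampling idea in the abstract can be illustrated with a minimal sketch of a Gibbs site sampler. This is not the authors' implementation: it assumes exactly one ungapped pattern occurrence of fixed width per sequence, a single background distribution estimated from all residues, and a flat pseudocount. The function name `gibbs_site_sampler` and its parameters (`width`, `iters`, `pseudo`, `seed`) are hypothetical names chosen for this example.

```python
import math
import random
from collections import Counter


def gibbs_site_sampler(seqs, width, iters=200, pseudo=1.0, seed=0):
    """Illustrative sketch: sample one pattern start per sequence and
    return the highest-scoring set of starts seen during sampling."""
    rng = random.Random(seed)
    alphabet = sorted(set("".join(seqs)))
    # Assumption: background frequencies come from all residues in the input.
    totals = Counter("".join(seqs))
    n_total = sum(totals.values())
    background = {a: totals[a] / n_total for a in alphabet}
    # Start from a random alignment.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]

    def profile(exclude):
        # Column-wise residue probabilities with pseudocounts,
        # optionally leaving one sequence out of the counts.
        counts = [{a: pseudo for a in alphabet} for _ in range(width)]
        used = 0
        for i, (s, st) in enumerate(zip(seqs, starts)):
            if i == exclude:
                continue
            used += 1
            for j in range(width):
                counts[j][s[st + j]] += 1
        n = used + pseudo * len(alphabet)
        return [{a: col[a] / n for a in alphabet} for col in counts]

    def start_weights(seq, prof):
        # Pattern-versus-background likelihood ratio for each candidate start.
        out = []
        for st in range(len(seq) - width + 1):
            w = 1.0
            for j in range(width):
                a = seq[st + j]
                w *= prof[j][a] / background[a]
            out.append(w)
        return out

    def score():
        # Log-likelihood ratio of the current alignment under its own profile.
        prof = profile(None)
        return sum(
            math.log(prof[j][s[st + j]] / background[s[st + j]])
            for s, st in zip(seqs, starts)
            for j in range(width)
        )

    best, best_score = list(starts), score()
    for _ in range(iters):
        for i, s in enumerate(seqs):
            # Predictive update: rebuild the profile without sequence i,
            # then sample a new start for i in proportion to its weight.
            w = start_weights(s, profile(i))
            starts[i] = rng.choices(range(len(w)), weights=w)[0]
        current = score()
        if current > best_score:
            best, best_score = list(starts), current
    return best


if __name__ == "__main__":
    toy = ["ACGTTGACGCAT", "TTTGACGCAAAC", "GGTGACGCCCTA"]
    print(gibbs_site_sampler(toy, width=6, iters=50, seed=1))
```

Each sweep revisits every sequence once, so the work per sweep grows linearly with the number of sequences. Because the update is stochastic, the returned starts can vary with `seed`, and a sketch this simple may settle on a shifted version of the pattern.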