Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks.
- 6 December 1994
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 91 (25) , 12091-12095
- https://doi.org/10.1073/pnas.91.25.12091
Abstract
We describe an approach to analyzing protein sequence databases that, starting from a single uncharacterized sequence or group of related sequences, generates blocks of conserved segments. The procedure involves iterative database scans with an evolving position-dependent weight matrix constructed from a coevolving set of aligned conserved segments. For each iteration, the expected distribution of matrix scores under a random model is used to set a cutoff score for the inclusion of a segment in the next iteration. This cutoff may be calculated to allow the chance inclusion of either a fixed number or a fixed proportion of false positive segments. With sufficiently high cutoff scores, the procedure converged for all alignment blocks studied, with varying numbers of iterations required. Different methods for calculating weight matrices from alignment blocks were compared. The most effective of those tested was a logarithm-of-odds, Bayesian-based approach that used prior residue probabilities calculated from a mixture of Dirichlet distributions. The procedure described was used to detect novel conserved motifs of potential biological importance.
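The iterative scan described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several simplifications are assumed: the background residue frequencies are uniform, the prior is a single Dirichlet component proportional to the background (rather than the mixture of Dirichlet distributions the abstract reports as most effective), and the cutoff is a fixed number supplied by the caller instead of being derived from the expected score distribution under a random model. All function and parameter names (`log_odds_matrix`, `score`, `scan`, `iterate_block`, `pseudocount`, `cutoff`) are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background frequencies, for illustration only.
BACKGROUND = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}


def log_odds_matrix(block, pseudocount=1.0):
    """Build a position-dependent log-odds matrix from equal-length segments.

    Assumption: one Dirichlet prior proportional to the background,
    with total weight `pseudocount` (not a Dirichlet mixture).
    """
    width = len(block[0])
    total = len(block) + pseudocount
    matrix = []
    for pos in range(width):
        counts = Counter(seg[pos] for seg in block)
        column = {}
        for a in AMINO_ACIDS:
            # Posterior mean estimate of the residue probability at this column.
            p = (counts[a] + pseudocount * BACKGROUND[a]) / total
            column[a] = math.log(p / BACKGROUND[a])
        matrix.append(column)
    return matrix


def score(matrix, segment):
    """Sum the column scores; unknown residues contribute 0."""
    return sum(col.get(res, 0.0) for col, res in zip(matrix, segment))


def scan(matrix, sequence):
    """Yield (segment, score) for every window of the matrix width."""
    width = len(matrix)
    for start in range(len(sequence) - width + 1):
        seg = sequence[start:start + width]
        yield seg, score(matrix, seg)


def iterate_block(block, database, cutoff, max_iter=20):
    """Rebuild the matrix from the current block, rescan the database,
    and repeat until the set of accepted segments stops changing.

    Assumption: `cutoff` is fixed by the caller.
    """
    block = sorted(block)
    for _ in range(max_iter):
        matrix = log_odds_matrix(block)
        hits = sorted(seg for seq in database
                      for seg, s in scan(matrix, seq) if s >= cutoff)
        if not hits or hits == block:
            return hits  # converged, or nothing scored above the cutoff
        block = hits
    return block


seed = ["ACDE", "ACDE", "ACDF"]
database = ["GGACDEGG", "KKACDFKK", "WWWWWWWW"]
final_block = iterate_block(seed, database, cutoff=5.0)
```

In this toy run the block changes after the first scan (the duplicated seed segment occurs only once in the database) and is unchanged after the second, so the loop stops. A faithful version would recompute the cutoff at each iteration from the score distribution expected under a random model, as the abstract describes.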