A protein alignment scoring system sensitive at all evolutionary distances
- 1 March 1993
- journal article
- Published by Springer Nature in Journal of Molecular Evolution
- Vol. 36 (3), pp. 290-300
- https://doi.org/10.1007/bf00160485
Abstract
Protein sequence alignments generally are constructed with the aid of a “substitution matrix” that specifies a score for aligning each pair of amino acids. Assuming a simple random protein model, it can be shown that any such matrix, when used for evaluating variable-length local alignments, is implicitly a “log-odds” matrix, with a specific probability distribution for amino acid pairs to which it is uniquely tailored. Given a model of protein evolution from which such distributions may be derived, a substitution matrix adapted to detecting relationships at any chosen evolutionary distance can be constructed. Because in a database search it generally is not known a priori what evolutionary distances will characterize the similarities found, it is necessary to employ an appropriate range of matrices in order not to overlook potential homologies. This paper formalizes this concept by defining a scoring system that is sensitive at all detectable evolutionary distances. The statistical behavior of this scoring system is analyzed, and it is shown that for a typical protein database search, estimating the originally unknown evolutionary distance appropriate to each alignment costs slightly over two bits of information, or somewhat less than a factor of five in statistical significance. A much greater cost may be incurred, however, if only a single substitution matrix, corresponding to the wrong evolutionary distance, is employed.
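The claim that any substitution matrix is implicitly a log-odds matrix can be made concrete. The sketch below is illustrative, not the paper's own procedure. It assumes the standard conditions for local-alignment statistics: a negative expected score and at least one positive score. It solves for the unique scale λ > 0 with Σ p_i p_j e^{λ s_ij} = 1. It then recovers the implied target frequencies q_ij = p_i p_j e^{λ s_ij}, so that s_ij = ln(q_ij / (p_i p_j)) / λ. The two-letter alphabet and the helper names (`solve_lambda`, `implied_targets`, `bits_to_factor`) are hypothetical. The last helper converts an information cost in bits into a factor of 2^bits in statistical significance. This is why a cost of slightly over two bits corresponds to a factor between 4 and 5 (log2 5 ≈ 2.32).

```python
import math

def solve_lambda(scores, p, tol=1e-12):
    """Return the unique lambda > 0 with sum_ij p_i p_j exp(lambda*s_ij) = 1.

    Requires a negative expected score and at least one positive score.
    """
    n = len(p)
    pairs = [(p[i] * p[j], scores[i][j]) for i in range(n) for j in range(n)]
    if sum(w * s for w, s in pairs) >= 0 or max(s for _, s in pairs) <= 0:
        raise ValueError("need negative expected score and a positive score")

    def f(lam):
        return sum(w * math.exp(lam * s) for w, s in pairs) - 1.0

    # f(0) = 0 and f is convex with f'(0) < 0, so f < 0 on (0, root) and
    # f > 0 beyond the root: grow an upper bracket, then bisect.
    lo, hi = 0.0, 1.0
    while f(hi) <= 0:
        lo, hi = hi, hi * 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def implied_targets(scores, p, lam):
    """Pair frequencies q_ij = p_i p_j exp(lam * s_ij) the matrix is tailored to."""
    n = len(p)
    return [[p[i] * p[j] * math.exp(lam * scores[i][j]) for j in range(n)]
            for i in range(n)]

def bits_to_factor(bits):
    """An information cost of `bits` bits is a factor of 2**bits in significance."""
    return 2.0 ** bits

if __name__ == "__main__":
    # Hypothetical two-letter alphabet with equal background frequencies.
    p = [0.5, 0.5]
    scores = [[1, -2], [-2, 1]]
    lam = solve_lambda(scores, p)
    q = implied_targets(scores, p, lam)
    print(f"lambda = {lam:.4f}")  # ln of the golden ratio, about 0.4812
    print("implied targets:", [[round(x, 4) for x in row] for row in q])
    # Each score is recovered as a scaled log-odds ratio.
    print("recovered scores:",
          [[round(math.log(q[i][j] / (p[i] * p[j])) / lam, 6) for j in range(2)]
           for i in range(2)])
    print(f"2 bits -> factor {bits_to_factor(2):.1f}; log2(5) = {math.log2(5):.3f}")
```

For this toy matrix the equation for λ reduces to x³ - 2x² + 1 = 0 with x = e^λ, whose relevant root is the golden ratio. This makes it a convenient check that the solver and the implied target distribution are consistent.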