A Bayesian Evolutionary Distance for Parametrically Aligned Sequences
- 1 January 1996
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 3 (1), 1-17
- https://doi.org/10.1089/cmb.1996.3.1
Abstract
There is an inherent relationship between the process of pairwise sequence alignment and the estimation of evolutionary distance. This relationship is explored and made explicit. Assuming an evolutionary model and given a specific pattern of observed base mismatches, the relative probabilities of evolution at each evolutionary distance are computed using a Bayesian framework. The mean or the median of this probability distribution provides a robust estimate of the central value. The evolutionary distance has traditionally been computed as zero for an observed homology of 20 bases with no mismatches; we prove that it is highly probable that the distance is greater than 0.01. The mean of the distribution is 0.047, which is a better estimate of the evolutionary distance. Bayesian estimates of the evolutionary distance incorporate arbitrary prior information about variable mutation rates both over time and along sequence position, thus requiring only a weak form of the molecular-clock hypothesis. The endpoints of the similarity between genomic DNA sequences are often ambiguous. The probability of evolution at each evolutionary distance can be estimated over the entire set of alignments by choosing the best alignment at each distance and the corresponding probability of duplication at that evolutionary distance. A central value of this distribution provides a robust evolutionary distance estimate. We provide an efficient algorithm for computing the parametric alignment, considering evolutionary distance as the only parameter. These techniques and estimates are used to infer the duplication history of the genomic sequence in C. elegans and in S. cerevisiae. Our results indicate that repeats discovered using a single scoring matrix show a considerable bias in subsequent evolutionary distance estimates.
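The Bayesian estimate described in the abstract can be illustrated with a minimal sketch. The abstract does not state which evolutionary model or prior the authors use, so the code below assumes a Jukes-Cantor substitution model and a flat prior on the distance over [0, d_max]; the function names, the grid resolution, and the cutoff d_max are all hypothetical choices made for illustration. Under these assumptions the numbers need not match the 0.047 reported in the abstract, but the qualitative point carries over: 20 matching bases with no mismatches do not imply a distance of zero.

```python
import math


def match_probability(d):
    """Jukes-Cantor probability that a site is unchanged at distance d
    (d = expected substitutions per site). Assumed model, not the paper's."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)


def posterior_over_distance(n_match, n_mismatch, d_max=3.0, steps=30000):
    """Posterior over d on a uniform midpoint grid, assuming a flat prior
    on [0, d_max]. Returns (grid, weights) with weights summing to 1."""
    h = d_max / steps
    grid = [(i + 0.5) * h for i in range(steps)]
    likelihood = []
    for d in grid:
        p = match_probability(d)
        # Constant factors (e.g. 1/3 per specific mismatch) cancel on
        # normalization, so only the d-dependent part is kept.
        likelihood.append(p ** n_match * (1.0 - p) ** n_mismatch)
    total = sum(likelihood)
    return grid, [v / total for v in likelihood]


def posterior_mean(grid, weights):
    """Mean of the discretized posterior."""
    return sum(d * w for d, w in zip(grid, weights))


def posterior_median(grid, weights):
    """Smallest grid point at which the cumulative posterior reaches 0.5."""
    cumulative = 0.0
    for d, w in zip(grid, weights):
        cumulative += w
        if cumulative >= 0.5:
            return d
    return grid[-1]


def tail_probability(grid, weights, threshold):
    """Posterior probability that the distance exceeds `threshold`."""
    return sum(w for d, w in zip(grid, weights) if d > threshold)


if __name__ == "__main__":
    grid, weights = posterior_over_distance(n_match=20, n_mismatch=0)
    print("posterior mean  :", posterior_mean(grid, weights))
    print("posterior median:", posterior_median(grid, weights))
    print("P(d > 0.01)     :", tail_probability(grid, weights, 0.01))
```

A different prior (for example one encoding variable mutation rates, as the abstract mentions) would replace the flat weighting by multiplying each likelihood value by the prior density at that grid point before normalizing.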