Effectiveness of measures requiring and not requiring prior sequence alignment for estimating the dissimilarity of natural sequences
- 1 December 1989
- journal article
- research article
- Published by Springer Nature in Journal of Molecular Evolution
- Vol. 29 (6) , 526-537
- https://doi.org/10.1007/bf02602924
Abstract
Various measures of sequence dissimilarity have been evaluated by how well the additive least squares estimation of edges (branch lengths) of an unrooted evolutionary tree fits the observed pairwise dissimilarity measures and by how consistent the trees are for different data sets derived from the same set of sequences. This evaluation provided sensitive discrimination among dissimilarity measures and among possible trees. Dissimilarity measures not requiring prior sequence alignment did about as well as did the traditional mismatch counts requiring prior sequence alignment. Application of Jukes-Cantor correction to singlet mismatch counts worsened the results. Measures not requiring alignment had the advantage of being applicable to sequences too different to be critically alignable. Two different measures of pairwise dissimilarity not requiring alignment have been used: (1) multiplet distribution distance (MDD), the square of the Euclidean distance between vectors of the fractions of base singlets (or doublets, or triplets, or…) in the respective sequences, and (2) complements of long words (CLW), the count of bases not occurring in significantly long common words. MDD was applicable to sequences more different than was CLW (noncoding), but the latter often gave better results where both measures were available (coding). MDD results were improved by using longer multiplets and, if the sequences were coding, by using the larger amino acid and codon alphabets rather than the nucleotide alphabet. The additive least squares method could be used to provide a reasonable consensus of different trees for the same set of species (or related genes).
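As an illustration of the MDD definition given in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes that multiplets are counted as overlapping windows over the nucleotide alphabet ACGT and that windows containing other characters are skipped; the abstract does not specify either choice. The function names `multiplet_fractions`, `mdd`, and `jukes_cantor` are hypothetical. The Jukes-Cantor function uses the standard textbook form of the correction, d = -(3/4) ln(1 - 4p/3), which is assumed here to be the form applied to the singlet mismatch fraction p.

```python
import math
from collections import Counter
from itertools import product


def multiplet_fractions(seq, k, alphabet="ACGT"):
    """Return the fraction of each k-base multiplet in seq.

    Assumption: multiplets are overlapping windows, and windows with
    characters outside the alphabet are ignored.
    """
    seq = seq.upper()
    allowed = set(alphabet)
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    windows = [w for w in windows if set(w) <= allowed]
    if not windows:
        raise ValueError("sequence has no valid multiplets of length k")
    counts = Counter(windows)
    total = len(windows)
    # Fixed ordering of all possible multiplets so vectors are comparable.
    return [counts["".join(m)] / total for m in product(alphabet, repeat=k)]


def mdd(seq_a, seq_b, k=1):
    """Squared Euclidean distance between multiplet fraction vectors."""
    fa = multiplet_fractions(seq_a, k)
    fb = multiplet_fractions(seq_b, k)
    return sum((x - y) ** 2 for x, y in zip(fa, fb))


def jukes_cantor(p):
    """Standard Jukes-Cantor correction of a mismatch fraction p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

Calling `mdd(seq_a, seq_b, k=3)` compares triplet distributions without aligning the two sequences, which is why a measure of this kind remains applicable when the sequences are too different to align reliably. The alphabet parameter could be swapped for an amino acid alphabet when working with translated coding sequences.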