Accuracy of estimated phylogenetic trees from molecular data
- 1 March 1983
- journal article
- Published by Springer Nature in Journal of Molecular Evolution
- Vol. 19 (2) , 153-170
- https://doi.org/10.1007/bf02300753
Abstract
The accuracies and efficiencies of three different methods of making phylogenetic trees from gene frequency data were examined by using computer simulation. The methods examined are UPGMA, Farris' (1972) method, and Tateno et al.'s (1982) modified Farris method. In the computer simulation eight species (or populations) were assumed to evolve according to a given model tree, and the evolutionary changes of allele frequencies were followed by using the infinite-allele model. At the end of the simulated evolution five genetic distance measures (Nei's standard and minimum distances, Rogers' distance, Cavalli-Sforza's fλ, and the modified Cavalli-Sforza distance) were computed for all pairs of species, and the distance matrix obtained for each distance measure was used for reconstructing a phylogenetic tree. The phylogenetic tree obtained was then compared with the model tree. The results obtained indicate that in all tree-making methods examined the accuracies of both the topology and branch lengths of a reconstructed tree (rooted tree) are very low when the number of loci used is less than 20 but gradually increase with increasing number of loci. When the expected number of gene substitutions (M) for the shortest branch is 0.1 or more per locus and 30 or more loci are used, the topological error as measured by the distortion index (dT) is not great, but the probability of obtaining the correct topology (P) is less than 0.5 even with 60 loci. When M is as small as 0.004, P is substantially lower. In obtaining a good topology (small dT and high P) UPGMA and the modified Farris method generally show a better performance than the Farris method. The poor performance of the Farris method is observed even when Rogers' distance which obeys the triangle inequality is used. The main reason for this seems to be that the Farris method often gives overestimates of branch lengths. For estimating the expected branch lengths of the true tree UPGMA shows the best performance. 
For this purpose Nei's standard distance gives a better result than the others because of its linear relationship with the number of gene substitutions. Rogers' or Cavalli-Sforza's distance gives a phylogenetic tree in which the parts near the root are condensed and the other parts are elongated. It is recommended that more than 30 loci, including both polymorphic and monomorphic loci, be used for making phylogenetic trees. The conclusions from this study seem to apply also to data on nucleotide differences obtained by the restriction enzyme techniques.
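As an illustration of the pipeline the abstract describes (allele frequencies, then a pairwise distance matrix, then a tree), the following is a minimal Python sketch and not the authors' simulation code. It assumes the commonly stated form of Nei's standard distance, D = -ln(Jxy / sqrt(Jx Jy)), and a plain average-linkage UPGMA. The function names, the input layout (one list of allele frequencies per locus), and the nested-tuple tree format are all assumptions made for this example.

```python
import math
from itertools import combinations


def nei_standard_distance(x, y):
    """Nei's standard distance D = -ln(Jxy / sqrt(Jx * Jy)).

    x and y are populations given as lists of loci; each locus is a list of
    allele frequencies in the same allele order (an assumed input layout).
    Undefined (raises ValueError) if the populations share no alleles.
    """
    n_loci = len(x)
    jx = sum(sum(p * p for p in locus) for locus in x) / n_loci
    jy = sum(sum(q * q for q in locus) for locus in y) / n_loci
    jxy = sum(sum(p * q for p, q in zip(lx, ly)) for lx, ly in zip(x, y)) / n_loci
    return -math.log(jxy / math.sqrt(jx * jy))


def upgma(names, dist):
    """Average-linkage clustering on a square distance matrix.

    Returns a rooted tree as nested tuples (left, right, node_height),
    where node_height is half the distance between the merged clusters.
    """
    # Each cluster id maps to (subtree, number of leaves in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    d = {frozenset((i, j)): dist[i][j]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        a, b = sorted(pair)
        (ta, na), (tb, nb) = clusters.pop(a), clusters.pop(b)
        height = d.pop(pair) / 2.0
        for c in clusters:
            # Size-weighted average gives the mean leaf-to-leaf distance.
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            d[frozenset((next_id, c))] = (na * dac + nb * dbc) / (na + nb)
        clusters[next_id] = ((ta, tb, height), na + nb)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

In a study of this kind, one would compute `nei_standard_distance` for every pair of the simulated populations, pass the resulting matrix to `upgma`, and compare the reconstructed tree with the model tree. The comparison statistics named in the abstract (dT and P) are not sketched here because their definitions are not given in this record.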
This publication has 35 references indexed in Scilit:
- Accuracy of estimated phylogenetic trees from molecular data. Journal of Molecular Evolution, 1982
- A new estimate of sequence divergence of mitochondrial DNA using restriction endonuclease mappings. Journal of Molecular Evolution, 1979
- Inter- and intraspecific variation in restriction maps of Drosophila mitochondrial DNAs. Nature, 1979
- Mathematical model for studying genetic variation in terms of restriction endonucleases. Proceedings of the National Academy of Sciences, 1979
- The theory of genetic distance and evolution of human races. Journal of Human Genetics, 1978
- Construction of phylogenetic trees for proteins and nucleic acids: Empirical evaluation of alternative matrix methods. Journal of Molecular Evolution, 1978
- Standard error of immunological dating of evolutionary time. Journal of Molecular Evolution, 1977
- Effect of Migration on Genetic Distance. The American Naturalist, 1976
- Drift variances of heterozygosity and genetic distance in transient states. Genetics Research, 1975
- Construction of Phylogenetic Trees. Science, 1967