Combining Multiple Data Sets in a Likelihood Analysis: Which Models are the Best?
Open Access
- 1 December 2002
- journal article
- research article
- Published by Oxford University Press (OUP) in Molecular Biology and Evolution
- Vol. 19 (12), 2294-2307
- https://doi.org/10.1093/oxfordjournals.molbev.a004053
Abstract
Until recently, phylogenetic analyses have been routinely based on homologous sequences of a single gene. Given the vast number of gene sequences now available, phylogenetic studies are now based on the analysis of multiple genes. Thus, it has become necessary to devise statistical methods to combine multiple molecular data sets. Here, we compare several models for combining different genes for the purpose of evaluating the likelihood of tree topologies. Three methods of branch length estimation were studied: assuming all genes have the same branch lengths (concatenate model), assuming that branch lengths are proportional among genes (proportional model), or assuming that each gene has a separate set of branch lengths (separate model). We also compared three models of among-site rate variation: the homogeneous model, a model that assumes one gamma parameter for all genes, and a model that assumes one gamma parameter for each gene. On the basis of two nuclear and one mitochondrial amino acid data sets, our results suggest that, depending on the data set chosen, either the separate model or the proportional model represents the most appropriate method for branch length analysis. For all the data sets examined, one gamma parameter for each gene represents the best model for among-site rate variation. Using these models we analyzed alternative mammalian tree topologies, and we describe the effect of the assumed model on the maximum likelihood tree. We show that the choice of the model has an impact on the best phylogeny obtained.
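The three branch-length models and three rate-variation models differ mainly in how many free parameters they fit, which is what a model comparison has to penalize. The sketch below is illustrative only and is not taken from the article: it assumes an unrooted, fully bifurcating tree (2n - 3 branches for n taxa), assumes every gene is sampled for every taxon, and uses the Akaike information criterion (AIC) as one possible comparison criterion, which the abstract does not name. The function names and the example of 10 taxa and 5 genes are hypothetical.

```python
# Illustrative sketch: free-parameter counts for the combined-gene models.
# Assumptions (not from the abstract): unrooted binary tree, all genes
# present for all taxa, AIC used as the model-comparison criterion.

def branch_params(model: str, n_taxa: int, n_genes: int) -> int:
    """Number of free branch-length parameters under each model."""
    branches = 2 * n_taxa - 3  # branches in an unrooted binary tree
    if model == "concatenate":
        return branches                  # one shared set of branch lengths
    if model == "proportional":
        return branches + (n_genes - 1)  # shared set plus a rate factor per extra gene
    if model == "separate":
        return branches * n_genes        # an independent set for every gene
    raise ValueError(f"unknown branch-length model: {model}")

def rate_params(model: str, n_genes: int) -> int:
    """Number of gamma shape parameters for among-site rate variation."""
    if model == "homogeneous":
        return 0        # no rate variation among sites
    if model == "shared_gamma":
        return 1        # one gamma parameter for all genes
    if model == "per_gene_gamma":
        return n_genes  # one gamma parameter per gene
    raise ValueError(f"unknown rate-variation model: {model}")

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion; lower values indicate a preferred model."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical example: 10 taxa, 5 genes, one gamma parameter per gene.
for branch_model in ("concatenate", "proportional", "separate"):
    k = branch_params(branch_model, 10, 5) + rate_params("per_gene_gamma", 5)
    print(branch_model, k)
```

Under these assumptions the separate model fits several times more parameters than the concatenate model, so it is favored only when the gain in log-likelihood outweighs that penalty.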