Assessing the Performance of Single-Copy Genes for Recovering Robust Phylogenies
Open Access
- 1 August 2008
- journal article
- Published by Oxford University Press (OUP) in Systematic Biology
- Vol. 57 (4), 613-627
- https://doi.org/10.1080/10635150802306527
Abstract
Phylogenies involving nonmodel species are based on a few genes, mostly chosen following historical or practical criteria. Because gene trees are sometimes incongruent with species trees, the resulting phylogenies may not accurately reflect the evolutionary relationships among species. The increase in availability of genome sequences now provides large numbers of genes that could be used for building phylogenies. However, for practical reasons only a few genes can be sequenced for a wide range of species. Here we asked whether we can identify a few genes, among the single-copy genes common to most fungal genomes, that are sufficient for recovering accurate and well-supported phylogenies. Fungi represent a model group for phylogenomics because many complete fungal genomes are available. An automated procedure was developed to extract single-copy orthologous genes from complete fungal genomes using a Markov Clustering Algorithm (Tribe-MCL). Using 21 complete, publicly available fungal genomes with reliable protein predictions, 246 single-copy orthologous gene clusters were identified. We inferred the maximum likelihood trees using the individual orthologous sequences and constructed a reference tree from concatenated protein alignments. The topologies of the individual gene trees were compared to that of the reference tree using three different methods. The performance of individual genes in recovering the reference tree was highly variable. Gene size and the number of variable sites were highly correlated and significantly affected the performance of the genes, but the average substitution rate did not. Two genes recovered exactly the same topology as the reference tree, and when concatenated provided high bootstrap values. The genes typically used for fungal phylogenies did not perform well, which suggests that current fungal phylogenies based on these genes may not accurately reflect the evolutionary relationships among species. 
Analyses on subsets of species showed that the phylogenetic performance did not seem to depend strongly on the sample. We expect that the best-performing genes identified here will be very useful for phylogenetic studies of fungi, at least at a large taxonomic scale. Furthermore, we compare the method developed here for finding genes for building robust phylogenies with previous ones and we advocate that our method could be applied to other groups of organisms when more complete genomes are available.
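The single-copy selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the clustering (for example, Tribe-MCL output) has already been parsed into a dictionary of cluster members, and it applies the strictest reading of "single-copy", namely exactly one protein from every genome. The function name `single_copy_clusters`, the input layout, and all identifiers in the toy data are hypothetical.

```python
from collections import Counter


def single_copy_clusters(clusters, genomes):
    """Return IDs of clusters holding exactly one protein from every genome.

    clusters: dict mapping cluster ID -> list of (genome, protein_id) pairs
              (assumed layout, e.g. parsed from a clustering output file).
    genomes:  iterable of all genome names included in the analysis.
    """
    required = set(genomes)
    kept = []
    for cluster_id, members in clusters.items():
        # Count how many proteins each genome contributes to this cluster.
        counts = Counter(genome for genome, _ in members)
        present_in_all = set(counts) == required
        one_copy_each = all(n == 1 for n in counts.values())
        if present_in_all and one_copy_each:
            kept.append(cluster_id)
    return kept


# Toy input with hypothetical identifiers (not data from the study).
genomes = ["sp_A", "sp_B", "sp_C"]
clusters = {
    # One protein per genome: kept.
    "c1": [("sp_A", "a1"), ("sp_B", "b1"), ("sp_C", "c1")],
    # Two proteins from sp_A (a paralog): rejected.
    "c2": [("sp_A", "a2"), ("sp_A", "a3"), ("sp_B", "b2"), ("sp_C", "c2")],
    # No protein from sp_C: rejected.
    "c3": [("sp_A", "a4"), ("sp_B", "b3")],
}

result = single_copy_clusters(clusters, genomes)
print(result)
```

Requiring presence in every genome is a simplifying assumption; a looser filter that tolerates a few missing genomes would replace the set-equality check with a minimum count of genomes represented.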