Gene Sampling Can Bias Multi-Gene Phylogenetic Inferences: The Relationship between Red Algae and Green Plants as a Case Study
Open Access
- 26 February 2009
- journal article
- research article
- Published by Oxford University Press (OUP) in Molecular Biology and Evolution
- Vol. 26 (5) , 1171-1178
- https://doi.org/10.1093/molbev/msp036
Abstract
The monophyly of Plantae including glaucophytes, red algae, and green plants (green algae plus land plants) has been recovered in recent phylogenetic analyses of large multi-gene data sets (e.g., those including >30,000 amino acid [aa] positions). On the other hand, Plantae monophyly has not been stably reconstructed in inferences from multi-gene data sets with fewer than 10,000 aa positions. An analysis of 5,216 aa positions in Nozaki et al. (Nozaki H, Iseki M, Hasegawa M, Misawa K, Nakada T, Sasaki N, Watanabe M. 2007. Phylogeny of primary photosynthetic eukaryotes as deduced from slowly evolving nuclear genes. Mol Biol Evol. 24:1592–1595.) strongly rejected the monophyly of Plantae, whereas Hackett et al. (Hackett JD, Yoon HS, Li S, Reyes-Prieto A, Rummele SE, Bhattacharya D. 2007. Phylogenomic analysis supports the monophyly of cryptophytes and haptophytes and the association of rhizaria with chromalveolates. Mol Biol Evol. 24:1702–1713.) robustly recovered the Plantae clade in an analysis of 6,735 aa positions. We suspected that the significant incongruity observed between the two studies was attributable to a bias generally overlooked in multi-gene phylogenetic estimation, rather than data size, taxon sampling, or methods for tree reconstruction. Although glaucophytes were excluded from our analyses due to a shortage of sequence data, we found that the recovery of a sister–group relationship between red algae and green plants primarily depends on gene sampling in phylogenetic inferences from <10,000 aa positions. Phylogenetic analyses of data sets with fewer than 10,000 aa positions, which can be prepared without large-scale sequencing (e.g., expressed sequence tag analyses), are practical in challenging various unresolved issues in eukaryotic evolution. However, our results indicate that severe biases can arise from gene sampling in multi-gene inferences from <10,000 aa positions. 
We also address the validity of fast-evolving gene exclusion in multi-gene phylogenetic analyses, in light of this gene sampling bias.
This publication has 32 references indexed in Scilit:
- Phylogenomics reveals a new ‘megagroup’ including most photosynthetic eukaryotes. Biology Letters, 2008
- Phylogenomics Reshuffles the Eukaryotic Supergroups. PLOS ONE, 2007
- Phylogenomic Analysis Supports the Monophyly of Cryptophytes and Haptophytes and the Association of Rhizaria with Chromalveolates. Molecular Biology and Evolution, 2007
- An Empirical Assessment of Long-Branch Attraction Artefacts in Deep Eukaryotic Phylogenomics. Systematic Biology, 2005
- The New Higher Level Classification of Eukaryotes with Emphasis on the Taxonomy of Protists. The Journal of Eukaryotic Microbiology, 2005
- Inference of the Phylogenetic Position of Oxymonads Based on Nine Genes: Support for Metamonada and Excavata. Molecular Biology and Evolution, 2005
- Root of the Eukaryota Tree as Inferred from Combined Maximum Likelihood Analyses of Multiple Molecular Sequence Data. Molecular Biology and Evolution, 2004
- The Phylogenetic Position of the Pelobiont Mastigamoeba balamuthi Based on Sequences of rDNA and Translation Elongation Factors EF‐1α and EF‐2. The Journal of Eukaryotic Microbiology, 2002
- Phylogenetic Position of Blastocystis hominis and of Stramenopiles Inferred from Multiple Molecular Sequence Data. The Journal of Eukaryotic Microbiology, 2002
- A Kingdom-Level Phylogeny of Eukaryotes Based on Combined Protein Data. Science, 2000