Conceptual framework and pilot study to benchmark phylogenomic databases based on reference gene trees
Open Access
- 7 July 2011
- journal article
- research article
- Published by Oxford University Press (OUP) in Briefings in Bioinformatics
- Vol. 12 (5) , 423-435
- https://doi.org/10.1093/bib/bbr034
Abstract
Phylogenomic databases provide orthology predictions for species with fully sequenced genomes. Although the goal seems well-defined, the content of these databases differs greatly. Seven ortholog databases (Ensembl Compara, eggNOG, HOGENOM, InParanoid, OMA, OrthoDB, Panther) were compared on the basis of reference trees. For three well-conserved protein families, we observed a generally high specificity of orthology assignments for these databases. We show that differences in the completeness of predicted gene relationships and in the phylogenetic information are, for the great majority, not due to the methods used, but to differences in the underlying database concepts. According to our metrics, none of the databases provides a fully correct and comprehensive protein classification. Our results provide a framework for meaningful and systematic comparisons of phylogenomic databases. In the future, a sustainable set of ‘Gold standard’ phylogenetic trees could provide a robust method for phylogenomic databases to assess their current quality status, measure changes following new database releases and diagnose improvements subsequent to an upgrade of the analysis procedure.
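As an illustration of what a comparison against reference trees can look like in practice, the following is a minimal sketch. It assumes that both the database predictions and the relationships read off a reference tree have been reduced to unordered pairs of gene identifiers. The function name `score_pairs`, the gene identifiers and the precision/recall definitions are hypothetical and chosen for illustration only; the abstract does not define the metrics used in the study, and they may differ from these.

```python
def score_pairs(predicted, reference):
    """Score predicted ortholog pairs against reference ortholog pairs.

    Both arguments are iterables of 2-tuples of gene identifiers.
    Pairs are treated as unordered, so ("a", "b") equals ("b", "a").
    Returns (precision, recall) as floats.
    """
    pred = {frozenset(pair) for pair in predicted}
    ref = {frozenset(pair) for pair in reference}
    true_pos = len(pred & ref)
    # Precision: fraction of predicted pairs confirmed by the reference.
    precision = true_pos / len(pred) if pred else 0.0
    # Recall: fraction of reference pairs recovered by the prediction.
    recall = true_pos / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical reference pairs derived from a curated gene tree.
reference = [("humA", "mouseA"), ("humA", "flyA"), ("mouseA", "flyA")]
# Hypothetical predictions exported from one database.
predicted = [("mouseA", "humA"), ("humA", "flyB")]

precision, recall = score_pairs(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under these assumptions, a database with few but accurate predictions scores high precision and low recall. That is one way to express the contrast the abstract draws between specificity and completeness.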