Phylogenomic inference of protein molecular function: advances and challenges
- 22 January 2004
- journal article
- conference paper
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 20 (2) , 170-179
- https://doi.org/10.1093/bioinformatics/bth021
Abstract
Motivation: Protein families evolve a multiplicity of functions through gene duplication, speciation and other processes. As a number of studies have shown, standard methods of protein function prediction produce systematic errors on these data. Phylogenomic analysis (combining phylogenetic tree construction, integration of experimental data and differentiation of orthologs and paralogs) has been proposed to address these errors and improve the accuracy of functional classification. The explicit integration of structure prediction and analysis in this framework, which we call structural phylogenomics, provides additional insights into protein superfamily evolution.
Results: Results of protein functional classification using phylogenomic analysis show fewer expected false positives overall than when pairwise methods of functional classification are employed. We present an overview of the motivations and fundamental principles of phylogenomic analysis, new methods developed for the key tasks, benchmark datasets for these tasks (when available) and suggest procedures to increase accuracy. We also discuss some of the methods used in the Celera Genomics high-throughput phylogenomic classification of the human genome.