Hierarchical multi-label prediction of gene function
Open Access
- 12 January 2006
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 22 (7) , 830-836
- https://doi.org/10.1093/bioinformatics/btk048
Abstract
Motivation: Assigning functions for unknown genes based on diverse large-scale data is a key task in functional genomics. Previous work on gene function prediction has addressed this problem using independent classifiers for each function. However, such an approach ignores the structure of functional class taxonomies, such as the Gene Ontology (GO). Over a hierarchy of functional classes, a group of independent classifiers where each one predicts gene membership to a particular class can produce a hierarchically inconsistent set of predictions, where for a given gene a specific class may be predicted positive while its inclusive parent class is predicted negative. Taking the hierarchical structure into account resolves such inconsistencies and provides an opportunity for leveraging all classifiers in the hierarchy to achieve higher specificity of predictions.

Results: We developed a Bayesian framework for combining multiple classifiers based on the functional taxonomy constraints. Using a hierarchy of support vector machine (SVM) classifiers trained on multiple data types, we combined predictions in our Bayesian framework to obtain the most probable consistent set of predictions. Experiments show that over a 105-node subhierarchy of the GO, our Bayesian framework improves predictions for 93 nodes. As an additional benefit, our method also provides implicit calibration of SVM margin outputs to probabilities. Using this method, we make function predictions for multiple proteins, and experimentally confirm predictions for proteins involved in mitosis.

Supplementary information: Results for the 105 selected GO classes and predictions for 1059 unknown genes are available at:

Contact: ogt@cs.princeton.edu
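The hierarchical inconsistency described under Motivation (a specific class predicted positive while its parent class is predicted negative) can be illustrated with a minimal Python sketch. This is not the paper's Bayesian framework; it only detects the violations that such a framework is meant to resolve. The function name `find_inconsistencies`, the single-parent representation of the hierarchy, and the three toy class names are assumptions made for illustration and are not taken from the 105-node GO subhierarchy.

```python
from typing import Dict, List, Optional, Tuple


def find_inconsistencies(
    parent: Dict[str, Optional[str]],
    predicted: Dict[str, bool],
) -> List[Tuple[str, str]]:
    """Return sorted (child, parent) pairs where the child class is
    predicted positive but its parent class is predicted negative.

    Assumption: each class has at most one parent (a tree). The real GO
    allows multiple parents; this simplification is for illustration only.
    Classes missing from `predicted` are treated as negative.
    """
    violations = []
    for node, par in parent.items():
        if par is None:
            continue  # root class has no parent to contradict
        if predicted.get(node, False) and not predicted.get(par, False):
            violations.append((node, par))
    return sorted(violations)


# Hypothetical toy hierarchy: child -> parent (None marks the root).
toy_parent = {
    "cell cycle": None,
    "mitotic cell cycle": "cell cycle",
    "mitotic spindle assembly": "mitotic cell cycle",
}

# Hypothetical outputs of three independent per-class classifiers for one gene.
toy_predicted = {
    "cell cycle": True,
    "mitotic cell cycle": False,
    "mitotic spindle assembly": True,
}

violations = find_inconsistencies(toy_parent, toy_predicted)
print(violations)
```

In this toy case the only violation is the most specific class, which is predicted positive although its parent is predicted negative. A check like this only flags the problem; choosing the most probable consistent labeling is the job of the Bayesian combination step described under Results.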