Predicting gene function in a hierarchical context with an ensemble of classifiers
Open Access
- 27 June 2008
- journal article
- research article
- Published by Springer Nature in Genome Biology
- Vol. 9 (S1) , S3
- https://doi.org/10.1186/gb-2008-9-s1-s3
Abstract
Background: The wide availability of genome-scale data for several organisms has stimulated interest in computational approaches to gene function prediction. Diverse machine learning methods have been applied to unicellular organisms with some success, but few have been extensively tested on higher level, multicellular organisms. A recent mouse function prediction project (MouseFunc) brought together nine bioinformatics teams applying a diverse array of methodologies to mount the first large-scale effort to predict gene function in the laboratory mouse.
Results: In this paper, we describe our contribution to this project, an ensemble framework based on the support vector machine that integrates diverse datasets in the context of the Gene Ontology hierarchy. We carry out a detailed analysis of the performance of our ensemble and provide insights into which methods work best under a variety of prediction scenarios. In addition, we applied our method to Saccharomyces cerevisiae and have experimentally confirmed functions for a novel mitochondrial protein.
Conclusion: Our method consistently performs among the top methods in the MouseFunc evaluation. Furthermore, it exhibits good classification performance across a variety of cellular processes and functions in both a multicellular organism and a unicellular organism, indicating its ability to discover novel biology in diverse settings.