Learning Gene Functional Classifications from Multiple Data Types
- 1 April 2002
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 9 (2), 401-411
- https://doi.org/10.1089/10665270252935539
Abstract
In our attempts to understand cellular function at the molecular level, we must be able to synthesize information from disparate types of genomic data. We consider the problem of inferring gene functional classifications from a heterogeneous data set consisting of DNA microarray expression measurements and phylogenetic profiles from whole-genome sequence comparisons. We demonstrate the application of the support vector machine (SVM) learning algorithm to this functional inference task. Our results suggest the importance of exploiting prior information about the heterogeneity of the data. In particular, we propose an SVM kernel function that is explicitly heterogeneous. In addition, we describe feature scaling methods for further exploiting prior knowledge of heterogeneity by giving each data type different weights.
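The abstract does not spell out the form of the heterogeneous kernel, so the following is only a minimal sketch under stated assumptions: it assumes the kernel is a weighted sum of kernels computed separately on each data type (expression and phylogenetic profile), so that no features are multiplied across types. The function names, the polynomial base kernel, the degree, and the weights `w_expr` and `w_phylo` are all hypothetical choices for illustration, not taken from the article.

```python
from typing import Sequence


def poly_kernel(x: Sequence[float], y: Sequence[float], degree: int = 2) -> float:
    """Polynomial kernel (x . y + 1)^degree on a single feature vector pair."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + 1.0) ** degree


def heterogeneous_kernel(
    x_expr: Sequence[float],
    x_phylo: Sequence[float],
    y_expr: Sequence[float],
    y_phylo: Sequence[float],
    w_expr: float = 1.0,    # hypothetical weight for the expression data
    w_phylo: float = 1.0,   # hypothetical weight for the phylogenetic profiles
    degree: int = 2,
) -> float:
    """Assumed form: weighted sum of per-type kernels.

    Each data type gets its own kernel, so the implicit feature space
    contains no products of an expression feature with a profile feature.
    """
    return (w_expr * poly_kernel(x_expr, y_expr, degree)
            + w_phylo * poly_kernel(x_phylo, y_phylo, degree))


def concatenated_kernel(
    x_expr: Sequence[float],
    x_phylo: Sequence[float],
    y_expr: Sequence[float],
    y_phylo: Sequence[float],
    degree: int = 2,
) -> float:
    """Baseline for comparison: one kernel on the concatenated vectors."""
    x = list(x_expr) + list(x_phylo)
    y = list(y_expr) + list(y_phylo)
    return poly_kernel(x, y, degree)


# Two toy genes: (expression measurements, phylogenetic profile bits).
gene_a = ([1.0, 2.0], [1.0, 0.0])
gene_b = ([0.5, -1.0], [1.0, 1.0])

k_het = heterogeneous_kernel(gene_a[0], gene_a[1], gene_b[0], gene_b[1])
k_cat = concatenated_kernel(gene_a[0], gene_a[1], gene_b[0], gene_b[1])
k_weighted = heterogeneous_kernel(
    gene_a[0], gene_a[1], gene_b[0], gene_b[1], w_phylo=2.0
)
```

In this sketch the two kernels disagree on the toy genes because the concatenated version mixes expression and profile features inside the polynomial expansion, while the summed version keeps them apart. Raising `w_phylo` illustrates, again hypothetically, how giving each data type a different weight changes the similarity between two genes.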
This publication has 11 references indexed in Scilit:
- Knowledge-based analysis of microarray gene expression data by using support vector machines. Proceedings of the National Academy of Sciences, 2000
- A combined algorithm for genome-wide prediction of protein function. Nature, 1999
- Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. Proceedings of the National Academy of Sciences, 1999
- Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences, 1998
- Comprehensive Identification of Cell Cycle–regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization. Molecular Biology of the Cell, 1998
- The Transcriptional Program of Sporulation in Budding Yeast. Science, 1998
- Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 1998
- A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 1998
- Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale. Science, 1997
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997