An en masse phenotype and function prediction system for Mus musculus
Open Access
- 27 June 2008
- journal article
- Published by Springer Nature in Genome Biology
- Vol. 9 (S1), S8
- https://doi.org/10.1186/gb-2008-9-s1-s8
Abstract
Background: Individual researchers are struggling to keep up with the accelerating emergence of high-throughput biological data, and to extract information that relates to their specific questions. Integration of accumulated evidence should permit researchers to form fewer - and more accurate - hypotheses for further study through experimentation.

Results: Here a method previously used to predict Gene Ontology (GO) terms for Saccharomyces cerevisiae (Tian et al.: Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function. Genome Biol 2008, 9(Suppl 1):S7) is applied to predict GO terms and phenotypes for 21,603 Mus musculus genes, using a diverse collection of integrated data sources (including expression, interaction, and sequence-based data). This combined 'guilt-by-profiling' and 'guilt-by-association' approach optimizes the combination of two inference methodologies. Predictions at all levels of confidence are evaluated by examining genes not used in training, and top predictions are examined manually using available literature and knowledge base resources.

Conclusion: We assigned a confidence score to each gene/term combination. The results provided high prediction performance, with nearly every GO term achieving greater than 40% precision at 1% recall. Among the 36 novel predictions for GO terms and 40 for phenotypes that were studied manually, >80% and >40%, respectively, were identified as accurate. We also illustrate that a combination of 'guilt-by-profiling' and 'guilt-by-association' outperforms either approach alone in their application to M. musculus.
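The "precision at 1% recall" figure quoted in the abstract can be illustrated with a short sketch. This is not the authors' evaluation code: the function name `precision_at_recall`, the toy scores, and the tie-handling (ties are broken by sort order) are assumptions made only for illustration. The idea is to rank held-out genes for one term by confidence score, walk down the ranking until the requested fraction of true annotations has been recovered, and report the fraction of correct predictions at that cutoff.

```python
def precision_at_recall(scores, labels, recall_level=0.01):
    """Precision at the first rank cutoff whose recall reaches recall_level.

    scores: confidence score per gene for one term (higher = more confident).
    labels: 1 if the gene truly has the term, else 0 (hypothetical held-out set).
    """
    if not 0 < recall_level <= 1:
        raise ValueError("recall_level must be in (0, 1]")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("no positive examples for this term")

    # Rank genes from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos / total_pos >= recall_level:
            return true_pos / k  # precision among the top-k predictions
    # Not reachable when total_pos > 0 and recall_level <= 1, since the
    # full ranking always reaches recall 1.0.
    return true_pos / len(ranked)


# Toy example (made-up numbers, not data from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [1, 0, 1, 0, 0]
half_recall_precision = precision_at_recall(toy_scores, toy_labels, recall_level=0.5)
full_recall_precision = precision_at_recall(toy_scores, toy_labels, recall_level=1.0)
```

With many annotated genes per term, a 1% recall cutoff covers only the handful of highest-confidence predictions, so the metric reflects how reliable the top of the ranking is.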
This publication has 44 references indexed in Scilit:
- Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function. Genome Biology, 2008
- A critical assessment of Mus musculus gene function prediction using integrated genomic evidence. Genome Biology, 2008
- InSite: a computational method for identifying protein-protein interaction binding sites on a proteome-wide scale. Genome Biology, 2007
- Loss of Vav2 Proto-Oncogene Causes Tachycardia and Cardiovascular Disease in Mice. Molecular Biology of the Cell, 2007
- Expanded protein information at SGD: new pages and proteome browser. Nucleic Acids Research, 2007
- The mouse genome database (MGD): new features facilitating a model system. Nucleic Acids Research, 2006
- FlyBase: genomes by the dozen. Nucleic Acids Research, 2006
- WormBase: new content and better access. Nucleic Acids Research, 2006
- Pfam: clans, web tools and services. Nucleic Acids Research, 2006
- A gene atlas of the mouse and human protein-encoding transcriptomes. Proceedings of the National Academy of Sciences, 2004