Systematic Association of Genes to Phenotypes by Genome and Literature Mining
Open Access
- 5 April 2005
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Biology
- Vol. 3 (5) , e134
- https://doi.org/10.1371/journal.pbio.0030134
Abstract
One of the major challenges of functional genomics is to unravel the connection between genotype and phenotype. So far no global analysis has attempted to explore those connections in the light of the large phenotypic variability seen in nature. Here, we use an unsupervised, systematic approach for associating genes and phenotypic characteristics that combines literature mining with comparative genome analysis. We first mine the MEDLINE literature database for terms that reflect phenotypic similarities of species. Subsequently we predict the likely genomic determinants: genes specifically present in the respective genomes. In a global analysis involving 92 prokaryotic genomes we retrieve 323 clusters containing a total of 2,700 significant gene–phenotype associations. Some clusters contain mostly known relationships, such as genes involved in motility or plant degradation, often with additional hypothetical proteins associated with those phenotypes. Other clusters comprise unexpected associations; for example, a group of terms related to food and spoilage is linked to genes predicted to be involved in bacterial food poisoning. Among the clusters, we observe an enrichment of pathogenicity-related associations, suggesting that the approach reveals many novel genes likely to play a role in infectious diseases.
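The second step described in the abstract, finding genes specifically present in the genomes of species that share a phenotype term, can be illustrated with a small sketch. The abstract does not state which statistic the authors use, so the hypergeometric tail test below is an assumption, as are the function names, the significance threshold, and the toy species and gene labels. The sketch applies no multiple-testing correction and does not reproduce the literature-mining or clustering steps.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k): chance that at least k of the n phenotype genomes carry a
    gene found in K of N genomes overall, if the gene were spread at random."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def associate(gene_presence, phenotype_species, all_species, alpha=0.05):
    """Return (gene, p_value) pairs whose presence is enriched among the
    species sharing a phenotype term. All inputs are hypothetical:
    gene_presence maps a gene label to the set of species that encode it."""
    N = len(all_species)
    n = len(phenotype_species)
    results = []
    for gene, species in gene_presence.items():
        K = len(species & all_species)        # genomes carrying the gene
        k = len(species & phenotype_species)  # of those, with the phenotype
        p = hypergeom_tail(k, K, n, N)
        if p <= alpha:
            results.append((gene, p))
    return sorted(results, key=lambda pair: pair[1])


# Toy data (invented for illustration only).
all_species = {"s1", "s2", "s3", "s4", "s5", "s6"}
motile_species = {"s1", "s2", "s3"}  # species co-occurring with a "motility" term
gene_presence = {
    "flagellin": {"s1", "s2", "s3"},    # present only in motile species
    "housekeeping": set(all_species),   # present everywhere, uninformative
    "other": {"s4", "s5"},              # absent from motile species
}

hits = associate(gene_presence, motile_species, all_species, alpha=0.1)
print(hits)
```

In this toy example only the gene restricted to the phenotype-sharing species passes the threshold; a gene present in every genome carries no signal, which matches the abstract's emphasis on genes "specifically present" in the respective genomes.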