An integrated approach to inferring gene–disease associations in humans
- 25 February 2008
- journal article
- research article
- Published by Wiley in Proteins-Structure Function and Bioinformatics
- Vol. 72 (3) , 1030-1037
- https://doi.org/10.1002/prot.21989
Abstract
One of the most important tasks of modern bioinformatics is the development of computational tools that can be used to understand and treat human disease. To date, a variety of methods have been explored and algorithms for candidate gene prioritization are gaining in their usefulness. Here, we propose an algorithm for detecting gene–disease associations based on the human protein–protein interaction network, known gene–disease associations, protein sequence, and protein functional information at the molecular level. Our method, PhenoPred, is supervised: first, we mapped each gene/protein onto the spaces of disease and functional terms based on distance to all annotated proteins in the protein interaction network. We also encoded sequence, function, physicochemical, and predicted structural properties, such as secondary structure and flexibility. We then trained support vector machines to detect gene–disease associations for a number of terms in Disease Ontology and provided evidence that, despite the noise/incompleteness of experimental data and unfinished ontology of diseases, identification of candidate genes can be successful even when a large number of candidate disease terms are predicted on simultaneously. Availability: www.phenopred.org. Proteins 2008.
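The network-mapping step described in the abstract can be illustrated with a minimal sketch. It is not the PhenoPred implementation: the abstract does not specify how distances to annotated proteins are summarized, so the sketch assumes one simple choice, the hop distance to the nearest other protein annotated with each disease term. The function names, the toy graph, and the term labels are all hypothetical, and the sequence-derived features and the support vector machine training are left out.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def disease_term_features(graph, annotations, terms, protein,
                          unreachable=float("inf")):
    """One feature per term: distance from `protein` to the nearest OTHER
    protein annotated with that term (an assumed summary, not the paper's)."""
    dist = shortest_path_lengths(graph, protein)
    features = []
    for term in terms:
        # Exclude the query protein itself so its own label cannot leak in.
        candidates = [dist[p] for p in annotations.get(term, ())
                      if p != protein and p in dist]
        features.append(min(candidates) if candidates else unreachable)
    return features


# Hypothetical toy interaction network (undirected, as adjacency lists).
graph = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": [],  # isolated protein with no known interactions
}
# Hypothetical known gene-disease associations.
annotations = {"term_1": {"A"}, "term_2": {"D", "E"}}
terms = ["term_1", "term_2"]

features = disease_term_features(graph, annotations, terms, "B")
print(features)  # [1, 2]
```

A vector like this, computed for every protein and concatenated with other encoded properties, is the kind of input a per-term classifier could be trained on. Excluding the query protein's own annotation is a design choice made here to avoid label leakage during training.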
Funding Information
- Indiana Genomics Initiative
- NSF (DBI-0644017)
- NIH (K22LM009135, P01AG018397)