Analysis of protein sequence and interaction data for candidate disease gene prediction
Open Access
- 4 October 2006
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 34 (19), e130
- https://doi.org/10.1093/nar/gkl707
Abstract
Linkage analysis is a successful procedure for associating diseases with specific genomic regions. These regions are often large, containing hundreds of genes, which makes experimental identification of the disease gene arduous and expensive. We present two methods to prioritize candidates for further experimental study: Common Pathway Scanning (CPS) and Common Module Profiling (CMP). CPS is based on the assumption that common phenotypes are associated with dysfunction in proteins that participate in the same complex or pathway. CPS applies network data derived from protein-protein interaction (PPI) and pathway databases to identify relationships between genes. CMP identifies likely candidates using a domain-dependent sequence similarity approach, based on the hypothesis that disruption of genes of similar function will lead to the same phenotype. Both algorithms accept either of two forms of input data: known disease genes or multiple disease loci. When using known disease genes as input, our combined methods have a sensitivity of 0.52 and a specificity of 0.97 and reduce the candidate list by 13-fold. Using multiple loci, our methods successfully identify disease genes for all benchmark diseases with a sensitivity of 0.84 and a specificity of 0.63. Our combined approach prioritizes good candidates and will accelerate the disease gene discovery process.
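The CPS idea with known disease genes as input, together with the sensitivity and specificity measures quoted above, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how network relationships are scored, so the rule used here (a candidate is kept if it directly interacts with a known disease gene) is an assumption, and the function names `cps_candidates` and `sensitivity_specificity` and all gene identifiers are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def cps_candidates(
    interactions: Iterable[Tuple[str, str]],
    known_disease_genes: Set[str],
    locus_genes: Set[str],
) -> Dict[str, Set[str]]:
    """Map each locus gene to the known disease genes it interacts with.

    Assumption: only direct (one-step) interactions count as a relationship.
    """
    # Build an undirected adjacency map from the interaction pairs.
    neighbours: Dict[str, Set[str]] = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    hits: Dict[str, Set[str]] = {}
    for gene in locus_genes:
        partners = neighbours.get(gene, set()) & known_disease_genes
        if partners:
            hits[gene] = partners
    return hits


def sensitivity_specificity(
    predicted: Set[str], true_genes: Set[str], all_candidates: Set[str]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) over the candidate set."""
    tp = len(predicted & true_genes)                    # disease genes kept
    fn = len(true_genes - predicted)                    # disease genes missed
    fp = len(predicted - true_genes)                    # non-disease genes kept
    tn = len(all_candidates - predicted - true_genes)   # non-disease genes discarded
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy data, entirely hypothetical.
    ppi = [("G1", "D1"), ("G2", "G3"), ("G4", "D2"), ("G5", "G1")]
    known = {"D1", "D2"}
    locus = {"G1", "G2", "G3", "G4", "G5"}

    hits = cps_candidates(ppi, known, locus)
    sens, spec = sensitivity_specificity(set(hits), {"G1"}, locus)
    print(sorted(hits), sens, spec, len(locus) / len(hits))
```

In this toy locus, two of the five genes survive the filter, so the candidate list shrinks 2.5-fold. The 13-fold reduction reported in the abstract is the same kind of ratio, measured on the authors' benchmark data.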