ENABLING INTEGRATIVE GENOMIC ANALYSIS OF HIGH-IMPACT HUMAN DISEASES THROUGH TEXT MINING
Open Access
- 1 December 2007
- proceedings article
- Published by World Scientific Pub Co Pte Ltd in Pacific Symposium on Biocomputing
Abstract
Our limited ability to perform large-scale translational discovery and analysis of disease characterizations from public genomic data repositories remains a major bottleneck in efforts to translate genomics experiments to medicine. Through comprehensive, integrative genomic analysis of all available human disease characterizations we gain crucial insight into the molecular phenomena underlying pathogenesis as well as intra- and inter-disease differentiation. Such knowledge is crucial in the development of improved clinical diagnostics and the identification of molecular targets for novel therapeutics. In this study we build on our previous work to realize the next important step in large-scale translational discovery and analysis, which is to automatically identify those genomic experiments in which a disease state is compared to a normal control state. We present an automated text mining method that employs Natural Language Processing (NLP) techniques to automatically identify disease-related experiments in the NCBI Gene Expression Omnibus (GEO) that include measurements for both disease and normal control states. In this manner, we find that 62% of disease-related experiments contain sample subsets that can be automatically identified as normal controls. Furthermore, we calculate that the identified experiments characterize diseases that contribute to 30% of all human disease-related mortality in the United States. This work demonstrates that we now have the necessary tools and methods to initiate large-scale translational bioinformatics inquiry across the broad spectrum of high-impact human disease.
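The abstract does not give the details of the text mining method, so the following is only an illustrative sketch of the general idea of flagging normal-control sample subsets from their free-text labels. The cue words, the one-word negation check, and the function names `is_control_label` and `has_disease_and_control` are all assumptions made for this example; they are not the authors' lexicon or algorithm.

```python
import re
from typing import Sequence

# Hypothetical cue words for a normal-control subset. The lexicon used in the
# paper is not stated in the abstract; these are assumptions for illustration.
CONTROL_PATTERN = re.compile(r"\b(normal|controls?|healthy|unaffected)\b", re.IGNORECASE)

# Deliberately simple negation cue: "not", "no" or "without" immediately
# before the matched cue word (e.g. "not normal").
NEGATION_PATTERN = re.compile(r"\b(not|no|without)\s+$", re.IGNORECASE)


def is_control_label(label: str) -> bool:
    """Return True if a subset label contains a non-negated control cue word."""
    for match in CONTROL_PATTERN.finditer(label):
        prefix = label[: match.start()]
        if not NEGATION_PATTERN.search(prefix):
            return True
    return False


def has_disease_and_control(subset_labels: Sequence[str]) -> bool:
    """Return True if at least one subset looks like a control and at least one does not."""
    flags = [is_control_label(label) for label in subset_labels]
    return any(flags) and not all(flags)
```

Under these assumptions, an experiment whose subsets are labeled `"Duchenne muscular dystrophy"` and `"normal"` would be kept, while one whose subsets are all disease states or all controls would be skipped. A real pipeline would need a richer vocabulary and a proper negation detector.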