Querying multiple bioinformatics information sources
- 1 December 2002
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGMOD Record
- Vol. 31 (4), 59-64
- https://doi.org/10.1145/637411.637421
Abstract
Advances in Semantic Web and Ontologies have pushed the role of semantics to a new frontier: Semantic Composition of Web Services. A good example of such compositions is the querying of multiple bioinformatics data sources. Supporting effective querying over a large collection of bioinformatics data sources presents a number of unique challenges. First, queries over bioinformatics data sources are often complex associative queries over multiple Web documents. Most associations are defined by string matching of textual fragments in two documents. Second, most of the queries required by Genomics researchers involve complex data extraction and sophisticated workflows that implement the complex associative access. Third, and not least, complex Genomics-specific queries are often reused many times by Genomics researchers, either directly or through some refinements, and are considered a part of the research results. In this short article we present a list of challenging issues in supporting effective querying over bioinformatics data sources and illustrate them through a selection of representative search scenarios provided by biologists. We end the article with a discussion of how state-of-the-art research and technological development in Semantic Web, Ontology, Internet Data Management, and Internet Computing Systems can help address these issues.
This publication has 7 references indexed in Scilit:
- IntelliGEN: A Distributed Workflow System for Discovering Protein-Protein Interactions. Distributed and Parallel Databases, 2003
- GenBank. Nucleic Acids Research, 2002
- Transparent access to multiple bioinformatics information sources. IBM Systems Journal, 2001
- Deciphering the mammalian stress response – a stressful task. Oncogene, 1999
- Scriptable Access to the Caenorhabditis elegans Genome Sequence and Other ACEDB Databases. Genome Research, 1998
- Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997
- MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Research, 1995