TECHNIQUES FOR OPTIMIZATION OF QUERIES ON INTEGRATED BIOLOGICAL RESOURCES
- 1 June 2004
- journal article
- review article
- Published by World Scientific Pub Co Pte Ltd in Journal of Bioinformatics and Computational Biology
- Vol. 2 (2) , 375-411
- https://doi.org/10.1142/s0219720004000648
Abstract
Today, scientific data are inevitably digitized, stored in a wide variety of formats, and accessible over the Internet. Scientific discovery increasingly involves accessing multiple heterogeneous data sources, integrating the results of complex queries, and applying further analysis and visualization applications in order to collect datasets of interest. Building a scientific integration platform to support these critical tasks requires accessing and manipulating data extracted from flat files or databases, documents retrieved from the Web, and data that are locally materialized in warehouses or generated by software. The inefficiency of existing approaches can significantly hamper this process, causing lengthy delays while accessing critical resources or the failure of the system to report any results. Some queries take so long to answer that their results are returned via email, making their integration with other results a tedious task. This paper presents several issues that need to be addressed to provide seamless and efficient integration of biomolecular data. Identified challenges include: capturing and representing the various domain-specific computational capabilities supported by a source, including sequence or text search engines and traditional query processing; developing a methodology to acquire and represent semantic knowledge and metadata about source contents, overlap in source contents, and access costs; and developing cost- and semantics-based decision support tools to select sources and capabilities and to generate efficient query evaluation plans.