Challenges in Integrating Biological Data Sources
- 1 January 1995
- journal article
- research article
- Published by Mary Ann Liebert Inc in Journal of Computational Biology
- Vol. 2 (4), 557-572
- https://doi.org/10.1089/cmb.1995.2.557
Abstract
Scientific data of importance to biologists reside in a number of different data sources, such as GenBank, GSDB, SWISS-PROT, EMBL, and OMIM, among many others. Some of these data sources are conventional databases implemented using database management systems (DBMSs) and others are structured files maintained in a number of different formats (e.g., ASN.1 and ACE). In addition, software packages such as sequence analysis packages (e.g., BLAST and FASTA) produce data and can therefore be viewed as data sources. To counter the increasing dispersion and heterogeneity of data, different approaches to integrating these data sources are appearing throughout the bioinformatics community. This paper surveys the technical challenges to integration, classifies the approaches, and critiques the available tools and methodologies.
Key words: molecular biology databases, database integration, database federation, database transformation.