Approximate matchings in scientific databases
- 1 January 1994
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 3, 448-457
- https://doi.org/10.1109/hicss.1994.323327
Abstract
Organizations often need access to scientific data stored in independently managed databases. In this paper, we analyze the data heterogeneity problem, which occurs when data conveying the same or similar information is represented differently in different databases. We introduce the matching join to process queries in scientific databases and discuss the three steps to evaluate it. First, we transform the query using the functional dependencies in the database to incorporate additional knowledge. Second, we use rules and weights to compare the attributes. Matching joins can also be used to obtain approximate answers. In the third step, we propose a numeric measure, called the comparison value, c, to estimate the quality of matching and suggest deterministic and probabilistic ways of deriving it. Finally, we analyze the problem of estimating the cutoff value for c that would minimize the cost of errors during the join computation.
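The abstract gives no formulas, so the following is only an illustrative sketch of how a weighted, rule-based comparison value and a cutoff might interact in a join. It is not the authors' algorithm. The names `Rule`, `comparison_value`, `matching_join` and `pick_cutoff`, the weighted-average form of c, the similarity functions, the sample records, and the error costs are all assumptions made for this example. The query transformation step based on functional dependencies is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


@dataclass
class Rule:
    """Hypothetical comparison rule: one attribute, a weight, a similarity in [0, 1]."""
    attribute: str
    weight: float
    similarity: Callable[[object, object], float]


def exact(a: object, b: object) -> float:
    """Similarity 1.0 for identical values, 0.0 otherwise."""
    return 1.0 if a == b else 0.0


def numeric_within(tolerance: float) -> Callable[[object, object], float]:
    """Similarity that falls linearly from 1.0 to 0.0 as the difference reaches tolerance."""
    def sim(a: object, b: object) -> float:
        diff = abs(float(a) - float(b))
        return max(0.0, 1.0 - diff / tolerance)
    return sim


def comparison_value(r: Record, s: Record, rules: List[Rule]) -> float:
    """Assumed deterministic form of c: weighted average of per-attribute similarities."""
    total_weight = sum(rule.weight for rule in rules)
    score = sum(
        rule.weight * rule.similarity(r[rule.attribute], s[rule.attribute])
        for rule in rules
    )
    return score / total_weight


def matching_join(left: List[Record], right: List[Record],
                  rules: List[Rule], cutoff: float) -> List[Tuple[Record, Record, float]]:
    """Nested-loop join that keeps every pair whose comparison value reaches the cutoff."""
    result = []
    for r in left:
        for s in right:
            c = comparison_value(r, s, rules)
            if c >= cutoff:
                result.append((r, s, c))
    return result


def pick_cutoff(scored: List[Tuple[float, bool]],
                cost_false_match: float,
                cost_false_miss: float) -> Tuple[float, float]:
    """Try each observed c as the cutoff on labeled pairs; return (cutoff, cost) with least cost."""
    best_cutoff, best_cost = None, float("inf")
    for candidate in sorted({c for c, _ in scored}):
        cost = 0.0
        for c, is_true_match in scored:
            accepted = c >= candidate
            if accepted and not is_true_match:
                cost += cost_false_match   # joined two records that do not correspond
            elif not accepted and is_true_match:
                cost += cost_false_miss    # dropped a pair that does correspond
        if cost < best_cost:
            best_cutoff, best_cost = candidate, cost
    return best_cutoff, best_cost


# Invented sample data: the same minerals recorded slightly differently in two databases.
left = [{"name": "quartz", "density": 2.65}, {"name": "calcite", "density": 2.71}]
right = [{"name": "quartz", "density": 2.60}, {"name": "halite", "density": 2.17}]
rules = [Rule("name", 2.0, exact), Rule("density", 1.0, numeric_within(0.5))]

strict = matching_join(left, right, rules, cutoff=0.8)
loose = matching_join(left, right, rules, cutoff=0.2)

# Invented labeled pairs of (comparison value, whether the pair truly matches).
labeled = [(0.95, True), (0.7, True), (0.6, False), (0.2, False)]
cutoff, cost = pick_cutoff(labeled, cost_false_match=1.0, cost_false_miss=1.0)
```

In this sketch, lowering the cutoff admits more approximate answers at the risk of more false matches. `pick_cutoff` makes that trade-off explicit for a small labeled sample. A probabilistic derivation of c, which the abstract also mentions, would replace `comparison_value` and is not shown here.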