DataLink Record Linkage Software Applied to the Cancer Registry of Murcia, Spain
- 1 January 2008
- journal article
- Published by Georg Thieme Verlag KG in Methods of Information in Medicine
- Vol. 47 (05), 448-453
- https://doi.org/10.3414/me0529
Abstract
Objectives: Record linkage between data sets is relatively simple when unique, universal, permanent, and common variables exist in each data set. This situation occurs infrequently; thus, there is a need to apply probabilistic methods to identify corresponding records. DataLink has been tested to determine if the use of clustering techniques will improve performance with a minimum decrease in accuracy.
Methods: The study uses cancer registry data which includes hospital discharge and pathology reports from two hospitals in the Murcia Region for the years 2002-2003. These data are standardized prior to running DataLink. The original version of DataLink compares all of the records one by one, and in two later versions of the software clustering is applied which filters for one or more variables. Computing time and the proportion of detected matches have been investigated with each version.
Results: The clustering versions achieve 96.1% and 96.2% accuracy, respectively. An improvement in the computational time of 97.3% and 98.6% is achieved for the two clustering versions compared with the original. The clustering versions lose 0.36% and 1.07% of real duplicates, respectively.
Conclusions: DataLink implements deterministic and probabilistic record linkage to eliminate duplicates and to merge new information with existing cases. The standardization of variables to a common format has been adapted to the characteristics of Spanish language data. Clustering techniques minimize computational time and maximize accuracy in the detection of corresponding records.
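The contrast the abstract draws between comparing all records one by one and first clustering on one or more variables can be illustrated with a small sketch. This is not DataLink's implementation, which the abstract does not describe in detail; the field names (`surname`, `birth_year`), the helper functions, and the sample records are all hypothetical, and the accent-stripping step is only one plausible way to standardize Spanish-language names.

```python
import unicodedata
from collections import defaultdict
from itertools import combinations


def standardize(text):
    """Uppercase, strip accents, and collapse whitespace (assumed rules)."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.upper().split())


def all_pairs(records):
    """Candidate pairs when every record is compared with every other."""
    return list(combinations(range(len(records)), 2))


def blocked_pairs(records, key):
    """Candidate pairs when only records sharing a cluster key are compared."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[key(rec)].append(idx)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs


# Hypothetical records, invented for illustration only.
records = [
    {"surname": "García", "birth_year": 1950},
    {"surname": "GARCIA ", "birth_year": 1950},
    {"surname": "Muñoz", "birth_year": 1950},
    {"surname": "Pérez", "birth_year": 1962},
    {"surname": "Perez", "birth_year": 1962},
    {"surname": "López", "birth_year": 1971},
]

exhaustive = all_pairs(records)
by_year = blocked_pairs(records, key=lambda r: r["birth_year"])
by_year_and_initial = blocked_pairs(
    records,
    key=lambda r: (r["birth_year"], standardize(r["surname"])[:1]),
)

print(len(exhaustive), len(by_year), len(by_year_and_initial))
```

Filtering on more variables shrinks the candidate set further, which is where the computing-time gain comes from. The cost is that a true duplicate whose cluster key differs between its two records (for example, a mistyped birth year) is never compared, which is consistent with the small loss of real duplicates reported for the clustering versions.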