Which are the best identifiers for record linkage?
- 1 January 2004
- journal article
- research article
- Published by Taylor & Francis in Medical Informatics and the Internet in Medicine
- Vol. 29 (3-4), 221-227
- https://doi.org/10.1080/14639230400005974
Abstract
Because linkage based on poorly informative identifiers can lead to linkage errors, it is essential to quantify the information associated with each identifier. The aim of this study was to estimate the discriminating power of different identifiers that could be used in a record linkage process. This work showed the value of three identifiers when linking data concerning the same patient using an automatic procedure based on the method proposed by Jaro: the date of birth, the first name and the last name seemed to be the most appropriate identifiers. Including a poorly discriminating identifier such as gender did not improve the results. Moreover, adding a second Christian name, which is often missing, increased linkage errors. In contrast, a phonetic treatment adapted to the French language seemed to improve linkage results compared with Soundex. However, whatever the method used, it seems necessary to improve the quality of identifier collection, as it could greatly influence linkage results.
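To make the notion of discriminating power concrete, the sketch below computes agreement weights of the form log2(m/u), as used in the probabilistic (Fellegi-Sunter style) framework on which Jaro's method is based. Here m is the probability that an identifier agrees for two records of the same patient and u the probability that it agrees by chance for two different patients. All m and u values, and the names `agreement_weight` and `identifiers`, are illustrative assumptions and are not taken from the article.

```python
import math


def agreement_weight(m: float, u: float) -> float:
    """Agreement weight in bits: log2 of the ratio m / u.

    m: probability the identifier agrees among true matches.
    u: probability the identifier agrees among non-matches (chance agreement).
    """
    return math.log2(m / u)


# Hypothetical (m, u) pairs, chosen only for illustration.
identifiers = {
    "date_of_birth": (0.95, 1 / 20000),
    "last_name": (0.90, 1 / 5000),
    "first_name": (0.90, 1 / 500),
    "gender": (0.98, 0.5),
}

weights = {name: agreement_weight(m, u) for name, (m, u) in identifiers.items()}

# Print identifiers from most to least discriminating.
for name, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:14s} agreement weight = {w:5.2f} bits")
```

Under these assumed values, agreement on gender contributes less than one bit, because two unrelated records agree on it about half the time, which is consistent with the finding that including gender did not improve the results.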
This publication has 7 references indexed in Scilit:
- Probabilistic Record Linkage: Relationships between File Sizes, Identifiers, and Match Weights. Methods of Information in Medicine, 2001
- Automatic Record Hash Coding and Linkage for Epidemiological Follow-up Data Confidentiality. Methods of Information in Medicine, 1998
- Effects of record linkage errors on registry-based follow-up studies. Statistics in Medicine, 1997
- Improving American Indian cancer data in the Washington State Cancer Registry using linkages with the Indian Health Service and tribal records. Cancer, 1996
- Determinants of Homonym and Synonym Rates of Record Linkage in Disease Registration. Methods of Information in Medicine, 1996
- Probabilistic linkage of large public health data files. Statistics in Medicine, 1995