Automatic Record Hash Coding and Linkage for Epidemiological Follow-up Data Confidentiality
- 1 July 1998
- journal article
- research article
- Published by Georg Thieme Verlag KG in Methods of Information in Medicine
- Vol. 37 (03), 271-277
- https://doi.org/10.1055/s-0038-1634527
Abstract
A protocol is proposed to allow linkage of anonymous medical information within the framework of epidemiological follow-up studies. The protocol has two steps. The first is the irreversible transformation of identification data by a one-way hash function, applied after spelling processing. To prevent dictionary attacks, two large random files of keys, called pads, are introduced. The second step is the linkage of the anonymized files. The weight given to each linkage field is estimated by a mixture model whose likelihood is maximized with the Expectation-Maximization (EM) algorithm. The performance of this method was assessed by comparing record linkage based exclusively on the automatic procedure with a manual linkage obtained by the Burgundy Registry of Digestive Cancers. Linking a file of 2,847 cancers with a file of 388,614 hospitalization stays at the Dijon university hospital yielded a sensitivity of 97% and a specificity of 93%.
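The first step of the protocol (spelling processing followed by a keyed one-way hash) might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which hash function is used or how the two pads enter the computation, so the sketch substitutes HMAC-SHA-256 applied twice, once per pad, and represents each pad by a single random key rather than a large file of keys. The function names `normalize` and `anonymize` and the choice of fields are hypothetical.

```python
import hashlib
import hmac
import secrets
import unicodedata


def normalize(value: str) -> str:
    """Crude stand-in for spelling processing (assumption):
    strip accents, drop everything except ASCII letters/digits, uppercase."""
    decomposed = unicodedata.normalize("NFKD", value)
    kept = (c for c in decomposed if c.isascii() and c.isalnum())
    return "".join(kept).upper()


def anonymize(fields: list[str], pad1: bytes, pad2: bytes) -> str:
    """Irreversibly transform identification fields into a hex code.

    Assumption: each pad is used as an HMAC key, one after the other.
    Without both pads an attacker cannot hash a dictionary of names
    and compare the results against the stored codes.
    """
    message = "|".join(normalize(f) for f in fields).encode("ascii")
    inner = hmac.new(pad1, message, hashlib.sha256).digest()
    return hmac.new(pad2, inner, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Hypothetical pads; in practice they would be generated once and kept secret.
    pad1 = secrets.token_bytes(32)
    pad2 = secrets.token_bytes(32)
    code_a = anonymize(["Lefèvre", "Jean-Pierre", "1950-03-12"], pad1, pad2)
    code_b = anonymize(["LEFEVRE", "jean pierre", "19500312"], pad1, pad2)
    print(code_a == code_b)  # spelling variants map to the same code
```

Because the codes are deterministic for a given pair of pads, two files anonymized with the same pads can still be compared field by field in the second step, while the original identities cannot be recovered from the codes alone.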