Iterative Automated Record Linkage Using Mixture Models
- 1 March 2001
- journal article
- Published by Taylor & Francis in Journal of the American Statistical Association
- Vol. 96 (453), 32-41
- https://doi.org/10.1198/016214501750332956
Abstract
The goal of record linkage is to link quickly and accurately records that correspond to the same person or entity. Whereas certain patterns of agreements and disagreements on variables are more likely among records pertaining to a single person than among records for different people, the observed patterns for pairs of records can be viewed as arising from a mixture of matches and nonmatches. Mixture model estimates can be used to partition record pairs into two or more groups that can be labeled as probable matches (links) and probable nonmatches (nonlinks). A method is proposed and illustrated that uses marginal information in the database to select mixture models, identifies sets of records for clerks to review based on the models and marginal information, incorporates clerically reviewed data, as they become available, into estimates of model parameters, and classifies pairs as links, nonlinks, or in need of further clerical review. The procedure is illustrated with five datasets from the U.S. Bureau ...
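To make the mixture idea in the abstract concrete, the following is a minimal sketch, not the authors' procedure. It assumes the simplest case: two classes (matches and nonmatches), binary agreement indicators per field, and conditional independence of fields within each class, fitted by EM. The abstract does not say which mixture models are selected, and the sketch omits the marginal information and the clerically reviewed data that the proposed method uses. All function names, the starting values, the 0.1 and 0.9 cutoffs, and the simulated data are hypothetical choices made for illustration.

```python
import random


def pattern_likelihood(pattern, agree_probs):
    """P(pattern | class), assuming fields agree independently within a class."""
    likelihood = 1.0
    for agrees, prob in zip(pattern, agree_probs):
        likelihood *= prob if agrees else 1.0 - prob
    return likelihood


def posterior_match(pattern, p, m, u):
    """Posterior probability that a pair with this agreement pattern is a match."""
    match_part = p * pattern_likelihood(pattern, m)
    nonmatch_part = (1.0 - p) * pattern_likelihood(pattern, u)
    return match_part / (match_part + nonmatch_part)


def fit_two_class_mixture(patterns, n_iter=100, p=0.1, m_init=0.9, u_init=0.1):
    """Fit the two-class mixture by EM; returns (p, m, u).

    p is the match proportion; m[j] and u[j] are the agreement probabilities
    on field j among matches and nonmatches. Starting values are assumptions.
    """
    eps = 1e-6  # keeps estimates away from exactly 0 and 1
    n, k = len(patterns), len(patterns[0])
    m, u = [m_init] * k, [u_init] * k
    for _ in range(n_iter):
        # E-step: posterior match probability for every record pair.
        w = [posterior_match(g, p, m, u) for g in patterns]
        # M-step: weighted proportions, clamped for numerical safety.
        total = min(max(sum(w), eps), n - eps)
        p = total / n
        for j in range(k):
            agree_m = sum(wi * g[j] for wi, g in zip(w, patterns))
            agree_u = sum((1.0 - wi) * g[j] for wi, g in zip(w, patterns))
            m[j] = min(max(agree_m / total, eps), 1.0 - eps)
            u[j] = min(max(agree_u / (n - total), eps), 1.0 - eps)
    return p, m, u


def classify(pattern, p, m, u, lower=0.1, upper=0.9):
    """Three-way decision; the cutoffs are hypothetical, not from the paper."""
    prob = posterior_match(pattern, p, m, u)
    if prob >= upper:
        return "link"
    if prob <= lower:
        return "nonlink"
    return "clerical review"


# Synthetic demonstration data (simulated here, not taken from the paper).
rng = random.Random(0)
pairs = []
for _ in range(2000):
    is_match = rng.random() < 0.1
    agree_prob = 0.9 if is_match else 0.1
    pairs.append(tuple(int(rng.random() < agree_prob) for _ in range(4)))

p_hat, m_hat, u_hat = fit_two_class_mixture(pairs)
print(classify((1, 1, 1, 1), p_hat, m_hat, u_hat))
print(classify((0, 0, 0, 0), p_hat, m_hat, u_hat))
```

In this sketch, pairs whose posterior falls between the two cutoffs are the ones that would go to clerks. Under the approach the abstract describes, their reviewed labels would then be fed back into the parameter estimates, a step the sketch leaves out.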