A Solution to the Problem of Linking Multivariate Documents
- 1 March 1969
- journal article
- research article
- Published by Taylor & Francis in Journal of the American Statistical Association
- Vol. 64 (325) , 163-174
- https://doi.org/10.1080/01621459.1969.10500961
Abstract
In many scientific investigations, it is desired to bring together, or link, two or more documents which represent the same individual, even though these documents do not contain a unique identifier and were derived from different sources. In medical and public health research and elsewhere, this problem is known as the document linkage problem. This paper considers some aspects of classifying pairs of documents into one of two populations when their items are identifying information, where each item of information can take on three distinct values: correct, incorrect, or missing. Section 1 identifies three document linkage problems. Sections 2 and 3 deal with the mathematical formulation of the multivariate document linkage problem. Section 4 gives the classification procedure and Section 5 deals with the application of the theory to a problem in the field of public health.
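The setting in the abstract (pairs of documents, each identifying item compared as correct, incorrect, or missing, and each pair assigned to one of two populations) can be illustrated with a minimal sketch. This is not the paper's procedure from Section 4, which is not reproduced here. It assumes the items are independent, uses a simple log-likelihood-ratio rule, and every item name, probability, and threshold below is hypothetical.

```python
import math

# Hypothetical probabilities of each comparison outcome per item, for pairs
# from the "same individual" population (M) and the "different individuals"
# population (U). All numbers are invented for illustration only.
ITEM_PROBS = {
    "surname": {
        "M": {"correct": 0.90, "incorrect": 0.05, "missing": 0.05},
        "U": {"correct": 0.02, "incorrect": 0.93, "missing": 0.05},
    },
    "birth_year": {
        "M": {"correct": 0.85, "incorrect": 0.05, "missing": 0.10},
        "U": {"correct": 0.05, "incorrect": 0.85, "missing": 0.10},
    },
}


def log_likelihood_ratio(outcomes):
    """Sum the per-item log ratios P(outcome | M) / P(outcome | U).

    Summing is only valid under the assumed independence of items.
    """
    total = 0.0
    for item, outcome in outcomes.items():
        probs = ITEM_PROBS[item]
        total += math.log(probs["M"][outcome] / probs["U"][outcome])
    return total


def classify(outcomes, threshold=0.0):
    """Assign a pair to 'linked' if its log ratio exceeds the threshold.

    The threshold of 0.0 is an arbitrary illustrative choice.
    """
    if log_likelihood_ratio(outcomes) > threshold:
        return "linked"
    return "not linked"


pair = {"surname": "correct", "birth_year": "missing"}
print(classify(pair))
```

In this sketch a missing item is equally likely in both populations, so it contributes nothing to the score and the decision rests on the remaining items.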
This publication has 13 references indexed in Scilit:
- Outcome Probabilities for a Record Matching Process with Complete Invariant Information. Journal of the American Statistical Association, 1967
- A document linkage program for digital computers. Behavioral Science, 1965
- Person-matching by electronic methods. Communications of the ACM, 1962
- Some Classification Problems with Multivariate Qualitative Data. Biometrics, 1961
- Techniques for discriminant analysis with discrete variables. Metrika, 1959
- Automatic Linkage of Vital Records. Science, 1959
- On the Problem of Matching Lists by Samples. Journal of the American Statistical Association, 1959
- A General Theory of Discrimination When the Information About Alternative Population Distributions is Based on Samples. The Annals of Mathematical Statistics, 1954
- On the Analysis of Samples from $k$ Lists. The Annals of Mathematical Statistics, 1952
- A Solution to the Problem of Optimum Classification. The Annals of Mathematical Statistics, 1949