A Solution to the Problem of Linking Multivariate Documents

Abstract
In many scientific investigations, it is desirable to bring together, or link, two or more documents that represent the same individual, even though the documents contain no unique identifier and were derived from different sources. In medical and public health research and elsewhere, this is known as the document linkage problem. This paper considers some aspects of classifying pairs of documents into one of two populations when the items they contain are identifying information and each item can take one of three distinct values: correct, incorrect, or missing. Section 1 identifies three document linkage problems. Sections 2 and 3 deal with the mathematical formulation of the multivariate document linkage problem. Section 4 gives the classification procedure, and Section 5 applies the theory to a problem in the field of public health.
