A comparison of record linkage yield for health research using different variable sets

Abstract
As part of a study on childbearing and survival, we linked records of young women with invasive breast cancer, identified through three population-based cancer registries, to state birth certificate records. In Michigan prior to 1989, only maternal social security number (SSN) was available for matching; other data became available in 1989, including name, birth date, address, and infant’s surname. To examine the quality of linkage using SSN as the sole matching criterion, we conducted two procedures using data for 1989–1994, comparing linkages identified by SSN with linkages identified using other available variables. Linkage was conducted using a deterministic approach based on seven variables and 14 steps. In each step a string of relevant variables was created, and in successive phases selected variables were substituted or removed so that the matching requirements became progressively less stringent. A manual review was done to check accuracy. Using all available variables, the linkage process yielded 793 matches (live births) among 4496 patients, 780 (98%) of which would have been identified using SSN alone. Five of seven matches identified by SSN were not confirmed by manual review. SSN appears to be fairly accurate for linkage and can be valuable for linking cancer registries to other data sources.
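The stepwise deterministic approach described above can be illustrated with a minimal Python sketch. It is not the study's actual procedure: the abstract does not list the seven variables or the 14 steps, so the field names (`ssn`, `last_name`, `first_name`, `birth_date`), the four steps in `STEPS`, and the helpers `make_key` and `link` are all hypothetical. The sketch only shows the general idea of building a key string from selected variables and relaxing the key in successive passes.

```python
from typing import Optional

# Hypothetical match steps, ordered from most to least stringent.
# The study used seven variables and 14 steps; these four are illustrative only.
STEPS = [
    ("ssn", "last_name", "first_name", "birth_date"),
    ("ssn", "birth_date"),
    ("ssn",),
    ("last_name", "first_name", "birth_date"),
]


def make_key(record: dict, fields: tuple) -> Optional[str]:
    """Concatenate the selected fields into one key string.

    Returns None if any field is missing or empty, so incomplete
    records cannot match at that step.
    """
    values = [str(record.get(f) or "").strip().upper() for f in fields]
    if not all(values):
        return None
    return "|".join(values)


def link(patients: list, births: list) -> list:
    """Return (patient_id, birth_id, step_number) for each linked pair.

    Each pair is recorded at the first (most stringent) step that finds it.
    """
    matches = []
    linked = set()
    for step_no, fields in enumerate(STEPS, start=1):
        # Index birth records by the key used at this step.
        index = {}
        for b in births:
            key = make_key(b, fields)
            if key is not None:
                index.setdefault(key, []).append(b["birth_id"])
        # Look up each patient; one patient may link to several births.
        for p in patients:
            key = make_key(p, fields)
            if key is None:
                continue
            for birth_id in index.get(key, []):
                pair = (p["patient_id"], birth_id)
                if pair not in linked:
                    linked.add(pair)
                    matches.append((p["patient_id"], birth_id, step_no))
    return matches


# Made-up records, for illustration only.
patients = [
    {"patient_id": 1, "ssn": "111223333", "last_name": "Doe",
     "first_name": "Ann", "birth_date": "1960-01-02"},
    {"patient_id": 2, "ssn": None, "last_name": "Roe",
     "first_name": "Bea", "birth_date": "1962-03-04"},
]
births = [
    {"birth_id": "A", "ssn": "111223333", "last_name": "Doe",
     "first_name": "Ann", "birth_date": "1960-01-02"},
    {"birth_id": "B", "ssn": "111223333", "last_name": "Smith",
     "first_name": "Ann", "birth_date": "1960-01-02"},
    {"birth_id": "C", "ssn": "", "last_name": "roe",
     "first_name": "bea", "birth_date": "1962-03-04"},
]
matches = link(patients, births)
```

Recording the step at which each pair was found makes it straightforward to count how many links would have been identified by SSN alone versus only by the other variables, and to route links from the less stringent steps to manual review.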