Revealing the spatial distribution of a disease while preserving privacy
- 18 November 2008
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 105 (46) , 17608-17613
- https://doi.org/10.1073/pnas.0801021105
Abstract
Datasets describing the health status of individuals are important for medical research but must be used cautiously to protect patient privacy. For patient data containing geographical identifiers, the conventional solution is to aggregate the data by large areas. This method often preserves privacy but suffers from substantial information loss, which degrades the quality of subsequent disease mapping or cluster detection studies. Other heuristic methods for de-identifying spatial patient information do not quantify the risk to individual privacy. We develop an optimal method based on linear programming to add noise to individual locations that preserves the distribution of a disease. The method ensures a small, quantitative risk of individual re-identification. Because the amount of noise added is minimal for the desired degree of privacy protection, the de-identified set is ideal for spatial epidemiological studies. We apply the method to patients in New York County, New York, showing that privacy is guaranteed while moving patients 25 to 150 times less than aggregation by zip code.
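The abstract does not state the linear program itself, so the following is only a toy sketch of the general idea, not the authors' formulation. It assumes a handful of cells on a line, a prior share of patients per cell (`q`), and a hypothetical privacy rule: the posterior probability that a released point in cell j originated in cell i may not exceed `theta`. Under those assumptions, the move probabilities that minimize expected displacement can be found with `scipy.optimize.linprog`. The function name, the 1-D geometry, and the form of the risk constraint are all illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog


def min_displacement_plan(q, coords, theta):
    """Toy LP (illustrative, not the paper's model).

    q      -- assumed prior share of patients in each cell (sums to 1)
    coords -- assumed 1-D coordinates of the cells
    theta  -- assumed cap on the posterior that a released point in
              cell j originated in cell i
    Returns (P, cost): P[i, j] is the probability of moving a patient
    from cell i to cell j; cost is the expected displacement.
    """
    q = np.asarray(q, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = len(q)
    dist = np.abs(coords[:, None] - coords[None, :])

    # Objective: sum_i sum_j q_i * d_ij * P_ij, with P flattened row-major.
    c = (q[:, None] * dist).ravel()

    # Each row of P is a probability distribution.
    A_eq = np.zeros((n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0
    b_eq = np.ones(n)

    # Privacy rule, written linearly:
    #   q_i * P_ij - theta * sum_k q_k * P_kj <= 0   for every i, j.
    A_ub = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            row = i * n + j
            for k in range(n):
                A_ub[row, k * n + j] -= theta * q[k]
            A_ub[row, i * n + j] += q[i]
    b_ub = np.zeros(n * n)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, 1))
    if not res.success:
        raise ValueError(res.message)
    return res.x.reshape(n, n), res.fun


if __name__ == "__main__":
    P, cost = min_displacement_plan([0.4, 0.3, 0.2, 0.1],
                                    [0.0, 1.0, 2.0, 3.0], theta=0.5)
    print(np.round(P, 3))
    print("expected displacement:", round(cost, 4))
```

With `theta = 1.0` the constraint is vacuous and no one is moved; tightening `theta` forces more mixing between cells and a larger expected displacement, which is the trade-off the abstract describes in general terms.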