Anonymization of administrative billing codes with repeated diagnoses through censoring.
- 13 November 2010
- journal article
- Vol. 2010, 782-6
Abstract
Patient-specific data from electronic medical records (EMRs) are increasingly shared in de-identified form to support research. However, EMRs are susceptible to noise, error, and variation, which can limit their utility for reuse. One way to enhance their utility is to include, in the shared data, the number of times each diagnosis code was assigned to a patient. Doing so is challenging, however, because such counts can be exploited to re-identify patients. In this paper, we present an approach that, to the best of our knowledge, is the first to prevent re-identification based on repeated diagnosis codes. Our method transforms records to preserve privacy while retaining much of their utility. Experiments on 2,676 patients from the EMR system of Vanderbilt University Medical Center show that our method retains, on average, 95.4% of the diagnosis codes in a common data-sharing scenario.
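To make the privacy problem in the abstract concrete, the following is a minimal sketch on invented toy data. It shows how releasing per-patient code counts can shrink the smallest group of indistinguishable patients (the "k" of k-anonymity) compared with releasing only the set of distinct codes. The abstract does not describe the paper's censoring algorithm. The `cap_counts` function, which uniformly keeps at most `max_count` occurrences of each code, is a hypothetical and deliberately naive stand-in. `RECORDS`, `signature`, `min_group_size`, and `retained_fraction` are likewise illustrative names, not taken from the paper.

```python
from collections import Counter

# Hypothetical toy data (invented for illustration): patient ID -> list of
# ICD-9 diagnosis codes, one entry per time the code was assigned.
RECORDS = {
    "p1": ["250.00", "250.00", "401.9"],
    "p2": ["250.00", "401.9"],
    "p3": ["250.00", "401.9", "401.9"],
    "p4": ["250.00", "250.00", "401.9"],
}


def signature(codes, with_counts):
    """What an attacker could observe for one patient.

    with_counts=False: only the set of distinct codes.
    with_counts=True: each distinct code paired with how often it was assigned.
    """
    counts = Counter(codes)
    if with_counts:
        return frozenset(counts.items())
    return frozenset(counts)


def min_group_size(records, with_counts):
    """Smallest number of patients sharing an identical signature (k in k-anonymity)."""
    groups = Counter(signature(codes, with_counts) for codes in records.values())
    return min(groups.values())


def cap_counts(records, max_count):
    """Naive censoring stand-in (NOT the paper's method): keep at most
    max_count occurrences of each code per patient, dropping the rest."""
    capped = {}
    for pid, codes in records.items():
        seen = Counter()
        kept = []
        for code in codes:
            if seen[code] < max_count:
                kept.append(code)
                seen[code] += 1
        capped[pid] = kept
    return capped


def retained_fraction(original, censored):
    """Share of all code occurrences that survive censoring."""
    total = sum(len(codes) for codes in original.values())
    kept = sum(len(codes) for codes in censored.values())
    return kept / total


if __name__ == "__main__":
    print(min_group_size(RECORDS, with_counts=False))  # 4: all patients share {250.00, 401.9}
    print(min_group_size(RECORDS, with_counts=True))   # 1: counts single out p2 and p3
    censored = cap_counts(RECORDS, max_count=1)
    print(min_group_size(censored, with_counts=True))  # 4
    print(round(retained_fraction(RECORDS, censored), 3))  # 0.727
```

In this toy example, capping every count at 1 restores k = 4 but keeps only 8 of 11 code occurrences. A more selective transformation, such as the method the abstract reports, would presumably aim to censor less data while still meeting the privacy requirement.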