Anonymization of electronic medical records for validating genome-wide association studies
- 12 April 2010
- journal article
- Published in Proceedings of the National Academy of Sciences
- Vol. 107 (17) , 7898-7903
- https://doi.org/10.1073/pnas.0911686107
Abstract
Genome-wide association studies (GWAS) facilitate the discovery of genotype-phenotype relations from population-based sequence databases, which is an integral facet of personalized medicine. The increasing adoption of electronic medical records allows large amounts of patients' standardized clinical features to be combined with the genomic sequences of these patients and shared to support validation of GWAS findings and to enable novel discoveries. However, disseminating these data "as is" may lead to patient reidentification when genomic sequences are linked to resources that contain the corresponding patients' identity information based on standardized clinical features. This work proposes an approach that provably prevents this type of data linkage and furnishes a result that helps support GWAS. Our approach automatically extracts potentially linkable clinical features and modifies them in a way that they can no longer be used to link a genomic sequence to a small number of patients, while preserving the associations between genomic sequences and specific sets of clinical features corresponding to GWAS-related diseases. Extensive experiments with real patient data derived from the Vanderbilt University Medical Center verify that our approach generates data that eliminate the threat of individual reidentification, while supporting GWAS validation and clinical case analysis tasks.
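The linkage threat described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it only checks which combinations of clinical feature codes are shared by fewer than k records, which is the condition under which a genomic sequence could be tied to a small number of patients. The function names, the threshold `k`, the `max_size` limit, and the sample codes are all assumptions made for illustration.

```python
from itertools import combinations


def support(records, codes):
    """Count the records that contain every code in `codes`."""
    wanted = set(codes)
    return sum(1 for record in records if wanted <= set(record))


def risky_combinations(records, k, max_size=2):
    """Return code combinations (up to `max_size` codes) that occur in at
    least one record but are shared by fewer than `k` records."""
    risky = set()
    for record in records:
        codes = sorted(set(record))
        for size in range(1, max_size + 1):
            for combo in combinations(codes, size):
                if support(records, combo) < k:
                    risky.add(combo)
    return sorted(risky)


# Hypothetical records: each is the set of clinical feature codes
# attached to one patient's genomic sequence (codes are made up).
records = [
    {"250.00", "401.9"},
    {"250.00", "401.9"},
    {"250.00", "272.4"},
    {"401.9"},
]

print(risky_combinations(records, k=2))
# → [('250.00', '272.4'), ('272.4',)]
```

In this toy data, only one record carries code "272.4", so any combination containing it would single out that patient. An anonymization method in the spirit of the abstract would have to modify such codes until no combination falls below the threshold, while keeping the code sets relevant to GWAS-related diseases intact.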