Anonymizing Classification Data for Privacy Preservation

26 March 2007

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Knowledge and Data Engineering

Vol. 19 (5), 711-725
https://doi.org/10.1109/tkde.2007.1015

Abstract

Classification is a fundamental problem in data analysis. Training a classifier requires accessing a large collection of data. Releasing person-specific data, such as customer data or patient records, may pose a threat to an individual's privacy. Even after removing explicit identifying information such as Name and SSN, it is still possible to link released records back to their identities by matching some combination of nonidentifying attributes such as {Sex, Zip, Birthdate}. A useful approach to combat such linking attacks, called k-anonymization, is anonymizing the linking attributes so that at least k released records match each value combination of the linking attributes. Previous work attempted to find an optimal k-anonymization that minimizes some data distortion metric. We argue that minimizing the distortion to the training data is not relevant to the classification goal, which requires extracting the structure of prediction on the "future" data. In this paper, we propose a k-anonymization solution for classification. Our goal is to find a k-anonymization, not necessarily optimal in the sense of minimizing data distortion, that preserves the classification structure. We conducted intensive experiments to evaluate the impact of anonymization on the classification of future data. Experiments on real-life data show that the quality of classification can be preserved even for highly restrictive anonymity requirements.
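
The k-anonymity condition stated in the abstract (at least k released records match each value combination of the linking attributes) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the helper names `is_k_anonymous` and `generalize_zip`, the toy records, and the choice to generalize Zip by masking trailing digits are all assumptions made for illustration only.

```python
from collections import Counter


def is_k_anonymous(records, linking_attributes, k):
    """Return True if every value combination of the linking attributes
    is shared by at least k records (hypothetical helper)."""
    counts = Counter(
        tuple(record[attr] for attr in linking_attributes) for record in records
    )
    return all(count >= k for count in counts.values())


def generalize_zip(record, digits_kept):
    """Return a copy of the record with trailing Zip digits masked by '*'.
    Masking digits is one assumed form of generalization, used for illustration."""
    zip_code = record["Zip"]
    masked = zip_code[:digits_kept] + "*" * (len(zip_code) - digits_kept)
    generalized = dict(record)
    generalized["Zip"] = masked
    return generalized


# Toy data (invented for this sketch); "Class" is the label a classifier would learn.
records = [
    {"Sex": "F", "Zip": "53711", "Class": "Y"},
    {"Sex": "F", "Zip": "53712", "Class": "N"},
    {"Sex": "M", "Zip": "53703", "Class": "Y"},
    {"Sex": "M", "Zip": "53706", "Class": "Y"},
]
linking_attributes = ["Sex", "Zip"]

# Each raw (Sex, Zip) combination is unique, so the table is not 2-anonymous.
raw_ok = is_k_anonymous(records, linking_attributes, 2)

# Keeping only the first three Zip digits merges records into groups of two.
generalized_records = [generalize_zip(r, 3) for r in records]
generalized_ok = is_k_anonymous(generalized_records, linking_attributes, 2)
```

In this sketch the class labels are left untouched and only the linking attributes are coarsened. The abstract's point is that the choice of generalization should be guided by how well the classification structure survives, not by a generic distortion metric; the sketch checks only the anonymity requirement.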

This publication has 19 references indexed in Scilit:

Handicapping attacker's confidence: an alternative to k-anonymization
Knowledge and Information Systems, 2006
(α, k)-anonymity
Published by Association for Computing Machinery (ACM), 2006
Anonymizing sequential releases
Published by Association for Computing Machinery (ACM), 2006
Workload-aware anonymization
Published by Association for Computing Machinery (ACM), 2006
Incognito
Published by Association for Computing Machinery (ACM), 2005
On the complexity of optimal K-anonymity
Published by Association for Computing Machinery (ACM), 2004
Achieving k-anonymity privacy protection using generalization and suppression
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2002
Transforming data to satisfy privacy constraints
Published by Association for Computing Machinery (ACM), 2002
Protecting respondents' identities in microdata release
IEEE Transactions on Knowledge and Data Engineering, 2001
A Mathematical Theory of Communication
Bell System Technical Journal, 1948