Achieving anonymity via clustering
- 26 June 2006
- proceedings article
- Published by Association for Computing Machinery (ACM)
- Vol. 6 (3), 153-162
- https://doi.org/10.1145/1142351.1142374
Abstract
Publishing data for analysis from a table containing personal records, while maintaining individual privacy, is a problem of increasing importance today. The traditional approach of de-identifying records is to remove identifying fields such as social security number, name, etc. However, recent research has shown that a large fraction of the US population can be identified using non-key attributes (called quasi-identifiers) such as date of birth, gender, and zip code (15). Sweeney (16) proposed the k-anonymity model for privacy, where non-key attributes that leak information are suppressed or generalized so that, for every record in the modified table, there are at least k−1 other records having exactly the same values for quasi-identifiers. We propose a new method for anonymizing data records, where quasi-identifiers of data records are first clustered and then cluster centers are published. To ensure privacy of the data records, we impose the constraint that each cluster must contain no fewer than a pre-specified number of data records. This technique is more general since we have a much larger choice for cluster centers than k-Anonymity. In many cases, it lets us release a lot more information without compromising privacy. We also provide constant-factor approximation algorithms to come up
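The clustering idea in the abstract (group records by their quasi-identifiers, require a minimum number of records per cluster, publish only cluster centers) can be illustrated with a minimal sketch. This is not the paper's approximation algorithm: it is a simple greedy heuristic with no approximation guarantee, and the function names `cluster_with_min_size` and `publish`, the Euclidean distance, the mean as cluster center, and the numeric encoding of quasi-identifiers are all assumptions made for illustration.

```python
from math import dist


def cluster_with_min_size(points, r):
    """Greedily partition points into clusters of at least r members.

    Illustrative heuristic only (hypothetical helper, not the paper's method).
    Returns a list of clusters, each a list of indices into `points`.
    """
    if r < 1 or len(points) < r:
        raise ValueError("need r >= 1 and at least r points")
    unassigned = list(range(len(points)))
    clusters = []
    while len(unassigned) >= r:
        seed = unassigned[0]
        # Sort remaining points by distance to the seed; the seed stays first
        # because its distance is 0 and Python's sort is stable.
        unassigned.sort(key=lambda i: dist(points[seed], points[i]))
        clusters.append(unassigned[:r])
        unassigned = unassigned[r:]
    # Fewer than r points remain: attach each to the cluster whose seed is nearest,
    # so that no cluster ever falls below the size threshold.
    for i in unassigned:
        nearest = min(clusters, key=lambda c: dist(points[c[0]], points[i]))
        nearest.append(i)
    return clusters


def publish(points, clusters):
    """Release only per-cluster summaries: center (mean), size, and radius."""
    released = []
    for members in clusters:
        dims = len(points[members[0]])
        center = tuple(
            sum(points[i][d] for i in members) / len(members) for d in range(dims)
        )
        radius = max(dist(center, points[i]) for i in members)
        released.append({"center": center, "size": len(members), "radius": radius})
    return released


if __name__ == "__main__":
    # Hypothetical quasi-identifiers already encoded as numeric pairs.
    records = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
    for summary in publish(records, cluster_with_min_size(records, r=3)):
        print(summary)
```

Each released summary stands in for at least `r` individual records, which is the privacy constraint the abstract describes; the quality of the release then depends on how tight the clusters are, which is what a real algorithm would optimize.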
This publication has 9 references indexed in Scilit:
- L-diversity: privacy beyond k-anonymity. Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
- Incognito. Published by Association for Computing Machinery (ACM), 2005
- Data Privacy through Optimal k-Anonymization. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- Toward Privacy in Public Databases. Published by Springer Nature, 2005
- On the complexity of optimal K-anonymity. Published by Association for Computing Machinery (ACM), 2004
- k-Anonymity: A Model for Protecting Privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2002
- The Capacitated K-Center Problem. SIAM Journal on Discrete Mathematics, 2000
- How to Allocate Network Centers. Journal of Algorithms, 1993
- A Best Possible Heuristic for the k-Center Problem. Mathematics of Operations Research, 1985