On anonymizing query logs via token-based hashing
- 8 May 2007
- proceedings article
- Published by Association for Computing Machinery (ACM)
- p. 629-638
- https://doi.org/10.1145/1242572.1242657
Abstract
In this paper we study the privacy preservation properties of a specific technique for query log anonymization: token-based hashing. In this approach, each query is tokenized, and then a secure hash function is applied to each token. We show that statistical techniques may be applied to partially compromise the anonymization. We then analyze the specific risks that arise from these partial compromises, focused on revelation of identity from unambiguous names, addresses, and so forth, and the revelation of facts associated with an identity that are deemed to be highly sensitive. Our goal in this work is twofold: to show that token-based hashing is unsuitable for anonymization, and to present a concrete analysis of specific techniques that may be effective in breaching privacy, against which other anonymization schemes should be measured.
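The scheme the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the choice of SHA-256, the salt, and the three sample queries are all assumptions made for illustration. The sketch shows why statistical techniques have something to work with: hashing is deterministic per token, so token frequencies and co-occurrences in the hashed log mirror those in the original log.

```python
import hashlib
from collections import Counter


def hash_token(token: str, salt: bytes = b"") -> str:
    """Hash one token with SHA-256 (an assumed choice of secure hash)."""
    return hashlib.sha256(salt + token.encode("utf-8")).hexdigest()


def anonymize_query(query: str, salt: bytes = b"") -> list[str]:
    """Tokenize on whitespace (an assumption) and hash each token separately."""
    return [hash_token(tok, salt) for tok in query.lower().split()]


# Hypothetical query log, invented for this example.
queries = ["cheap flights boston", "boston weather", "cheap hotels boston"]
salt = b"secret-salt"  # hypothetical secret held by the log publisher

hashed_log = [anonymize_query(q, salt) for q in queries]

# Token frequencies before and after hashing.
plain_counts = Counter(tok for q in queries for tok in q.lower().split())
hashed_counts = Counter(h for q in hashed_log for h in q)

# The multiset of frequencies is unchanged by hashing, so an observer with a
# reference corpus could try to match hashed tokens to words by frequency.
print(sorted(plain_counts.values()) == sorted(hashed_counts.values()))
```

In this toy log the most frequent hashed token corresponds to the most frequent plaintext token, which is the kind of frequency alignment a statistical attack could exploit at scale.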
This publication has 21 references indexed in Scilit:
- Optimizing result prefetching in web search engines with segmented indices. ACM Transactions on Internet Technology, 2004
- A taxonomy of web search. ACM SIGIR Forum, 2002
- From e-sex to e-commerce: Web search changes. Computer, 2002
- Characteristics of question format web queries: an exploratory study. Information Processing & Management, 2002
- A user-centered approach to evaluating human interaction with Web search engines: an exploratory study. Information Processing & Management, 2002
- Real life, real users, and real needs: a study and analysis of user queries on the web. Information Processing & Management, 2000
- Analysis of a very large web search engine query log. ACM SIGIR Forum, 1999
- Measures of distributional similarity. Published by Association for Computational Linguistics (ACL), 1999
- Real life information retrieval: a study of user queries on the Web. ACM SIGIR Forum, 1998
- Distributional clustering of English words. Published by Association for Computational Linguistics (ACL), 1993