Versatile Query Scrambling for Private Web Search
We consider the privacy leaks that Internet users suffer when they perform web searches, and propose a framework to mitigate them. In brief, given a ‘sensitive’ search query, our objective is to retrieve the target documents from a search engine without disclosing the actual query. Our approach, which builds upon and improves recent work on search privacy, approximates the target search results by replacing the private user query with a set of blurred or scrambled queries; the results of the scrambled queries are then used to cover the private user interest. We model the problem theoretically, define a set of privacy objectives with respect to web search, and investigate the effectiveness of the proposed solution on a set of privacy-sensitive queries over a large web collection. Experiments show substantial improvements in retrieval effectiveness over a previously reported baseline in the literature. Furthermore, the methods are more versatile, behave more predictably, apply to a wider range of information needs, and provide privacy that is easier for the end-user to comprehend. Additionally, we investigate perceived privacy via a user study and measure the system’s usefulness, taking into account the trade-off between retrieval effectiveness and privacy. The practical feasibility of the methods is demonstrated in a field experiment that scrambles queries against a popular web search engine. The findings may have implications for other IR research areas, such as query expansion, query decomposition, and distributed retrieval.
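To make the general idea concrete, the following is a minimal sketch, not the method evaluated in this work. It assumes that the scrambled queries have already been generated (how they are produced is outside the sketch), that `toy_search` is a hypothetical stand-in for the remote search engine, and that pooled results are re-ranked locally by simple term overlap with the private query. The overlap scoring is a placeholder chosen for illustration only. The point it shows is that only the scrambled queries leave the client, while the private query is used solely on the client side to select results from the pool.

```python
# Hypothetical sketch of query scrambling: send only scrambled queries to the
# engine, pool their results, and re-rank the pool locally against the private
# query. Scoring by term overlap is an illustrative assumption.

def scramble_and_merge(private_query, scrambled_queries, search, k=10):
    """Return up to k doc ids covering the private interest.

    `search` is called only with scrambled queries; `private_query`
    never leaves this function.
    """
    private_terms = set(private_query.lower().split())
    pooled = {}  # doc_id -> text, first occurrence wins
    for q in scrambled_queries:
        for doc_id, text in search(q):
            pooled.setdefault(doc_id, text)

    def overlap(doc_id):
        # Placeholder local relevance: shared terms with the private query.
        return len(private_terms & set(pooled[doc_id].lower().split()))

    # Highest overlap first; ties broken by doc id for determinism.
    ranked = sorted(pooled, key=lambda d: (-overlap(d), d))
    return ranked[:k]


# Toy corpus and engine (hypothetical), recording every query it receives.
CORPUS = {
    "d1": "treatment options for lung cancer",
    "d2": "lung health and breathing exercises",
    "d3": "cancer research funding news",
    "d4": "gardening tips for spring",
}
sent_queries = []


def toy_search(query):
    sent_queries.append(query)
    terms = set(query.lower().split())
    return [(d, t) for d, t in CORPUS.items() if terms & set(t.split())]


result = scramble_and_merge(
    "lung cancer treatment", ["lung disease", "cancer therapy"], toy_search
)
print(result)
```

In this toy run, the engine sees only "lung disease" and "cancer therapy", yet local re-ranking places the document matching all three private terms first. A real system would need far better query generation and ranking than this overlap heuristic.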