Exploit Every Bit: Effective Caching For High-Dimensional Nearest Neighbor Search
Keywords
caching; High dimensional data; histogram; similarity search
Abstract
High-dimensional k nearest neighbor (kNN) search has a wide range of applications in multimedia information retrieval. Existing disk-based k NN search methods incur significant I/O costs in the candidate refinement phase. In this paper, we propose to cache compact approximate representations of data points in main memory in order to reduce the candidate refinement time during k NN search. This problem raises two challenging issues: (i) which is the most effective encoding scheme for data points to support k NN search? and (ii) what is the optimal number of bits for encoding a data point? For (i), we formulate and solve a novel histogram optimization problem that decides the most effective encoding scheme. For (ii), we develop a cost model for automatically tuning the optimal number of bits for encoding points. In addition, our approach is generic and applicable to exact/approximate k NN search methods. Extensive experimental results on real datasets demonstrate that our proposal can accelerate the candidate refinement time of k NN search by at least an order of magnitude.
Publication Date
5-1-2016
Publication Title
IEEE Transactions on Knowledge and Data Engineering
Volume
28
Issue
5
Number of Pages
1175-1188
Document Type
Article
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/TKDE.2016.2515603
Copyright Status
Unknown
Socpus ID
84963792027 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/84963792027
STARS Citation
Tang, Bo; Yiu, Man Lung; and Hua, Kien A., "Exploit Every Bit: Effective Caching For High-Dimensional Nearest Neighbor Search" (2016). Scopus Export 2015-2019. 2594.
https://stars.library.ucf.edu/scopus2015/2594