DLSTM Approach To Video Modeling With Hashing For Large-Scale Video Retrieval
Abstract
Although Query-by-Example techniques based on Euclidean distance in a multidimensional feature space have proved effective for image databases, this approach cannot be applied effectively to video, since the number of dimensions would be massive due to the richness and complexity of video data. Two recent solutions address this issue: Deterministic Quantization (DQ) and Dynamic Temporal Quantization (DTQ). DQ divides the video into equal segments and extracts a visual feature vector for each segment. The bag-of-words feature is then encoded by hashing to facilitate approximate nearest neighbor search using Hamming distance. One weakness of this approach is the deterministic segmentation of video data. DTQ improves on this by using dynamic video segmentation to obtain variable-length video segments. As a result, feature vectors extracted from these video segments can better capture the semantic content of the video. To support very large video databases, it is desirable to minimize the number of segments in order to keep the size of the feature representation as small as possible. We achieve this by using only one video segment (i.e., no video data segmentation is necessary at all) while obtaining even better retrieval performance. Our scheme models video using differential long short-term memory (DLSTM) recurrent neural networks and obtains a highly compact fixed-size feature representation from the hidden-state outputs of the DLSTM. Each of these features is further compressed by hashing it into binary bits via quantization. Experimental results on two public data sets, UCF101 and MSRActionPairs, indicate that the proposed video modeling technique outperforms DTQ by a significant margin.
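The final stage described in the abstract (quantizing a fixed-size feature vector into binary bits, then searching by Hamming distance) can be illustrated with a minimal sketch. The abstract does not specify the quantizer, so the per-dimension median threshold used here is an assumption, as are the function names and the toy feature values. The DLSTM feature extraction itself is not modeled.

```python
from statistics import median


def fit_thresholds(features):
    """Per-dimension median over the database features (assumed quantizer)."""
    return [median(column) for column in zip(*features)]


def to_hash(feature, thresholds):
    """Pack one fixed-size feature vector into an integer bit code."""
    code = 0
    for value, threshold in zip(feature, thresholds):
        code = (code << 1) | (1 if value > threshold else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")


def search(query_code, database_codes, k=1):
    """Indices of the k database codes closest to the query in Hamming distance."""
    ranked = sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )
    return ranked[:k]


# Toy 4-dimensional vectors standing in for per-video features (made-up values).
database = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.8, 0.5, -0.3, 0.6],
    [0.7, -0.1, 0.5, -0.9],
]
thresholds = fit_thresholds(database)
codes = [to_hash(f, thresholds) for f in database]
query = to_hash([0.8, -0.3, 0.45, -0.8], thresholds)
nearest = search(query, codes, k=2)
```

Because each video is reduced to a single short bit code, comparing a query against a database entry costs one XOR and a bit count, which is what makes this kind of representation attractive for very large collections.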
Publication Date
1-1-2016
Publication Title
Proceedings - International Conference on Pattern Recognition
Number of Pages
3222-3227
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/ICPR.2016.7900131
Copyright Status
Unknown
Scopus ID
85019103279 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85019103279
STARS Citation
Zhuang, Naifan; Ye, Jun; and Hua, Kien A., "DLSTM Approach To Video Modeling With Hashing For Large-Scale Video Retrieval" (2016). Scopus Export 2015-2019. 4387.
https://stars.library.ucf.edu/scopus2015/4387