Temporal Domain Neural Encoder For Video Representation Learning
Abstract
We address the challenge of learning good video representations by explicitly modeling the relationships between visual concepts in the temporal domain. We propose a novel Temporal Preserving Recurrent Neural Network (TPRNN) that extracts and encodes visual dynamics from frame-level features. The proposed architecture captures temporal dynamics by tracking the ordinal relationships among co-occurring visual concepts, and constructs video representations from their temporal order patterns. The resulting representations effectively encode the temporal structure of dynamic patterns, making them more discriminative for human actions that differ only in the ordering of their constituent action patterns. We evaluate the proposed model on several real-world video datasets, where it outperforms the baseline models. In particular, we observe significant improvements on action classes that can be distinguished only by capturing the temporal order of action patterns.
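The abstract gives no implementation details, so the following is only a minimal sketch of how an order-sensitive recurrent encoder of the kind described might be realized, not the paper's actual TPRNN. The class name TemporalOrderEncoder, the gating scheme, and all dimensions are illustrative assumptions; frame-level features are assumed to be precomputed (e.g., by a CNN).

import torch
import torch.nn as nn

class TemporalOrderEncoder(nn.Module):
    """Hypothetical encoder: accumulates evidence about the order in
    which visual concepts become active across a frame sequence."""

    def __init__(self, feat_dim: int, num_concepts: int):
        super().__init__()
        # Project raw frame features to per-frame concept activations.
        self.concept_proj = nn.Linear(feat_dim, num_concepts)
        # Recurrent state keeps a running summary of order patterns.
        self.cell = nn.GRUCell(num_concepts, num_concepts)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, feat_dim)
        batch, time, _ = frames.shape
        h = frames.new_zeros(batch, self.cell.hidden_size)
        for t in range(time):
            concepts = torch.sigmoid(self.concept_proj(frames[:, t]))
            # Only concepts newly activated relative to the running
            # state update it, so the final state reflects the order
            # in which concepts appeared, not just their presence.
            new_evidence = torch.relu(concepts - torch.sigmoid(h))
            h = self.cell(new_evidence, h)
        return h  # video-level representation

# Usage: encode a batch of 2 videos, 16 frames each, 512-d features.
enc = TemporalOrderEncoder(feat_dim=512, num_concepts=128)
video_repr = enc(torch.randn(2, 16, 512))
print(video_repr.shape)  # torch.Size([2, 128])

Because the update is driven by the difference between current activations and the recurrent state, two videos containing the same concepts in different orders yield different final states, which is the property the abstract attributes to TPRNN.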
Publication Date
8-22-2017
Publication Title
IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
Volume
2017-July
Number of Pages
2192-2199
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/CVPRW.2017.272
Copyright Status
Unknown
Scopus ID
85030257104 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85030257104
STARS Citation
Hu, Hao; Wang, Zhaowen; Lee, Joon-Young; Lin, Zhe; and Qi, Guo-Jun, "Temporal Domain Neural Encoder For Video Representation Learning" (2017). Scopus Export 2015-2019. 7087.
https://stars.library.ucf.edu/scopus2015/7087