Temporal Domain Neural Encoder For Video Representation Learning

Abstract

We address the challenge of learning effective video representations by explicitly modeling the relationships between visual concepts in the temporal domain. We propose a novel Temporal Preserving Recurrent Neural Network (TPRNN) that extracts and encodes visual dynamics from frame-level features. The proposed network architecture captures temporal dynamics by tracking the ordinal relationships of co-occurring visual concepts, and constructs video representations from their temporal order patterns. The resulting video representations effectively encode the temporal structure of dynamic patterns, making them more discriminative for human actions that differ only in the ordering of their constituent action patterns. We evaluate the proposed model on several real-world video datasets, and the results show that it consistently outperforms the baseline models. In particular, we observe significant improvements on action classes that can be distinguished only by capturing the temporal order of action patterns.
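The abstract gives no implementation details, so the following is only a minimal sketch of the general setup it describes: a recurrent encoder that consumes frame-level features in temporal order and produces an order-sensitive video representation for classification. It does not reproduce the TPRNN's specific ordinal-tracking mechanism, and all module names, feature dimensions, and class counts are hypothetical.

```python
# Minimal sketch (NOT the authors' TPRNN): a generic recurrent encoder that
# maps a sequence of frame-level features to one order-sensitive video
# representation. All names and sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class RecurrentVideoEncoder(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, num_classes=101):
        super().__init__()
        # The GRU reads per-frame features in temporal order, so permuting
        # the frames changes the final hidden state (order sensitivity).
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, feat_dim), e.g. per-frame CNN features
        _, h_n = self.rnn(frame_feats)      # h_n: (1, batch, hidden_dim)
        video_repr = h_n.squeeze(0)         # order-aware video representation
        return self.classifier(video_repr)

# Example: 8 videos, 16 frames each, 2048-d frame features
feats = torch.randn(8, 16, 2048)
logits = RecurrentVideoEncoder()(feats)     # shape: (8, 101)
```

Because the final hidden state depends on the order in which frame features arrive, this kind of encoder can in principle separate actions that share the same visual concepts but differ in their temporal ordering, which is the distinction the abstract emphasizes.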

Publication Date

8-22-2017

Publication Title

IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops

Volume

2017-July

Number of Pages

2192-2199

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/CVPRW.2017.272

Scopus ID

85030257104

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85030257104
