Action Recognition With Temporal Scale-Invariant Deep Learning Framework
Keywords
action recognition; CNN; LSTM
Abstract
Recognizing actions from video features is an important problem in a wide scope of applications. In this paper, we propose a temporal scale-invariant deep learning framework for action recognition, which is robust to changes in action speed. Specifically, a video is first split into several sub-action clips and a keyframe is selected from each sub-action clip. The spatial and motion features of the keyframe are extracted separately by two Convolutional Neural Networks (CNN) and combined in a convolutional fusion layer to learn the relationship between the features. Then, Long Short-Term Memory (LSTM) networks are applied to the fused features to formulate long-term temporal clues. Finally, the action prediction scores of the LSTM network are combined by linear weighted summation. Extensive experiments are conducted on two popular and challenging benchmarks, namely UCF-101 and HMDB51 Human Actions. On both benchmarks, our framework outperforms state-of-the-art methods, achieving 93.7% accuracy on UCF-101 and 69.5% on HMDB51.
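The pipeline described in the abstract can be sketched at the shape level. This is a minimal NumPy sketch, not the paper's implementation: the clip count, feature sizes, and the random projections standing in for the spatial/motion CNNs and the LSTM are all hypothetical stubs, and the keyframe heuristic (middle frame) is an assumption. Only the data flow — clip splitting, keyframe selection, two-stream feature fusion, per-clip scoring, and linear weighted summation of scores — follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_CLIPS = 5        # number of sub-action clips per video (hypothetical)
FEAT_DIM = 128       # per-stream feature size (stub for CNN features)
NUM_CLASSES = 101    # e.g. the UCF-101 label set

def split_into_clips(frames, num_clips):
    """Split a video (T x H x W array) into roughly equal sub-action clips."""
    return np.array_split(frames, num_clips)

def select_keyframe(clip):
    """Pick the middle frame as the clip's keyframe (a simple stand-in heuristic)."""
    return clip[len(clip) // 2]

# Stub "CNNs": fixed random projections standing in for the spatial and
# motion networks of the two-stream architecture.
W_spatial = rng.standard_normal((32 * 32, FEAT_DIM)) * 0.01
W_motion  = rng.standard_normal((32 * 32, FEAT_DIM)) * 0.01
W_out     = rng.standard_normal((2 * FEAT_DIM, NUM_CLASSES)) * 0.01

def spatial_features(frame):
    return frame.reshape(-1) @ W_spatial

def motion_features(frame):
    return frame.reshape(-1) @ W_motion

def fuse(spat, mot):
    """Convolutional fusion is stubbed here as concatenation of the streams."""
    return np.concatenate([spat, mot])

def clip_scores(fused):
    """Per-clip class probabilities (the paper uses an LSTM over the sequence)."""
    logits = fused @ W_out
    e = np.exp(logits - logits.max())
    return e / e.sum()

def predict(frames, clip_weights):
    clips = split_into_clips(frames, NUM_CLIPS)
    per_clip = [
        clip_scores(fuse(spatial_features(select_keyframe(c)),
                         motion_features(select_keyframe(c))))
        for c in clips
    ]
    # Linear weighted summation of the per-clip prediction scores.
    return sum(w * s for w, s in zip(clip_weights, per_clip))

video = rng.standard_normal((40, 32, 32))      # 40 small grayscale frames
weights = np.full(NUM_CLIPS, 1.0 / NUM_CLIPS)  # uniform fusion weights
scores = predict(video, weights)
```

With weights summing to one, the fused output remains a valid probability distribution over the 101 classes, so the final prediction is simply `scores.argmax()`.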
Publication Date
2-1-2017
Publication Title
China Communications
Volume
14
Issue
2
Number of Pages
163-172
Document Type
Article
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/CC.2017.7868164
Copyright Status
Unknown
Scopus ID
85015159779 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85015159779
STARS Citation
Chen, Huafeng; Chen, Jun; Hu, Ruimin; Chen, Chen; and Wang, Zhongyuan, "Action Recognition With Temporal Scale-Invariant Deep Learning Framework" (2017). Scopus Export 2015-2019. 5344.
https://stars.library.ucf.edu/scopus2015/5344