Energy-Based Global Ternary Image For Action Recognition Using Sole Depth Sequences

Keywords

3D action recognition; depth

Abstract

To recognize actions efficiently from depth sequences, we propose a novel feature, called Global Ternary Image (GTI), which implicitly encodes both motion regions and motion directions between consecutive depth frames by recording the changes of depth pixels. Each pixel in a GTI indicates one of three possible states, namely positive, negative and neutral, which represent increased, decreased and unchanged depth values, respectively. Since GTI is sensitive to the subject's speed, we obtain the energy-based GTI (E-GTI) by extracting GTI from pairs of depth frames with equal motion energy. To capture temporal information among depth frames, we extract E-GTIs using multiple settings of motion energy. Noise is effectively suppressed by describing E-GTIs using the Radon Transform (RT). The 3D action representation is formed by feeding the hierarchical combination of RTs to the Bag of Visual Words (BoVW) model. Extensive experiments on four benchmark datasets, namely MSRAction3D, DHA, MSRGesture3D and SKIG, show that the hierarchical E-GTI outperforms existing methods in 3D action recognition. We also tested the proposed approach on an extended MSRAction3D dataset to further investigate and verify its robustness against partial occlusions, noise and speed variation.
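The ternary encoding and the equal-energy frame pairing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `threshold` parameter, and in particular the definition of motion energy as the count of non-neutral pixels are assumptions made for illustration only; the abstract does not specify how motion energy is measured.

```python
import numpy as np


def global_ternary_image(prev_frame, next_frame, threshold=0):
    """Encode per-pixel depth change as +1 (increased), -1 (decreased), 0 (unchanged).

    `threshold` is a hypothetical tolerance, not taken from the paper.
    """
    # Cast to a signed type first so unsigned depth maps do not wrap around.
    diff = next_frame.astype(np.int64) - prev_frame.astype(np.int64)
    gti = np.zeros(diff.shape, dtype=np.int8)
    gti[diff > threshold] = 1
    gti[diff < -threshold] = -1
    return gti


def motion_energy(prev_frame, next_frame, threshold=0):
    """Assumed energy measure: number of pixels whose state is not neutral."""
    return int(np.count_nonzero(global_ternary_image(prev_frame, next_frame, threshold)))


def energy_based_pairs(frames, energy_step, threshold=0):
    """Select frame index pairs (i, j) separated by roughly equal accumulated energy.

    Energy between consecutive frames is accumulated until it reaches
    `energy_step`; the pair is then emitted and accumulation restarts.
    """
    pairs = []
    start, accumulated = 0, 0
    for k in range(1, len(frames)):
        accumulated += motion_energy(frames[k - 1], frames[k], threshold)
        if accumulated >= energy_step:
            pairs.append((start, k))
            start, accumulated = k, 0
    return pairs


def energy_based_gtis(frames, energy_step, threshold=0):
    """Compute one ternary image per selected frame pair."""
    return [
        global_ternary_image(frames[i], frames[j], threshold)
        for i, j in energy_based_pairs(frames, energy_step, threshold)
    ]
```

Calling `energy_based_gtis` with several values of `energy_step` would give one set of ternary images per energy setting, which loosely mirrors the multiple settings of motion energy mentioned above. The Radon Transform and BoVW stages are omitted from this sketch.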

Publication Date

12-15-2016

Publication Title

Proceedings - 2016 4th International Conference on 3D Vision, 3DV 2016

Page Range

47-55

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/3DV.2016.14

Scopus ID

85011260936 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85011260936

