Learning Informative Pairwise Joints With Energy-Based Temporal Pyramid For 3D Action Recognition

Keywords

Action recognition; Skeleton sequence

Abstract

This paper presents an effective local spatial-temporal descriptor for action recognition from skeleton sequences. A distinctive property of our descriptor is that it takes both spatial-temporal discrimination and action speed variations into account, aiming to solve the problems of distinguishing similar actions and identifying actions performed at different speeds within a single framework. The algorithm consists of two stages. First, a frame selection method removes noisy skeletons from a given skeleton sequence. Joints from the selected skeletons are mapped to a high-dimensional space, where each point encodes the kinematics, time label, and joint label of a skeleton joint. To capture relative relationships among joints, pairs of points from this space are then jointly mapped to a new space, where each point encodes the relative relationship between two skeleton joints. Second, Fisher Vector (FV) encoding is employed to aggregate all points from the new space into a compact feature representation. To cope with speed variations in actions, an energy-based temporal pyramid is applied to form a multi-temporal FV representation, which is fed into a kernel-based extreme learning machine classifier for recognition. Extensive experiments on benchmark datasets consistently show that our method outperforms state-of-the-art approaches for skeleton-based action recognition.
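The abstract describes two ingredients that lend themselves to a short illustration: the pairwise mapping of joints (relative relationship plus time and joint labels) and the energy-based temporal partitioning used in place of a uniform temporal pyramid. The sketch below is a minimal NumPy illustration under stated assumptions: the function names, the normalized time label, and the use of per-frame joint displacement as the "energy" measure are assumptions for illustration, not the paper's exact formulation, and the FV encoding and kernel ELM stages are omitted.

```python
import numpy as np

def pairwise_joint_descriptors(skeleton, frame_idx, num_frames):
    """Map each pair of joints in one frame to a relative descriptor.

    skeleton   : (J, 3) array of 3D joint positions for one frame
    frame_idx  : index of the frame, used here as a normalized time label
    num_frames : total number of frames in the (selected) sequence

    Returns a (J*(J-1)/2, 6) array; each row holds the relative offset
    between a joint pair plus the time label and the two joint labels.
    The exact kinematic terms used in the paper are not reproduced here.
    """
    J = skeleton.shape[0]
    t = frame_idx / max(num_frames - 1, 1)          # normalized time label
    descs = []
    for i in range(J):
        for j in range(i + 1, J):
            rel = skeleton[i] - skeleton[j]          # relative joint position
            descs.append(np.concatenate([rel, [t, i, j]]))
    return np.asarray(descs)

def energy_based_segments(sequence, num_segments):
    """Split a skeleton sequence into segments of roughly equal motion energy.

    sequence : (T, J, 3) array of joint positions over T frames.
    Motion energy per frame is approximated by the summed displacement of
    all joints relative to the previous frame; segment boundaries are placed
    at equal fractions of the cumulative energy, so fast portions of an
    action receive finer temporal resolution than slow ones.
    """
    diffs = np.linalg.norm(np.diff(sequence, axis=0), axis=2).sum(axis=1)
    energy = np.concatenate([[0.0], diffs])          # per-frame energy, length T
    cum = np.cumsum(energy)
    cum = cum / cum[-1] if cum[-1] > 0 else np.linspace(0, 1, len(cum))
    bounds = [int(np.searchsorted(cum, k / num_segments))
              for k in range(num_segments + 1)]
    bounds[0], bounds[-1] = 0, len(sequence)
    return [sequence[bounds[k]:bounds[k + 1]] for k in range(num_segments)]
```

In a full pipeline along the lines of the abstract, the pairwise descriptors from each energy-based segment would be encoded with a Fisher Vector, the per-segment FVs concatenated into the multi-temporal representation, and the result classified with a kernel-based extreme learning machine; those stages are not sketched here.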

Publication Date

8-28-2017

Publication Title

Proceedings - IEEE International Conference on Multimedia and Expo

Number of Pages

901-906

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/ICME.2017.8019313

Scopus ID

85030229475 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85030229475
