Scopus Export 2015-2019

VideoCapsuleNet: A Simplified Network for Action Detection

Kevin Duarte, University of Central Florida
Yogesh S. Rawat, University of Central Florida
Mubarak Shah, University of Central Florida

Abstract

Recent advances in Deep Convolutional Neural Networks (DCNNs) have shown extremely good results for human action classification in videos; however, action detection is still a challenging problem. Current action detection approaches follow a complex pipeline that involves multiple tasks such as tube proposals, optical flow, and tube classification. In this work, we present a more elegant solution for action detection based on the recently developed capsule network. We propose a 3D capsule network for videos, called VideoCapsuleNet: a unified network for action detection that can jointly perform pixel-wise action segmentation along with action classification. The proposed network is a generalization of the capsule network from 2D to 3D, which takes a sequence of video frames as input. The 3D generalization drastically increases the number of capsules in the network, making capsule routing computationally expensive. We introduce capsule-pooling in the convolutional capsule layer to address this issue and make the voting algorithm tractable. The routing-by-agreement in the network inherently models the action representations, and various action characteristics are captured by the predicted capsules. This inspired us to utilize the capsules for action localization, and the class-specific capsules predicted by the network are used to determine a pixel-wise localization of actions. The localization is further improved by parameterized skip connections with the convolutional capsule layers, and the network is trained end-to-end with a classification loss as well as a localization loss. The proposed network achieves state-of-the-art performance on multiple action detection datasets, including UCF-Sports, J-HMDB, and UCF-101 (24 classes), with an impressive ∼20% improvement on UCF-101 and ∼15% improvement on J-HMDB in terms of v-mAP scores.
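The abstract does not spell out how capsule-pooling works, so the following is only a rough sketch under a stated assumption: that pooling averages the pose vectors of same-type capsules inside a receptive field, so each capsule type casts one vote instead of one vote per position. The function names, the dictionary layout, and the example sizes are hypothetical and chosen purely for illustration; they are not taken from the authors' implementation.

```python
from statistics import fmean


def capsule_pool(receptive_field):
    """Average same-type capsules in one receptive field.

    ASSUMPTION: capsule-pooling is modeled here as an element-wise mean.
    receptive_field maps a capsule type to a list of pose vectors.
    """
    pooled = {}
    for cap_type, poses in receptive_field.items():
        # zip(*poses) groups the i-th component of every pose vector.
        pooled[cap_type] = [fmean(column) for column in zip(*poses)]
    return pooled


def votes_per_output_position(kernel, in_types, out_types, pooled):
    """Count votes cast toward one output position (illustrative arithmetic)."""
    kt, kh, kw = kernel
    # Without pooling every capsule in the 3D kernel votes; with pooling,
    # only one averaged capsule per input type votes.
    voters = in_types if pooled else kt * kh * kw * in_types
    return voters * out_types


# Hypothetical receptive field: two capsule types, two positions each.
field = {
    "type_a": [[1.0, 2.0], [3.0, 4.0]],
    "type_b": [[0.0, 0.0], [2.0, 2.0]],
}
pooled_field = capsule_pool(field)

# Hypothetical layer sizes: 3x3x3 kernel, 8 input types, 16 output types.
votes_plain = votes_per_output_position((3, 3, 3), 8, 16, pooled=False)
votes_pooled = votes_per_output_position((3, 3, 3), 8, 16, pooled=True)
```

Under these assumed sizes the vote count per output position drops by a factor equal to the kernel volume, which is the kind of saving that would make routing over a 3D capsule grid tractable.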

Publication Date

1-1-2018

Publication Title

Advances in Neural Information Processing Systems

Volume

2018-December

Number of Pages

7610-7619

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

Copyright Status

Unknown

Scopus ID

85064830987 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85064830987

STARS Citation

Duarte, Kevin; Rawat, Yogesh S.; and Shah, Mubarak, "VideoCapsuleNet: A Simplified Network for Action Detection" (2018). Scopus Export 2015-2019. 10547.
https://stars.library.ucf.edu/scopus2015/10547

This document is currently not available here.
