Action Localization In Videos Through Context Walk

Abstract

This paper presents an efficient approach for localizing actions by learning contextual relations, in the form of relative locations between different video regions. We begin by over-segmenting the videos into supervoxels, which preserve action boundaries and reduce the complexity of the problem. During training, we learn context relations that capture displacements from all the supervoxels in a video to those belonging to foreground actions. Then, given a testing video, we select a supervoxel randomly and use the context information acquired during training to estimate the probability of each supervoxel belonging to the foreground action. The walk proceeds to a new supervoxel and the process is repeated for a few steps. This "context walk" generates a conditional distribution of an action over all the supervoxels. A Conditional Random Field is then used to find action proposals in the video, whose confidences are obtained using SVMs. We validate the proposed approach on several datasets and show that context in the form of relative displacements between supervoxels can be extremely useful for action localization. The approach also requires significantly fewer evaluations of the classifier than the alternative sliding-window approaches.
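
The walk described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function name `context_walk`, the use of supervoxel centroids, the fixed list of displacement vectors standing in for the learned context relations, the Gaussian vote weighting, and the rule for choosing the next supervoxel are all assumptions made for illustration. The Conditional Random Field and SVM stages are not covered.

```python
import math
import random


def context_walk(centroids, displacements, steps=3, sigma=1.0, seed=0):
    """Toy context walk over supervoxel centroids (hypothetical simplification).

    centroids: list of (x, y, t) supervoxel centers in the test video.
    displacements: list of (dx, dy, dt) offsets, standing in for context
        relations learned in training (supervoxel -> foreground offset).
    Returns one score per supervoxel; the scores sum to 1.
    """
    rng = random.Random(seed)
    n = len(centroids)
    scores = [0.0] * n
    current = rng.randrange(n)  # the walk starts at a random supervoxel
    visited = set()

    for _ in range(min(steps, n)):
        visited.add(current)
        cx, cy, ct = centroids[current]
        # Each stored displacement predicts where the foreground should be.
        for dx, dy, dt in displacements:
            px, py, pt = cx + dx, cy + dy, ct + dt
            # Every supervoxel receives a vote that decays with its distance
            # from the predicted location (Gaussian kernel, an assumption).
            for i, (x, y, t) in enumerate(centroids):
                d2 = (x - px) ** 2 + (y - py) ** 2 + (t - pt) ** 2
                scores[i] += math.exp(-d2 / (2.0 * sigma ** 2))
        # Assumed selection rule: move to the highest-scoring unvisited one.
        unvisited = [i for i in range(n) if i not in visited]
        if not unvisited:
            break
        current = max(unvisited, key=lambda i: scores[i])

    total = sum(scores)
    if total == 0.0:
        return [1.0 / n] * n  # no evidence gathered: fall back to uniform
    return [s / total for s in scores]


if __name__ == "__main__":
    toy_centroids = [(0, 0, 0), (5, 0, 0), (10, 0, 0)]
    toy_displacements = [(5, 0, 0), (-5, 0, 0)]
    print(context_walk(toy_centroids, toy_displacements, steps=2, sigma=2.0))
```

Only a few steps are taken, so the accumulated scores are computed without visiting every supervoxel, which loosely mirrors the reduction in classifier evaluations that the abstract contrasts with sliding-window search.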

Publication Date

2-17-2015

Publication Title

Proceedings of the IEEE International Conference on Computer Vision

Volume

2015 International Conference on Computer Vision, ICCV 2015

Pages

3280-3288

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/ICCV.2015.375

Scopus ID

84973931629 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/84973931629

