Automatic Action Annotation In Weakly Labeled Videos

Keywords

Action annotation; Generalized maximum clique graph; Weakly supervised

Abstract

Manual spatio-temporal annotation of human actions in videos is laborious, requires several annotators, and is subject to human bias. In this paper, we present a weakly supervised approach to automatically obtain spatio-temporal annotations of an actor in action videos. We first obtain a large number of action proposals in each video. To retain the few most representative action proposals in each video and avoid processing thousands of them, we rank them using optical flow and saliency in a 3D-MRF based framework and select a few proposals using a MAP-based proposal subset selection method. We demonstrate that this ranking preserves the high-quality action proposals. Several such proposals are generated for each video of the same action. Our next challenge is to iteratively select one proposal from each video so that all selected proposals are globally consistent. We formulate this as a Generalized Maximum Clique Graph problem using shape, global and fine-grained similarity of proposals across the videos. The output of our method is the most action-representative proposals from each video. Our method can also annotate multiple instances of the same action in a video. We have validated our approach on three challenging action datasets, UCF-Sport, sub-JHMDB and THUMOS13, and have obtained promising results compared to several baseline methods. Moreover, action detection experiments using annotations obtained by our method and several baselines demonstrate the superiority of our approach.

Publication Date

8-1-2017

Publication Title

Computer Vision and Image Understanding

Volume

161

Pages

77-86

Document Type

Article

Personal Identifier

scopus

DOI Link

https://doi.org/10.1016/j.cviu.2017.05.005

Scopus ID

85019980086 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85019980086

