Automatic Action Annotation In Weakly Labeled Videos
Keywords
Action annotation; Generalized maximum clique graph; Weakly supervised
Abstract
Manual spatio-temporal annotation of human actions in videos is laborious, requires several annotators, and is subject to human bias. In this paper, we present a weakly supervised approach to automatically obtain spatio-temporal annotations of an actor in action videos. We first obtain a large number of action proposals in each video. To capture the few most representative action proposals in each video and avoid processing thousands of them, we rank them using optical flow and saliency in a 3D-MRF based framework and select a few proposals using a MAP-based proposal subset selection method. We demonstrate that this ranking preserves the high-quality action proposals. Several such proposals are generated for each video of the same action. Our next challenge is to iteratively select one proposal from each video so that all selected proposals are globally consistent. We formulate this as a Generalized Maximum Clique Graph problem using shape, global, and fine-grained similarity of proposals across the videos. The output of our method is the most action-representative proposals from each video. Our method can also annotate multiple instances of the same action in a video. We have validated our approach on three challenging action datasets (UCF-Sport, sub-JHMDB, and THUMOS13) and have obtained promising results compared to several baseline methods. Moreover, action detection experiments using annotations obtained by our method and by several baselines demonstrate the superiority of our approach.
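The cross-video selection step described in the abstract (one proposal per video, chosen so that the selected proposals agree with each other) can be illustrated with a small sketch. This is not the authors' solver: it assumes each proposal is already summarized by a feature vector, uses cosine similarity as a stand-in for the shape, global, and fine-grained similarities, and replaces the Generalized Maximum Clique optimization with a simple coordinate-ascent heuristic. The function names `cosine` and `select_consistent_proposals` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_consistent_proposals(features, max_iters=10):
    """Pick one proposal index per video by coordinate ascent.

    features: list with one entry per video; each entry is a list of
    proposal feature vectors. Assumes at least two videos and that
    every feature vector is non-zero (both are assumptions of this
    sketch, not requirements stated in the source).
    """
    # Start from the first (e.g. top-ranked) proposal in every video.
    choice = [0] * len(features)
    for _ in range(max_iters):
        changed = False
        for v, candidates in enumerate(features):
            # Proposals currently selected in all other videos.
            others = [features[u][choice[u]]
                      for u in range(len(features)) if u != v]
            # Total similarity of each candidate to the other selections.
            scores = [sum(cosine(c, o) for o in others) for c in candidates]
            best = max(range(len(candidates)), key=scores.__getitem__)
            # Switch only on strict improvement, so the total never drops.
            if scores[best] > scores[choice[v]]:
                choice[v] = best
                changed = True
        if not changed:
            break
    return choice


# Toy example: every video has one proposal along the first axis
# (the shared "action") and one unrelated distractor.
videos = [
    [[0, 1, 0, 0], [1, 0, 0, 0]],
    [[1, 0, 0, 0], [0, 0, 1, 0]],
    [[0, 0, 0, 1], [1, 0, 0, 0]],
]
selected = select_consistent_proposals(videos)
```

In the toy example the heuristic settles on the mutually similar proposal in each video. A local search of this kind can stop at a local optimum, which is one reason a global formulation such as the one named in the abstract may be preferable.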
Publication Date
8-1-2017
Publication Title
Computer Vision and Image Understanding
Volume
161
Page Range
77-86
Document Type
Article
Personal Identifier
scopus
DOI Link
https://doi.org/10.1016/j.cviu.2017.05.005
Copyright Status
Unknown
Scopus ID
85019980086 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85019980086
STARS Citation
Sultani, Waqas and Shah, Mubarak, "Automatic Action Annotation In Weakly Labeled Videos" (2017). Scopus Export 2015-2019. 6201.
https://stars.library.ucf.edu/scopus2015/6201