Predicting The Where And What Of Actors And Actions Through Online Action Localization
Abstract
This paper proposes a novel approach to the challenging problem of 'online action localization', which entails predicting actions and their locations as they happen in a video. Typically, action localization or recognition is performed offline: all the frames in the video are processed together, and action labels are not predicted for the future. This precludes timely localization of actions, an important consideration for surveillance tasks. In our approach, given a batch of frames from the immediate past in a video, we estimate pose and oversegment the current frame into superpixels. Next, we discriminatively train an actor foreground model on the superpixels using the pose bounding boxes. A Conditional Random Field with superpixels as nodes, and edges connecting spatio-temporal neighbors, is used to obtain action segments. The action confidence is predicted using dynamic programming on SVM scores obtained on short segments of the video, thereby capturing the sequential information of the actions. Visual drift is handled by updating the appearance model and refining the pose in an online manner. Lastly, we introduce a new measure to quantify the performance of action prediction (i.e. online action localization), which analyzes how the prediction accuracy varies as a function of the observed portion of the video. Our experiments suggest that, despite using only a few frames to localize actions at each time instant, we are able to predict the action and obtain results competitive with state-of-the-art offline methods.
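The abstract states only that action confidence comes from dynamic programming on SVM scores computed over short video segments. The sketch below illustrates one plausible formulation, built on assumptions that are not taken from the paper. Each action class is assumed to be split into K ordered sub-action stages, with one SVM score per stage for every observed segment. The observed segments are assumed to start at stage 0, and each consecutive segment may either stay at the same stage or advance by exactly one. Because the video is only partially observed, the alignment may end at any stage. The names align_segments and predict_action are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def align_segments(scores: List[List[float]]) -> Tuple[float, List[int]]:
    """Best monotonic alignment of observed segments to ordered stages.

    scores[t][k] is a (hypothetical) SVM score of segment t under stage k.
    Assumption: start at stage 0; between segments, stay or advance by one.
    """
    T, K = len(scores), len(scores[0])
    dp = [[-math.inf] * K for _ in range(T)]   # best cumulative score
    back = [[-1] * K for _ in range(T)]        # predecessor stage
    dp[0][0] = scores[0][0]
    for t in range(1, T):
        for k in range(K):
            stay = dp[t - 1][k]
            advance = dp[t - 1][k - 1] if k > 0 else -math.inf
            best, prev = (stay, k) if stay >= advance else (advance, k - 1)
            if best > -math.inf:  # skip unreachable states
                dp[t][k] = best + scores[t][k]
                back[t][k] = prev
    # Partial observation: the alignment may end at any stage.
    end = max(range(K), key=lambda k: dp[T - 1][k])
    path = [end]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return dp[T - 1][end], path


def predict_action(class_scores: Dict[str, List[List[float]]]) -> Tuple[str, float]:
    """Return the class whose best alignment score is highest, plus that score."""
    best = {c: align_segments(s)[0] for c, s in class_scores.items()}
    label = max(best, key=best.get)
    return label, best[label]
```

For example, align_segments([[2.0, 0.0], [0.5, 1.0], [0.0, 3.0]]) returns (6.0, [0, 1, 1]). The first segment is assigned to stage 0, and the next two are assigned to stage 1. The stay-or-advance constraint is one simple way to encode the "sequential information" the abstract mentions. A real system might instead allow skipping stages or use learned transition costs.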
Publication Date
12-9-2016
Publication Title
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume
2016-December
Pages
2648-2657
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1109/CVPR.2016.290
Copyright Status
Unknown
Scopus ID
84986246311 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/84986246311
STARS Citation
Soomro, Khurram; Idrees, Haroon; and Shah, Mubarak, "Predicting The Where And What Of Actors And Actions Through Online Action Localization" (2016). Scopus Export 2015-2019. 4351.
https://stars.library.ucf.edu/scopus2015/4351