Scopus Export 2015-2019

What If We Do Not Have Multiple Videos Of The Same Action? - Video Action Localization Using Web Images

Abstract

This paper tackles the problem of spatio-temporal action localization in a video, without assuming the availability of multiple videos or any prior annotations. The action is localized by employing images downloaded from the internet using the action name. Given web images, we first dampen image noise using a random walk and evade distracting backgrounds within images using image action proposals. Then, given a video, we generate multiple spatio-temporal action proposals. We suppress camera- and background-generated proposals by exploiting optical flow gradients within proposals. To obtain the most action-representative proposals, we propose to reconstruct action proposals in the video by leveraging the action proposals in images. Moreover, we preserve the temporal smoothness of the video and reconstruct all proposal bounding boxes jointly using constraints that push the coefficients for each bounding box toward a common consensus, thus enforcing coefficient similarity across multiple frames. We solve this optimization problem using a variant of the two-metric projection algorithm. Finally, the video proposal that has the lowest reconstruction cost and is motion salient is used to localize the action. Our method is not only applicable to trimmed videos, but can also be used for action localization in untrimmed videos, which is a very challenging problem. We present extensive experiments on trimmed as well as untrimmed datasets to validate the effectiveness of the proposed approach.

Publication Date

12-9-2016

Publication Title

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Volume

2016-December

Number of Pages

1077-1085

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/CVPR.2016.122

Copyright Status

Unknown

Scopus ID

84986265065 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/84986265065

STARS Citation

Sultani, Waqas and Shah, Mubarak, "What If We Do Not Have Multiple Videos Of The Same Action? - Video Action Localization Using Web Images" (2016). Scopus Export 2015-2019. 4198.
https://stars.library.ucf.edu/scopus2015/4198

This document is currently not available here.
