Action recognition classifies a given video among a set of action labels, whereas action localization determines the location of an action in addition to its class. The overall aim of this dissertation is action localization. Many of the existing action localization approaches exhaustively search (spatially and temporally) for an action in a video. However, as the search space increases with high resolution and longer duration videos, it becomes impractical to use such sliding window techniques. The first part of this dissertation presents an efficient approach for localizing actions by learning contextual relations between different video regions in training. In testing, we use the context information to estimate the probability of each supervoxel belonging to the foreground action and use Conditional Random Field (CRF) to localize actions. In the above method and typical approaches to this problem, localization is performed in an offline manner where all the video frames are processed together. This prevents timely localization and prediction of actions/interactions - an important consideration for many tasks including surveillance and human-machine interaction. Therefore, in the second part of this dissertation we propose an online approach to the challenging problem of localization and prediction of actions/interactions in videos. In this approach, we use human poses and superpixels in each frame to train discriminative appearance models and perform online prediction of actions/interactions with Structural SVM. Above two approaches rely on human supervision in the form of assigning action class labels to videos and annotating actor bounding boxes in each frame of training videos. Therefore, in the third part of this dissertation we address the problem of unsupervised action localization. Given unlabeled videos without annotations, this approach aims at: 1) Discovering action classes using a discriminative clustering approach, and 2) Localizing actions using a variant of Knapsack problem.
Doctor of Philosophy (Ph.D.)
College of Engineering and Computer Science
Length of Campus-only Access
Doctoral Dissertation (Open Access)
Soomro, Khurram, "Online, Supervised and Unsupervised Action Localization in Videos" (2017). Electronic Theses and Dissertations. 5681.