Computer vision, machine learning, action recognition, weakly supervised, probabilistic model, microsoft kinect, latency reduction


The success of recognizing periodic actions in single-person-simple-background datasets, such as Weizmann and KTH, has created a need for more complex datasets to push the performance of action recognition systems. In this work, we create a new synthetic action dataset and use it to highlight weaknesses in current recognition systems. Experiments show that introducing background complexity to action video sequences causes a significant degradation in recognition performance. Moreover, this degradation cannot be fixed by fine-tuning system parameters or by selecting better feature points. Instead, we show that the problem lies in the spatio-temporal cuboid volume extracted from the interest point locations. Having identified the problem, we show how improved results can be achieved by simple modifications to the cuboids. For the above method however, one requires near-perfect localization of the action within a video sequence. To achieve this objective, we present a two stage weakly supervised probabilistic model for simultaneous localization and recognition of actions in videos. Different from previous approaches, our method is novel in that it (1) eliminates the need for manual annotations for the training procedure and (2) does not require any human detection or tracking in the classification stage. The first stage of our framework is a probabilistic action localization model which extracts the most promising sub-windows in a video sequence where an action can take place. We use a non-linear classifier in the second stage of our framework for the final classification task. We show the effectiveness of our proposed model on two well known real-world datasets: UCF Sports and UCF11 datasets. iii Another application of the weakly supervised probablistic model proposed above is in the gaming environment. An important aspect in designing interactive, action-based interfaces is reliably recognizing actions with minimal latency. High latency causes the system’s feedback to lag behind and thus significantly degrade the interactivity of the user experience. With slight modification to the weakly supervised probablistic model we proposed for action localization, we show how it can be used for reducing latency when recognizing actions in Human Computer Interaction (HCI) environments. This latency-aware learning formulation trains a logistic regression-based classifier that automatically determines distinctive canonical poses from the data and uses these to robustly recognize actions in the presence of ambiguous poses. We introduce a novel (publicly released) dataset for the purpose of our experiments. Comparisons of our method against both a Bag of Words and a Conditional Random Field (CRF) classifier show improved recognition performance for both pre-segmented and online classification tasks.


If this is your thesis or dissertation, and want to learn how to access it or for more information about readership statistics, contact us at

Graduation Date





Tappen, Marshall


Doctor of Philosophy (Ph.D.)


College of Engineering and Computer Science


Computer Science

Degree Program

Computer Science








Release Date

December 2012

Length of Campus-only Access


Access Status

Doctoral Dissertation (Open Access)


Dissertations, Academic -- Engineering and Computer Science, Engineering and Computer Science -- Dissertations, Academic