Discovering Motion Primitives for Unsupervised Grouping and One-Shot Learning of Human Actions, Gestures, and Expressions

    Authors

    Y. Yang; I. Saleemi; M. Shah

    Abbreviated Journal Title

    IEEE Trans. Pattern Anal. Mach. Intell.

    Keywords

    Human actions; one-shot learning; unsupervised clustering; gestures; facial expressions; action representation; action recognition; motion primitives; motion patterns; histogram of motion primitives; motion primitive strings; hidden Markov model; recognition; Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic

    Abstract

    This paper proposes a novel representation of articulated human actions, gestures, and facial expressions. The main goals of the proposed approach are: 1) to enable recognition using very few examples, i.e., one- or k-shot learning, and 2) to organize unlabeled datasets meaningfully by unsupervised clustering. The proposed representation is obtained by automatically discovering high-level subactions, or motion primitives, through hierarchical clustering of observed optical flow in a four-dimensional space of spatial location and motion flow. In contrast to state-of-the-art representations like bag of video words, the proposed method is completely unsupervised, yet it provides a meaningful representation conducive to visual interpretation and textual labeling. Each primitive depicts an atomic subaction, such as the directional motion of a limb or the torso, and is represented by a mixture of four-dimensional Gaussian distributions. For one- and k-shot learning, the primitives discovered in a test video are labeled using KL divergence; the resulting sequence of primitive labels can then be represented as a string and matched against similar strings from training videos. The same sequence can also be collapsed into a histogram of primitives, or used to learn a hidden Markov model representing each class. We have performed extensive experiments on recognition by one- and k-shot learning, as well as unsupervised action clustering, on six human action and gesture datasets, a composite dataset, and a database of facial expressions. These experiments confirm the validity and discriminative power of the proposed representation.
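    The labeling and histogram steps described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes, for simplicity, that each discovered primitive and each test-video segment is summarized by a single 4D Gaussian with diagonal covariance (the paper uses mixtures of full Gaussians), and all function names are illustrative. A segment is assigned the label of the primitive whose Gaussian is closest in KL divergence, and the label sequence is then collapsed into a normalized histogram of primitives.

    ```python
    import math
    from collections import Counter

    def gaussian_kl_diag(mu0, var0, mu1, var1):
        """Closed-form KL(N0 || N1) for Gaussians with diagonal covariance."""
        return 0.5 * sum(
            v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + math.log(v1 / v0)
            for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1)
        )

    def label_segments(segments, primitives):
        """Assign each test segment (a (mean, variance) pair) the index of
        the primitive Gaussian with minimal KL divergence to it."""
        return [
            min(range(len(primitives)),
                key=lambda i: gaussian_kl_diag(mu, var, *primitives[i]))
            for mu, var in segments
        ]

    def primitive_histogram(labels, n_primitives):
        """Collapse the primitive-label sequence into a normalized histogram."""
        counts = Counter(labels)
        total = len(labels)
        return [counts.get(i, 0) / total for i in range(n_primitives)]
    ```

    In a one-shot setting, the histogram (or the label string itself) of a test video would be compared against those of the few labeled training videos, e.g., by histogram distance or string matching.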

    Journal Title

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    Volume

    35

    Issue/Number

    7

    Publication Date

    1-1-2013

    Document Type

    Article

    Language

    English

    First Page

    1635

    Last Page

    1648

    WOS Identifier

    WOS:000319060600008

    ISSN

    0162-8828
