Title

Aggregate Nonparametric Safety Analysis Of Traffic Zones

Keywords

Action recognition; Bag of video words; Diffusion Maps; Pointwise Mutual Information; Semantic visual vocabulary

Abstract

Efficient modeling of actions is critical for recognizing human actions. Recently, bag of video words (BoVW) representation, in which features computed around spatiotemporal interest points are quantized into video words based on their appearance similarity, has been widely and successfully explored. The performance of this representation, however, is highly sensitive to two main factors: the granularity, and therefore the size, of the vocabulary, and the space in which features and words are clustered, i.e., the distance measure between data points at different levels of the hierarchy. The goal of this paper is to propose a representation and learning framework that addresses both these limitations. We present a principled approach to learning a semantic vocabulary from a large amount of video words using Diffusion Maps embedding. As opposed to flat vocabularies used in traditional methods, we propose to exploit the hierarchical nature of feature vocabularies representative of human actions. Spatiotemporal features computed around interest points in videos form the lowest level of representation. Video words are then obtained by clustering those spatiotemporal features. Each video word is then represented by a vector of Pointwise Mutual Information (PMI) between that video word and training video clips, and is treated as a mid-level feature. At the highest level of the hierarchy, our goal is to further cluster the mid-level features, while exploiting semantically meaningful distance measures between them. We conjecture that the mid-level features produced by similar video sources (action classes) must lie on a certain manifold. To capture the relationship between these features, and retain it during clustering, we propose to use diffusion distance as a measure of similarity between them. The underlying idea is to embed the mid-level features into a lower-dimensional space, so as to construct a compact yet discriminative, high-level vocabulary. Unlike some of the supervised vocabulary construction approaches and the unsupervised methods such as pLSA and LDA, Diffusion Maps can capture the local relationships between the mid-level features on the manifold. We have tested our approach on diverse datasets and have obtained very promising results. © 2011 Elsevier Inc. All rights reserved.
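The two middle steps of the abstract (PMI vectors for video words, followed by a Diffusion Maps embedding) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pmi_features` and `diffusion_map`, the Gaussian kernel, the bandwidth `sigma`, the diffusion time `t`, and the choice to set PMI to zero for word/clip pairs that never co-occur are all assumptions made for illustration only.

```python
import numpy as np


def pmi_features(counts):
    """Return a PMI vector per video word (hypothetical helper).

    counts[i, j] = number of times video word i occurs in training clip j.
    """
    counts = np.asarray(counts, dtype=float)
    joint = counts / counts.sum()              # p(word, clip)
    p_word = joint.sum(axis=1, keepdims=True)  # p(word)
    p_clip = joint.sum(axis=0, keepdims=True)  # p(clip)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(joint / (p_word * p_clip))
    # Assumption: pairs that never co-occur get PMI 0 instead of -inf.
    pmi[~np.isfinite(pmi)] = 0.0
    return pmi


def diffusion_map(X, sigma=1.0, n_components=2, t=1):
    """Embed the rows of X with a basic diffusion map (illustrative only)."""
    # Pairwise squared Euclidean distances and Gaussian affinities.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    d = W.sum(axis=1)
    # Symmetric matrix that shares eigenvalues with the Markov matrix
    # P = D^{-1} W, so a symmetric eigensolver can be used.
    S = W / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Convert to right eigenvectors of P.
    psi = vecs / np.sqrt(d)[:, None]
    # Drop the trivial constant eigenvector (eigenvalue 1) and scale the
    # remaining coordinates by eigenvalue**t.
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
```

Under these assumptions, calling `diffusion_map(pmi_features(counts))` on a word-by-clip count matrix gives low-dimensional coordinates for each video word. Euclidean distance in that space approximates diffusion distance, so a standard clustering method such as k-means could then group the words into a smaller high-level vocabulary.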

Publication Date

3-1-2012

Publication Title

Accident Analysis and Prevention

Volume

45

Issue

3

Number of Pages

317-325

Document Type

Article

Personal Identifier

scopus

DOI Link

https://doi.org/10.1016/j.aap.2011.07.019

Scopus ID

84856112825 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/84856112825
