Title

BBN VISER TRECVID 2012 Multimedia Event Detection and Multimedia Event Recounting Systems

Abstract

We describe the Raytheon BBN Technologies (BBN) led VISER system for the TRECVID 2012 Multimedia Event Detection (MED) and Multimedia Event Recounting (MER) tasks. We present a comprehensive analysis of the different modules in our evaluation system, which includes: (1) a large suite of visual, audio, and multimodal low-level features, (2) modules to detect semantic scene/action/object concepts over the entire video and within short temporal spans, (3) automatic speech recognition (ASR), and (4) videotext detection and recognition (OCR). For the low-level features, we used multiple static, motion, color, and audio features previously considered in the literature, as well as a set of novel, fast kernel-based feature descriptors developed recently by BBN. For the semantic concept detection systems, we leveraged BBN's natural language processing (NLP) technologies to automatically analyze and identify salient concepts from short textual descriptions of videos and frames. We then trained detectors for these concepts using visual and audio features. The semantic concept-based systems enable rich description of video content for event recounting (MER). The video-level concepts have the most coverage and can provide robust concept detections on most videos. Segment-level concepts are less robust, but can provide sequence information that enriches recounting. Object detection, ASR, and OCR are sporadic in occurrence but have high precision and improve the quality of the recounting. For the MED task, we combined these different streams using multiple early (feature-level) and late (score-level) fusion strategies. We present a rigorous analysis of each of these subsystems and the impact of different fusion strategies. In particular, we present a thorough study of different semantic feature-based systems compared to the low-level feature-based systems considered in most MED systems. Consistent with previous MED evaluations, low-level features exhibit strong performance. Further, semantic feature-based systems have comparable performance to the low-level system and produce gains in fusion. Overall, BBN's primary submission has an average missed detection rate of 29.6% with a false alarm rate of 2.6%. One of BBN's contrastive runs has <50% missed detection and <4% false alarm rates for all twenty events.

Publication Date

1-1-2012

Publication Title

2012 TREC Video Retrieval Evaluation Notebook Papers

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

Scopus ID

84905245222 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/84905245222
