Title

BBN VISER TRECVID 2013 Multimedia Event Detection and Multimedia Event Recounting Systems

Abstract

We describe the Raytheon BBN Technologies (BBN) led VISER system for the TRECVID 2013 Multimedia Event Detection (MED) and Recounting (MER) tasks. We present a comprehensive analysis of the different modules: (1) a large suite of visual, audio, and multimodal low-level features; (2) video- and segment-level semantic scene/action/object concepts; (3) automatic speech recognition (ASR); (4) videotext detection and recognition (OCR). For the low-level features, we used multiple static, motion-based, color, and audio features with a Fisher Vector (FV) representation. For the semantic concepts, we developed several visual concept sets in addition to multiple existing visual concept banks. In particular, we used BBN's natural language processing (NLP) technologies to automatically identify and train salient concepts from short textual descriptions of research set videos. We also exploited online data resources to augment the concept banks. For the speech and videotext content, we leveraged rich confidence-weighted keywords and phrases obtained from the ASR and OCR systems. We combined these different streams using multiple early (feature-level) and late (score-level) fusion strategies. Our system employs both SVM-based and query-based detection, achieving strong performance despite the varying number of positive videos in the event kit. We present a thorough study of semantic feature based systems compared to low-level feature based systems. Consistent with previous MED evaluations, low-level features still exhibit strong performance. Further, our semantic feature based systems have improved significantly and produce gains in fusion, especially in the EK10 and EK0 conditions. On the pre-specified condition, the mean average precision (MAP) scores of our VISER system are 33%, 16.6%, and 5.2% for the EK100, EK10, and EK0 conditions, respectively. These are largely consistent with our ad hoc results of 32.2%, 14.3%, and 8.1% for the EK100, EK10, and EK0 conditions, respectively. For the MER task, our system has an accuracy of 64.96%, and evaluators needed only 52.83% of the video length to analyze the evidence and make their judgment.
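The late (score-level) fusion and MAP evaluation mentioned above can be sketched as follows. This is a minimal illustrative example, not BBN's implementation: the stream names, weights, and video labels are hypothetical, and the fusion here is a simple weighted average of per-stream detection scores, with average precision computed over the resulting ranking.

```python
# Hedged sketch of late (score-level) fusion and AP scoring.
# Stream names ("motion_fv", "asr_kw"), weights, and labels are
# illustrative assumptions, not values from the VISER system.

def late_fusion(stream_scores, weights):
    """Weighted average of per-stream detection scores for each video."""
    fused = {}
    video_ids = next(iter(stream_scores.values()))
    for vid in video_ids:
        fused[vid] = sum(w * stream_scores[s][vid] for s, w in weights.items())
    return fused

def average_precision(ranked_labels):
    """AP over a ranked list of 0/1 relevance labels (averaged over events to get MAP)."""
    hits, precisions = 0, []
    for i, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / max(hits, 1)

# Two hypothetical streams: a motion Fisher-Vector model and ASR keywords.
scores = {
    "motion_fv": {"v1": 0.9, "v2": 0.2, "v3": 0.6},
    "asr_kw":    {"v1": 0.7, "v2": 0.1, "v3": 0.8},
}
fused = late_fusion(scores, {"motion_fv": 0.6, "asr_kw": 0.4})
ranked = sorted(fused, key=fused.get, reverse=True)   # rank by fused score
labels = {"v1": 1, "v2": 0, "v3": 1}                  # hypothetical ground truth
ap = average_precision([labels[v] for v in ranked])
```

In practice the per-stream weights would be tuned on held-out data, and AP would be averaged over all evaluation events to produce the MAP figures reported in the abstract.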

Publication Date

1-1-2013

Publication Title

2013 TREC Video Retrieval Evaluation, TRECVID 2013

Number of Pages

-

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

Scopus ID

85085787595 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85085787595

