"BBN VISER TRECVID 2013 Multimedia Event Detection and Multimedia Event" by Pradeep Natarajan, Shuang Wu et al.

Scopus Export 2010-2014

Title

BBN VISER TRECVID 2013 Multimedia Event Detection and Multimedia Event Recounting Systems

Creator

Pradeep Natarajan, BBN Technologies
Shuang Wu, BBN Technologies
Florian Luisier, BBN Technologies
Xiaodan Zhuang, BBN Technologies
Manasvi Tickoo, BBN Technologies

Abstract

We describe the Raytheon BBN Technologies (BBN) led VISER system for the TRECVID 2013 Multimedia Event Detection (MED) and Recounting (MER) tasks. We present a comprehensive analysis of the different modules: (1) a large suite of visual, audio and multimodal low-level features; (2) video- and segment-level semantic scene/action/object concepts; (3) automatic speech recognition (ASR); (4) videotext detection and recognition (OCR). For the low-level features, we used multiple static, motion-based, color, and audio features and a Fisher Vector (FV) representation. For the semantic concepts, we developed various visual concept sets in addition to multiple existing visual concept banks. In particular, we used BBN's natural language processing (NLP) technologies to automatically identify and train salient concepts from short textual descriptions of research set videos. We also exploited online data resources to augment the concept banks. For the speech and videotext content, we leveraged rich confidence-weighted keywords and phrases obtained from the ASR and OCR systems. We combined these different streams using multiple early (feature-level) and late (score-level) fusion strategies. Our system involves both SVM-based and query-based detections to achieve superior performance despite the varying number of positive videos in the event kit. We present a thorough study of different semantic feature based systems compared to low-level feature based systems. Consistent with previous MED evaluations, low-level features still exhibit strong performance. Further, our semantic feature based systems have improved significantly and produce gains in fusion, especially in the EK10 and EK0 conditions. On the pre-specified condition, the mean average precision (MAP) of our VISER system is 33%, 16.6% and 5.2% for the EK100, EK10 and EK0 conditions, respectively. These are largely consistent with our ad hoc results of 32.2%, 14.3% and 8.1% for the EK100, EK10 and EK0 conditions, respectively. For the MER task, our system has an accuracy of 64.96% and requires only 52.83% of the video length for the evaluators to analyze the evidence and make their judgment.

Publication Date

1-1-2013

Publication Title

2013 TREC Video Retrieval Evaluation, TRECVID 2013

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

Copyright Status

Unknown

Scopus ID

85085787595 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/85085787595

STARS Citation

Natarajan, Pradeep; Wu, Shuang; Luisier, Florian; Zhuang, Xiaodan; and Tickoo, Manasvi, "BBN VISER TRECVID 2013 Multimedia Event Detection and Multimedia Event Recounting Systems" (2013). Scopus Export 2010-2014. 7465.
https://stars.library.ucf.edu/scopus2010/7465

This document is currently not available here.
