Title
BBN VISER TRECVID 2013 Multimedia Event Detection and Multimedia Event Recounting Systems
Abstract
We describe the VISER system, led by Raytheon BBN Technologies (BBN), for the TRECVID 2013 Multimedia Event Detection (MED) and Multimedia Event Recounting (MER) tasks. We present a comprehensive analysis of its modules: (1) a large suite of visual, audio and multimodal low-level features; (2) video- and segment-level semantic scene/action/object concepts; (3) automatic speech recognition (ASR); and (4) videotext detection and recognition (OCR). For the low-level features, we used multiple static, motion-based, color and audio features with a Fisher Vector (FV) representation. For the semantic concepts, we developed various visual concept sets in addition to multiple existing visual concept banks. In particular, we used BBN's natural language processing (NLP) technologies to automatically identify and train salient concepts from short textual descriptions of research set videos. We also exploited online data resources to augment the concept banks. For the speech and videotext content, we leveraged rich confidence-weighted keywords and phrases obtained from the ASR and OCR systems. We combined these streams using multiple early (feature-level) and late (score-level) fusion strategies. Our system uses both SVM-based and query-based detection to achieve superior performance despite the varying number of positive videos in the event kit. We present a thorough study comparing semantic-feature-based systems with low-level-feature-based systems. Consistent with previous MED evaluations, low-level features still perform strongly. Our semantic-feature-based systems have also improved significantly and produce gains in fusion, especially in the EK10 and EK0 conditions (event kits with 10 and 0 positive exemplar videos, respectively). On the pre-specified condition, the mean average precision (MAP) of our VISER system is 33%, 16.6% and 5.2% for the EK100, EK10 and EK0 conditions, respectively.
These results are largely consistent with our ad hoc results of 32.2%, 14.3% and 8.1% for the EK100, EK10 and EK0 conditions, respectively. For the MER task, our system achieves an accuracy of 64.96%, and evaluators needed only 52.83% of the video duration to analyze the evidence and make their judgments.
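The score-level (late) fusion and the MAP figures reported above can be illustrated with a minimal sketch. It assumes per-stream z-score normalization followed by a weighted average of detection scores; the stream names, weights and normalization are hypothetical choices for illustration only, not the VISER fusion scheme, which the abstract says combined multiple early and late strategies. The average precision shown is the standard non-interpolated definition (mean of precision at the rank of each positive video), averaged over events to give MAP.

```python
from typing import Dict, List, Sequence, Tuple


def zscore(scores: Sequence[float]) -> List[float]:
    """Standardize one stream's scores so streams with different ranges are comparable."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / n
    std = var ** 0.5 or 1.0  # guard against a constant-valued stream
    return [(s - mean) / std for s in scores]


def late_fusion(streams: Dict[str, Sequence[float]],
                weights: Dict[str, float]) -> List[float]:
    """Hypothetical score-level fusion: weighted sum of z-normalized stream scores."""
    n = len(next(iter(streams.values())))
    fused = [0.0] * n
    total = sum(weights[name] for name in streams)
    for name, scores in streams.items():
        w = weights[name] / total  # normalize weights to sum to 1
        for i, s in enumerate(zscore(scores)):
            fused[i] += w * s
    return fused


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: mean precision at the rank of each positive video."""
    ranked = sorted(zip(scores, labels), key=lambda x: x[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(
        per_event: Sequence[Tuple[Sequence[float], Sequence[int]]]) -> float:
    """MAP: average of per-event AP values."""
    return sum(average_precision(s, l) for s, l in per_event) / len(per_event)
```

For example, with labels `[1, 0, 1, 0]`, a hypothetical visual stream `[0.9, 0.8, 0.3, 0.1]` alone gives AP = 5/6, while fusing it with equal weight with an audio stream `[0.2, 0.1, 0.9, 0.0]` ranks both positives first and gives AP = 1.0. Normalizing before weighting keeps a stream with a wide score range from dominating the fused ranking.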
Publication Date
1-1-2013
Publication Title
2013 TREC Video Retrieval Evaluation, TRECVID 2013
Number of Pages
-
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
Copyright Status
Unknown
Scopus ID
85085787595 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/85085787595
STARS Citation
Natarajan, Pradeep; Wu, Shuang; Luisier, Florian; Zhuang, Xiaodan; and Tickoo, Manasvi, "Bbn Viser Trecvid 2013 Multimedia Event Detection And Multimedia Event Recounting Systems" (2013). Scopus Export 2010-2014. 7465.
https://stars.library.ucf.edu/scopus2010/7465