Scopus Export 2010-2014

Multimodal Analysis For Identification And Segmentation Of Moving-Sounding Objects

Hamid Izadinia, University of Central Florida
Imran Saleemi, University of Central Florida
Mubarak Shah, University of Central Florida

Keywords

Audio-visual analysis; Audio-visual synchronization; Canonical correlation analysis; Video segmentation

Abstract

In this paper, we propose a novel method that exploits the correlation between the audio and visual dynamics of a video to segment and localize the objects that are the dominant source of audio. Our approach consists of a two-step spatiotemporal segmentation mechanism that relies on the velocity and acceleration of moving objects as visual features. Each frame of the video is segmented into regions based on motion and appearance cues using the QuickShift algorithm; these regions are then clustered over time using K-means to obtain a spatiotemporal video segmentation. The video is represented by motion features computed over individual segments. The audio is represented by the Mel-Frequency Cepstral Coefficients (MFCC) of the audio signal and their first-order derivatives. The proposed framework assumes that there is a non-trivial correlation between these audio features and the velocity and acceleration of the moving and sounding objects. Canonical correlation analysis (CCA) is used to identify the moving objects that are most correlated with the audio signal. In addition to moving-sounding object identification, the same framework is used to solve the problem of audio-video synchronization and to aid interactive segmentation. We evaluate the performance of the proposed method on challenging videos. Our experiments demonstrate a significant increase in performance over the state of the art, both qualitatively and quantitatively, and validate the feasibility and superiority of our approach. © 1999-2012 IEEE.
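The abstract's core step, scoring each segment by the canonical correlation between its motion and the audio features, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. Segments are assumed to be already tracked as per-frame centroid trajectories, standing in for the QuickShift and K-means segmentation. MFCC frames are assumed to be precomputed and resampled to the video frame rate. CCA is computed directly with NumPy by whitening followed by an SVD. All function names (`first_canonical_correlation`, `motion_features`, `audio_features`, `most_correlated_segment`, `best_audio_offset`) are hypothetical. The synchronization helper illustrates one plausible way the same framework could align audio and video: it searches integer frame offsets for the one with the highest canonical correlation.

```python
import numpy as np


def _inv_sqrt(c: np.ndarray) -> np.ndarray:
    """Inverse square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(c)
    return v @ np.diag(1.0 / np.sqrt(w)) @ v.T


def first_canonical_correlation(x: np.ndarray, y: np.ndarray, reg: float = 1e-6) -> float:
    """Largest canonical correlation between x (T x p) and y (T x q)."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    # Sample covariances; a small ridge term keeps them invertible.
    sxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    syy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    sxy = x.T @ y / (n - 1)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    m = _inv_sqrt(sxx) @ sxy @ _inv_sqrt(syy)
    return float(np.linalg.svd(m, compute_uv=False)[0])


def motion_features(centroids: np.ndarray) -> np.ndarray:
    """Per-frame velocity and acceleration of one segment's (T x 2) centroid track."""
    vel = np.gradient(centroids, axis=0)
    acc = np.gradient(vel, axis=0)
    return np.hstack([vel, acc])  # T x 4


def audio_features(mfcc: np.ndarray) -> np.ndarray:
    """MFCC frames (T x k) stacked with their first-order temporal derivatives."""
    return np.hstack([mfcc, np.gradient(mfcc, axis=0)])  # T x 2k


def most_correlated_segment(tracks: list, mfcc: np.ndarray):
    """Index of the track whose motion is most correlated with the audio, plus all scores."""
    a = audio_features(mfcc)
    scores = [first_canonical_correlation(motion_features(t), a) for t in tracks]
    return int(np.argmax(scores)), scores


def best_audio_offset(track: np.ndarray, mfcc: np.ndarray, max_lag: int = 10):
    """Search integer offsets; lag L pairs video frame t with audio frame t + L."""
    v, a = motion_features(track), audio_features(mfcc)
    best = (0, -1.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            vv, aa = v[: len(v) - lag], a[lag:]
        else:
            vv, aa = v[-lag:], a[: len(a) + lag]
        r = first_canonical_correlation(vv, aa)
        if r > best[1]:
            best = (lag, r)
    return best
```

Suppose `tracks` holds one (T, 2) centroid array per segment and `mfcc` is a (T, 13) array. Then `most_correlated_segment(tracks, mfcc)` returns the index of the segment most likely to be the sound source. The ridge term `reg` keeps the covariance matrices invertible when a feature is nearly constant. When the number of frames is not much larger than the combined feature dimension, CCA overfits and even unrelated segments can score high, so longer windows give more reliable rankings.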

Publication Date

1-28-2013

Publication Title

IEEE Transactions on Multimedia

Volume

Issue

Pages

378-390

Document Type

Article

Personal Identifier

scopus

DOI Link

https://doi.org/10.1109/TMM.2012.2228476

Copyright Status

Unknown

Scopus ID

84872703797 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/84872703797

STARS Citation

Izadinia, Hamid; Saleemi, Imran; and Shah, Mubarak, "Multimodal Analysis For Identification And Segmentation Of Moving-Sounding Objects" (2013). Scopus Export 2010-2014. 6604.
https://stars.library.ucf.edu/scopus2010/6604

