Title

What's Making That Sound?

Keywords

Audiovisual processing; Comparative reasoning; Multimodal analysis; Winner-take-all hash

Abstract

In this paper, we investigate techniques for localizing the sound source in a video recorded with a single microphone. The visual object whose motion generates the sound is located and segmented by analyzing the synchronization between object motion and audio energy. We first apply an effective region-tracking algorithm to segment the video into a number of spatio-temporal region tracks, each representing the temporal evolution of an appearance-coherent image structure (i.e., an object). We then extract each object's motion feature as its average acceleration in each frame. Meanwhile, the short-time Fourier transform is applied to the audio signal to extract an audio energy feature as the audio descriptor. We further impose a nonlinear transformation on both the audio and visual descriptors to obtain audio and visual codes in a common rank-correlation space. Finally, the correlation between an object and the audio signal is evaluated simply by computing the Hamming distance between the audio and visual codes generated in the previous steps. We evaluate the proposed method both qualitatively and quantitatively on a number of challenging test videos. In particular, the proposed method is compared with a state-of-the-art audiovisual source-localization algorithm. The results demonstrate the superior performance of the proposed algorithm in spatio-temporal localization and segmentation of audio sources in the visual domain.
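The keywords list a winner-take-all (WTA) hash, which is one standard way to realize the rank-correlation coding and Hamming-distance matching described in the abstract. The sketch below is an illustrative approximation, not the paper's implementation: the descriptor dimensions, permutation count, window size `k`, and the synthetic audio/motion descriptors are all assumptions made for the example.

```python
import numpy as np

def wta_hash(x, permutations, k):
    """Winner-take-all hash of descriptor x: for each random permutation,
    record the index of the maximum among the first k permuted elements.
    The resulting code depends only on rank order, not on magnitudes."""
    return np.array([int(np.argmax(x[p[:k]])) for p in permutations])

def hamming(a, b):
    """Hamming distance between two WTA codes (count of differing symbols);
    a smaller distance indicates stronger rank correlation."""
    return int(np.sum(a != b))

rng = np.random.default_rng(0)
dim, n_perms, k = 32, 64, 4            # assumed sizes for illustration
perms = [rng.permutation(dim) for _ in range(n_perms)]

# Synthetic stand-ins: an audio-energy descriptor, a motion descriptor
# synchronized with it, and an unrelated object's motion descriptor.
audio = rng.normal(size=dim)
visual_sync = audio + 0.1 * rng.normal(size=dim)
visual_other = rng.normal(size=dim)

code_a = wta_hash(audio, perms, k)
d_sync = hamming(code_a, wta_hash(visual_sync, perms, k))
d_other = hamming(code_a, wta_hash(visual_other, perms, k))
print(d_sync < d_other)  # the synchronized object should be much closer
```

Because the WTA code encodes only relative orderings, this comparison is robust to monotonic rescalings of either descriptor, which is the usual motivation for matching audio and visual streams in a common rank space.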

Publication Date

11-3-2014

Publication Title

MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia

Number of Pages

147-156

Document Type

Article; Proceedings Paper

Personal Identifier

scopus

DOI Link

https://doi.org/10.1145/2647868.2654936

Scopus ID

84913590863 (Scopus)

Source API URL

https://api.elsevier.com/content/abstract/scopus_id/84913590863
