Title
What's Making That Sound?
Keywords
Audiovisual processing; Comparative reasoning; Multimodal analysis; Winner-take-all hash
Abstract
In this paper, we investigate techniques to localize the sound source in videos recorded with a single microphone. The visual object whose motion generates the sound is located and segmented by analyzing the synchronization between object motion and audio energy. We first apply an effective region tracking algorithm to segment the video into a number of spatial-temporal region tracks, each representing the temporal evolution of an appearance-coherent image structure (i.e., an object). We then extract the motion feature of each object as its average acceleration in each frame. Meanwhile, a Short-Time Fourier Transform is applied to the audio signal to extract an audio energy feature as the audio descriptor. We further apply a nonlinear transformation to both the audio and visual descriptors to obtain audio and visual codes in a common rank correlation space. Finally, the correlation between an object and the audio signal is evaluated simply by computing the Hamming distance between the audio and visual codes generated in the previous steps. We evaluate the proposed method both qualitatively and quantitatively using a number of challenging test videos. In particular, the proposed method is compared with a state-of-the-art audiovisual source localization algorithm. The results demonstrate the superior performance of the proposed algorithm in the spatial-temporal localization and segmentation of audio sources in the visual domain.
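The matching pipeline summarized in the abstract (STFT audio energy, per-frame object acceleration, a winner-take-all (WTA) hash into a rank space, Hamming-distance comparison) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the region tracking step is replaced by given centroid trajectories; `acceleration_magnitude` uses centroid second differences as a hypothetical stand-in for the paper's average acceleration; audio and visual descriptors are assumed to be already resampled to a common frame rate; the whole sequence is hashed at once; and all function names and parameter values (`frame_len`, `hop`, `n_codes`, `k`) are hypothetical.

```python
import numpy as np


def stft_energy(signal, frame_len=512, hop=256):
    """Audio descriptor: total spectral energy of each Hann-windowed STFT frame."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    energy = np.empty(n_frames)
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + frame_len] * window
        energy[t] = np.sum(np.abs(np.fft.rfft(frame)) ** 2)
    return energy


def acceleration_magnitude(centroids):
    """Visual descriptor (hypothetical stand-in): per-frame acceleration magnitude
    of a region centroid, from second differences of a (T, 2) position array."""
    centroids = np.asarray(centroids, dtype=float)
    accel = np.diff(centroids, n=2, axis=0)
    return np.linalg.norm(accel, axis=1)


def make_permutations(dim, n_codes, seed=0):
    """Random permutations shared by the audio and visual hashes."""
    rng = np.random.default_rng(seed)
    return [rng.permutation(dim) for _ in range(n_codes)]


def wta_hash(descriptor, permutations, k=4):
    """Winner-take-all hash: for each permutation, record which of the first k
    permuted entries is largest. Only the rank order of the entries matters."""
    x = np.asarray(descriptor, dtype=float)
    return np.array([int(np.argmax(x[p[:k]])) for p in permutations])


def hamming(code_a, code_b):
    """Number of code positions at which two WTA codes disagree."""
    return int(np.count_nonzero(code_a != code_b))


def localize(audio_descriptor, object_descriptors, n_codes=128, k=4, seed=0):
    """Return (index, distances): the object whose motion code is closest to
    the audio code in Hamming distance, plus all distances."""
    perms = make_permutations(len(audio_descriptor), n_codes, seed)
    audio_code = wta_hash(audio_descriptor, perms, k)
    distances = [hamming(audio_code, wta_hash(d, perms, k))
                 for d in object_descriptors]
    return int(np.argmin(distances)), distances


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    energy = stft_energy(rng.standard_normal(16000))
    synced = 2.0 * energy                 # hypothetical object moving in step with the sound
    unrelated = rng.random(len(energy))   # hypothetical object with unrelated motion
    best, dists = localize(energy, [synced, unrelated])
    print(best, dists)
```

Because a WTA code depends only on which entry wins within each permuted subset, any strictly increasing rescaling of a descriptor leaves its code unchanged. That is what lets audio energy and acceleration, measured on unrelated scales, be compared directly by Hamming distance; in the demo above, `synced` receives distance 0 and is selected.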
Publication Date
November 3, 2014
Publication Title
MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia
Pages
147-156
Document Type
Article; Proceedings Paper
Personal Identifier
scopus
DOI Link
https://doi.org/10.1145/2647868.2654936
Copyright Status
Unknown
Scopus ID
84913590863 (Scopus)
Source API URL
https://api.elsevier.com/content/abstract/scopus_id/84913590863
STARS Citation
Li, Kai; Ye, Jun; and Hua, Kien A., "What's Making That Sound?" (2014). Scopus Export 2010-2014. 8229.
https://stars.library.ucf.edu/scopus2010/8229