Computational Audiovisual Scene Analysis in Online Adaptation of Audio-Motor Maps

Rujiao Yan, Tobias Rodemann, Britta Wrede. Computational Audiovisual Scene Analysis in Online Adaptation of Audio-Motor Maps. IEEE Transactions on Autonomous Mental Development, Vol. 5, No. 4 (December 2013), pp. 273-287. doi:10.1109/tamd.2013.2257766
    author = {Yan, Rujiao and Rodemann, Tobias and Wrede, Britta},
    citeulike-article-id = {13508424},
    citeulike-linkout-1 = {\_all.jsp?arnumber=6514501},
    doi = {10.1109/tamd.2013.2257766},
    institution = {Res. Inst. for Cognition \& Robot., Bielefeld Univ., Bielefeld, Germany},
    issn = {1943-0604},
    journal = {IEEE Transactions on Autonomous Mental Development},
    keywords = {active-speaker-localization, audio, localization, robotic, visual},
    month = dec,
    number = {4},
    pages = {273--287},
    posted-at = {2015-02-04 10:25:48},
    priority = {2},
    publisher = {IEEE},
    title = {Computational Audiovisual Scene Analysis in Online Adaptation of {Audio-Motor} Maps},
    volume = {5},
    year = {2013}


Yan et al. present a system that uses auditory and visual information to learn an audio-motor map (in a functional sense) and to orient a robot towards a speaker. The map is learned online.

Yan et al. use the standard Viola-Jones face detection algorithm for visual processing.

Yan et al. explicitly do not integrate auditory and visual localization. Given several visual localizations and one auditory localization, they associate the auditory localization with the closest visual localization and use that visual localization as the position of the audio-visual object.
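The association step can be illustrated with a minimal sketch. It assumes that all localizations are one-dimensional azimuth angles in degrees and ignores wrap-around at 360 degrees; the function name `associate_nearest` and the example values are hypothetical and not taken from the paper.

```python
from typing import Optional, Sequence


def associate_nearest(audio_azimuth: float,
                      visual_azimuths: Sequence[float]) -> Optional[float]:
    """Return the visual azimuth closest to the auditory estimate.

    The returned visual value is used as the position of the audio-visual
    object; the auditory estimate only selects which visual candidate to use.
    Returns None if there are no visual candidates.
    """
    if not visual_azimuths:
        return None
    return min(visual_azimuths, key=lambda v: abs(v - audio_azimuth))


# Hypothetical example: three detected faces and one sound at 12 degrees.
faces = [-30.0, 10.0, 45.0]
print(associate_nearest(12.0, faces))  # 10.0
```

Note that the auditory estimate itself is discarded once the nearest face has been chosen, which is the sense in which the two modalities are associated rather than integrated.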

In determining the position of the audio-visual object, Yan et al. handle the possibility that the actual source of the stimulus has only been heard, not seen. They decide whether that is the case by estimating, for each detected visual target, the probability that the auditory localization belongs to it, and comparing these probabilities to the baseline probability that the auditory target has not been detected visually.
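A sketch of such a decision is given below. The probability model is an assumption made for illustration only: each visual target is scored with a Gaussian density around its azimuth, and the "unseen" hypothesis is scored with a constant baseline (here a uniform density over a 200 degree range). The names `select_source`, `sigma`, and `p_unseen` and their default values are hypothetical; the paper's actual model may differ.

```python
import math
from typing import Optional, Sequence


def gaussian_pdf(x: float, mean: float, sigma: float) -> float:
    """Density of a normal distribution with the given mean and sigma at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def select_source(audio_azimuth: float,
                  visual_azimuths: Sequence[float],
                  sigma: float = 10.0,        # assumed audio error (degrees)
                  p_unseen: float = 1.0 / 200.0  # assumed baseline density
                  ) -> Optional[int]:
    """Return the index of the visual target that best explains the sound.

    Returns None if the baseline "source was heard but not seen" hypothesis
    scores at least as high as every visual target.
    """
    best_index: Optional[int] = None
    best_score = p_unseen
    for i, v in enumerate(visual_azimuths):
        score = gaussian_pdf(audio_azimuth, v, sigma)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


faces = [-30.0, 10.0, 45.0]
print(select_source(12.0, faces))  # index of the face at 10 degrees
print(select_source(90.0, faces))  # no face is close enough
```

Under these assumed parameters, a face is accepted only if it lies within roughly two standard deviations of the auditory estimate; otherwise the sound is treated as coming from an unseen source.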

Yan et al. do not evaluate the accuracy of audio-visual localization.

Yan et al. report an accuracy of auditory localization of $3.4^\circ$ for online learning and $0.9^\circ$ for offline calibration.

Yan et al. perform sound source localization using both interaural time differences (ITD) and interaural level differences (ILD). Some of their auditory processing is bio-inspired.
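For readers unfamiliar with the two cues, the following sketch shows one textbook way to estimate them from a two-channel signal: ITD as the lag that maximizes the cross-correlation of the channels, and ILD as the energy ratio in decibels. This is not the bio-inspired processing used by Yan et al.; the function names, the sign conventions, and the toy signal are assumptions made for illustration.

```python
import math
from typing import Sequence


def estimate_itd(left: Sequence[float], right: Sequence[float],
                 sample_rate: float, max_lag: int) -> float:
    """Estimate the ITD in seconds via brute-force cross-correlation.

    A positive result means the right channel lags the left one,
    i.e. the sound reached the left microphone first.
    """
    n = min(len(left), len(right))
    best_lag, best_value = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        value = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                value += left[i] * right[j]
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag / sample_rate


def estimate_ild(left: Sequence[float], right: Sequence[float]) -> float:
    """Estimate the ILD in dB; positive means the left channel is louder."""
    eps = 1e-12  # avoids division by zero and log of zero on silent input
    energy_left = sum(x * x for x in left) + eps
    energy_right = sum(x * x for x in right) + eps
    return 10.0 * math.log10(energy_left / energy_right)


# Toy signal: a short pulse that arrives at the right channel
# 3 samples later and at half the amplitude.
left = [0.0] * 20 + [1.0, 0.5, -0.3] + [0.0] * 20
right = [0.0] * 3 + [0.5 * x for x in left[:-3]]

itd = estimate_itd(left, right, sample_rate=16000.0, max_lag=8)
ild = estimate_ild(left, right)
print(f"ITD = {itd * 1e6:.1f} microseconds, ILD = {ild:.2f} dB")
```

In a localization system, cue values like these would then be mapped to a source direction, which is the role the audio-motor map plays in the paper.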