Show Reference: "Robust sound localization using multi-source audiovisual information fusion"

Parham Aarabi, Safwat Zaky. Robust sound localization using multi-source audiovisual information fusion. Information Fusion, Vol. 2, No. 3 (September 2001), pp. 209-223. doi:10.1016/s1566-2535(01)00035-5
    abstract = {This paper illustrates the synergic advantages of a multi-modal sound localization system utilizing two cameras and a 3-element microphone array. The two cameras were used as part of a stereo feature-detection based visual object localization system, while the microphones were combined to produce a sound localization system incorporating a temporal power fusion ({TPF}) algorithm. The cameras and microphones were integrated using spatial likelihood functions ({SLFs}), which greatly simplifies the integration process. Test results show a significant improvement in the integrated vision and sound localization ({IVSL}) system's ability over that of the stand-alone microphone-array based sound localization system to accurately localize sound sources in low signal to noise situations. The {IVSL} system maintained an average error of 15 cm at signal-to-noise ratios as low as 0.5 {dB}.},
    author = {Aarabi, Parham and Zaky, Safwat},
    citeulike-article-id = {4778580},
    doi = {10.1016/s1566-2535(01)00035-5},
    issn = {15662535},
    journal = {Information Fusion},
    month = sep,
    number = {3},
    pages = {209--223},
    posted-at = {2015-02-13 10:59:22},
    priority = {2},
    title = {Robust sound localization using multi-source audiovisual information fusion},
    volume = {2},
    year = {2001}


Aarabi and Zaky present a system for audio-visual localization in azimuth and depth, which they demonstrate on an active-speaker localization task.

Aarabi and Zaky choose (adaptive) difference images for visual localization to avoid relying on domain knowledge.
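
To illustrate what localization from adaptive difference images can look like, here is a minimal sketch. It is not the authors' implementation: it assumes that "adaptive" means an exponentially updated background image, and the function names, the update rate, and the threshold are hypothetical choices made for this example. Frames are plain grayscale images stored as lists of rows.

```python
def update_background(background, frame, rate=0.1):
    """Blend the new frame into the background estimate.

    Assumption: an exponential running average stands in for the
    "adaptive" part of the difference image.
    """
    return [
        [(1.0 - rate) * b + rate * f for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


def changed_centroid(background, frame, threshold=30.0):
    """Return the (row, col) centroid of pixels that differ from the
    background by more than `threshold`, or None if nothing changed."""
    row_sum = col_sum = count = 0
    for r, (bg_row, fr_row) in enumerate(zip(background, frame)):
        for c, (b, f) in enumerate(zip(bg_row, fr_row)):
            if abs(f - b) > threshold:
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        return None
    return row_sum / count, col_sum / count
```

Because the sketch only reacts to pixels that change relative to a slowly updated background, it needs no model of what a speaker looks like, which matches the stated motivation of avoiding domain knowledge.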

Aarabi and Zaky use interaural time differences (ITD, computed using cross-correlation) and interaural level differences (ILD) in an array of three microphones for auditory localization.
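
The two auditory cues can be sketched for a single microphone pair as follows. This is a generic illustration under stated assumptions, not the paper's temporal power fusion algorithm: the function names, the brute-force lag search, and the sign conventions are hypothetical choices made for this example, and both signals are assumed to have the same length.

```python
import math


def estimate_itd(left, right, sample_rate, max_lag):
    """Estimate the delay of `right` relative to `left`, in seconds.

    The lag that maximizes the cross-correlation is taken as the delay.
    A positive result means the sound reached the left microphone first.
    """
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[i] with right[i + lag] over the overlapping samples.
        score = 0.0
        for i in range(max(0, -lag), min(n, n - lag)):
            score += left[i] * right[i + lag]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


def estimate_ild_db(left, right):
    """Level difference in dB; positive when the left signal is louder."""
    power_left = sum(x * x for x in left) / len(left)
    power_right = sum(x * x for x in right) / len(right)
    return 10.0 * math.log10(power_left / power_right)
```

With three microphones, the same estimates could be computed for each pair and then combined; how the paper combines them is not covered by this sketch.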