# Show Reference: "Multiple Active Speaker Localization based on Audio-visual Fusion in two Stages"

Multiple Active Speaker Localization based on Audio-visual Fusion in two Stages. In 2012 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI) (13-15 September 2012), pp. 262-268, by Zhao Li, Thorsten Herfet, Thorsten Thormählen
@inproceedings{li-et-al-2012,
author = {Li, Zhao and Herfet, Thorsten and Thorm\"{a}hlen, Thorsten},
booktitle = {2012 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)},
day = {13-15},
keywords = {auditory, calibration, computational, cue-combination, face-detection, localization, visual, visual-processing},
location = {Hamburg, Germany},
month = sep,
pages = {262--268},
posted-at = {2012-09-21 07:56:58},
priority = {2},
title = {Multiple Active Speaker Localization based on Audio-visual Fusion in two Stages},
year = {2012}
}


Li et al. report that, in their experiment, audio-visual active-speaker localization is as good as visual active-speaker localization ($\sim 1^\circ$) as long as speakers are within the visual field.
Outside of the visual field, localization error varies between $1^\circ$ and $10^\circ$. The authors do not provide a detailed quantitative evaluation of localization accuracy.