Kushal et al. use an expectation-maximization (EM) algorithm to fuse audio and visual information for active speaker localization, both statically and over time.