Reference: "Audio-visual localization with hierarchical topographic maps: Modeling the superior colliculus"

Matthew C. Casey, Athanasios Pavlou, Anthony Timotheou: "Audio-visual localization with hierarchical topographic maps: Modeling the superior colliculus." Neurocomputing (June 2012), doi:10.1016/j.neucom.2012.05.015
@article{casey-et-al-2012,
    abstract = {A key attribute of the brain is its ability to seamlessly integrate sensory information to form a multisensory representation of the world. In early perceptual processing, the superior colliculus ({SC}) takes a leading role in integrating visual, auditory and somatosensory stimuli in order to direct eye movements. The {SC} forms a representation of multisensory space through a layering of retinotopic maps which are sensitive to different types of stimuli. These eye-centered topographic maps can adapt to crossmodal stimuli so that the {SC} can automatically shift our gaze, moderated by cortical feedback. In this paper we describe a neural network model of the {SC} consisting of a hierarchy of nine topographic maps that combine to form a multisensory retinotopic representation of audio-visual space. Our motivation is to evaluate whether a biologically plausible model of the {SC} can localize audio-visual inputs live from a camera and two microphones. We use spatial contrast and a novel form of temporal contrast for visual sensitivity, and interaural level difference for auditory sensitivity. Results are comparable with the performance observed in cats where coincident stimuli are accurately localized, while presentation of disparate stimuli causes a significant drop in performance. The benefit of crossmodal localization is shown by adding increasing amounts of noise to the visual stimuli to the point where audio-visual localization significantly outperforms visual-only localization. This work demonstrates how a novel, biologically motivated model of low level multisensory processing can be applied to practical, real-world input in real-time, while maintaining its comparability with biology.},
    author = {Casey, Matthew C. and Pavlou, Athanasios and Timotheou, Anthony},
    doi = {10.1016/j.neucom.2012.05.015},
    issn = {0925-2312},
    journal = {Neurocomputing},
    keywords = {architecture, auditory, cue-combination, enhancement, learning, localization, model, multi-modality, sc},
    month = jun,
    title = {Audio-visual localization with hierarchical topographic maps: Modeling the superior colliculus},
    url = {http://dx.doi.org/10.1016/j.neucom.2012.05.015},
    year = {2012},
    pages = {783-810},
    volume = {15},
    number = {4}
}

Casey et al. use their ANN in a robotic system for audio-visual localization.

Casey et al. focus on making their system work in real time and with complex stimuli, compromising on biological realism to achieve this.

In Casey et al.'s system, interaural level difference (ILD) alone is used for sound source localization (SSL).

In Casey et al.'s experiments, the two microphones are one meter apart and the stimulus is one meter away from the midpoint between the two microphones. There is no damping body between the microphones, but at that interaural distance and distance to the stimulus, ILD should still be a good localization cue.
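
As a rough illustration of why ILD still works in this configuration, here is a minimal Python/NumPy sketch (not Casey et al.'s code): the level difference is taken from the RMS energy of the two channels, and with no damping body it arises purely from the 1/r attenuation of the nearer versus farther microphone. The geometry, the test tone, and the function name are assumptions for illustration only.

import numpy as np

def ild_db(left, right, eps=1e-12):
    # Interaural level difference in dB; positive means louder at the left microphone.
    rms_left = np.sqrt(np.mean(left ** 2))
    rms_right = np.sqrt(np.mean(right ** 2))
    return 20.0 * np.log10((rms_left + eps) / (rms_right + eps))

# Toy geometry matching the description: microphones 1 m apart, source 1 m from
# their midpoint, here placed 30 degrees to the left; only 1/r attenuation is modeled.
mic_left, mic_right = np.array([-0.5, 0.0]), np.array([0.5, 0.0])
angle = np.radians(-30.0)
source = np.array([np.sin(angle), np.cos(angle)])   # 1 m from the midpoint
d_left = np.linalg.norm(source - mic_left)
d_right = np.linalg.norm(source - mic_right)

t = np.linspace(0.0, 0.05, 2205, endpoint=False)    # 50 ms of a 1 kHz tone at 44.1 kHz
tone = np.sin(2.0 * np.pi * 1000.0 * t)
left, right = tone / d_left, tone / d_right

print(f"ILD = {ild_db(left, right):+.1f} dB")       # about +3.7 dB: source is to the left

Even without a damping body, the nearer channel is a few dB louder at this geometry, which is why ILD alone can drive localization here.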

In Casey et al.'s system, the superficial SC is modeled by two populations of center-on and center-off cells (whose receptive fields are modeled as differences of Gaussians) and four populations of direction-sensitive cells.
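
A center-on or center-off receptive field of this kind can be sketched as a difference-of-Gaussians filter. The Python below only illustrates the idea; the kernel size and sigmas are assumed values, not the parameters Casey et al. use.

import numpy as np
from scipy.signal import convolve2d

def dog_kernel(size=15, sigma_center=1.0, sigma_surround=3.0, on_center=True):
    # Difference of Gaussians: narrow excitatory center minus broad inhibitory
    # surround; the sign is flipped for center-off cells.
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    center = np.exp(-r2 / (2.0 * sigma_center ** 2)) / (2.0 * np.pi * sigma_center ** 2)
    surround = np.exp(-r2 / (2.0 * sigma_surround ** 2)) / (2.0 * np.pi * sigma_surround ** 2)
    dog = center - surround
    return dog if on_center else -dog

# The two populations respond to opposite spatial contrast: convolve the image
# with each kernel and half-wave rectify the result.
image = np.zeros((64, 64))
image[28:36, 28:36] = 1.0                            # bright square on a dark background
on_map = np.maximum(convolve2d(image, dog_kernel(on_center=True), mode="same"), 0.0)
off_map = np.maximum(convolve2d(image, dog_kernel(on_center=False), mode="same"), 0.0)

The direction-sensitive populations additionally compare responses across frames (the temporal contrast mentioned in the abstract), which this purely spatial sketch leaves out.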

Pavlou and Casey model the SC.

They use Hebbian, competitive learning to learn a topographic mapping between modalities.
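
As a minimal sketch of what Hebbian, competitive learning of a topographic map can look like, here is a toy one-dimensional, single-feature, self-organizing-map-style example in Python; it is not Pavlou and Casey's actual algorithm, and their model works with two-dimensional retinotopic maps and crossmodal input.

import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D topographic map: 50 units learn to tile a 1-D stimulus feature
# (e.g. azimuth normalized to [0, 1]) through winner-take-all competition
# plus a Hebbian-style update shared with the winner's map neighbors.
n_units, n_steps = 50, 5000
prefs = rng.uniform(0.0, 1.0, size=n_units)       # each unit's preferred stimulus value
positions = np.arange(n_units, dtype=float)       # unit positions along the map
lr, sigma = 0.1, 3.0                              # learning rate, neighborhood width

for _ in range(n_steps):
    x = rng.uniform(0.0, 1.0)                     # stimulus sample
    winner = int(np.argmin(np.abs(prefs - x)))    # competition: best-matching unit wins
    neighborhood = np.exp(-((positions - winner) ** 2) / (2.0 * sigma ** 2))
    prefs += lr * neighborhood * (x - prefs)      # winner and its neighbors move toward x

# Preferred values now tend to vary smoothly and monotonically along the map,
# i.e. nearby units represent nearby stimulus values (a topographic arrangement).
print(np.round(prefs[::10], 2))

The crossmodal case follows the same principle, with activity on one modality's map helping to drive the competitive update on another's.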

They also simulate cortical input.