Show Tag: integration


There are a number of approaches to audio-visual localization: some are implemented on actual robots, others exist only as theoretical ANN or algorithmic models.

In a sensorimotor synchronization task, Aschersleben and Bertelson found that an auditory distractor biased the temporal perception of a visual target stimulus more strongly than the other way around.

In the Simon task, subjects are required to respond to a stimulus with a response that is spatially congruent or incongruent with that stimulus: they have, for example, to press a button with the left hand in response to a stimulus which is presented either on the left or on the right. Congruent responses (stimulus on the left, respond by pressing a button with the left hand) are usually faster than incongruent ones.

Kushal et al. do not evaluate the accuracy of audio-visual localization quantitatively. They do show a graph of visual-only, audio-visual, and combined audio-visual and temporal localization during one test run. That graph seems to indicate that multisensory and temporal integration prevent misdetections; they do not seem to improve localization accuracy much.

Kushal et al. use an EM algorithm to integrate audio-visual information for active speaker localization statically and over time.
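Kushal et al.'s actual formulation is considerably more involved; as a minimal, hypothetical sketch of the underlying idea, the EM loop below fuses precise visual and noisy auditory azimuth measurements into a single speaker-position estimate while discounting clutter. All numbers, noise levels, and the mixture model itself are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical azimuth measurements (degrees) of a speaker near +10 deg.
# Visual detections are precise; audio (e.g. ITD-based) estimates are
# noisier; a few spurious detections (clutter) are mixed in.
z     = np.array([ 9.6, 10.2, 10.1,  9.8, 10.4,   # visual
                   4.0, 17.0, 12.5,  2.0, 16.0,   # audio
                 -55.0, 71.0, -20.0])             # clutter
sigma = np.array([1.0] * 5 + [8.0] * 8)  # clutter enters labelled as audio

mu = np.median(z)      # initial estimate of the speaker azimuth
pi_c = 0.2             # assumed prior probability of a clutter measurement
uniform = 1.0 / 180.0  # clutter density over a 180-degree field

for _ in range(30):
    # E-step: responsibility of the speaker (non-clutter) component
    g = np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    r = (1 - pi_c) * g / ((1 - pi_c) * g + pi_c * uniform)
    # M-step: precision- and responsibility-weighted mean of measurements
    w = r / sigma ** 2
    mu = np.sum(w * z) / np.sum(w)

print(f"estimated azimuth: {mu:.2f} deg")
```

The precision weighting (1/sigma^2) lets the precise visual measurements dominate the estimate, consistent with the observation that audio mainly helps reject misdetections rather than refine the position.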

Studies on audio-visual active speaker localization usually do not report in-depth evaluations of audio-visual localization accuracy. The likely reason is that auditory information is used only as a backup for cases in which visual localization fails, or for disambiguation when visual information is not sufficient to tell which of the visual targets is the active speaker.

When visual detection succeeds, it is usually precise enough.

Therefore, "active speaker localization" is probably a misnomer; "active speaker identification" would be more accurate.

In the experiment by Xu et al., SC neurons in cats that were raised with congruent audio-visual stimuli distinguished between disparate combined stimuli, even if these stimuli were both in the neurons' receptive fields. Xu et al. state that this is different in naturally reared cats.

In the experiment by Xu et al., SC neurons in cats that were raised with congruent audio-visual stimuli had a preferred time difference between the onsets of the visual and auditory stimuli of 0 ms, whereas it is around 50–100 ms in normal cats.

In the experiment by Xu et al., SC neurons in the specially raised cats reacted best to auditory and visual stimuli that resembled those they were raised with (small flashing spots, broadband noise bursts); however, they generalized and reacted similarly to other stimuli.

In the study by Xu et al., multi-sensory enhancement in specially raised cats decreased gradually with the distance between the uni-sensory stimuli instead of occurring if and only if both stimuli were present in the neurons' RFs. This is different from normally reared cats, in which enhancement occurs regardless of stimulus distance as long as both uni-sensory components are within the RF.

Enhancement in the SC happens only between stimuli from different modalities.

Depression in the SC happens between stimuli from the same modality.

Is there really no enhancement between different cues from the same modality, e.g. contrast and color?

Patton and Anastasio present a model of "enhancement and modality-specific suppression in multi-sensory neurons" that requires no multiplicative interaction. It is a follow-up to their earlier functional model of these neurons, which required complex computations.

Anastasio et al. present a model of the response properties of multi-sensory SC neurons which explains enhancement, depression, and super-additivity using Bayes' rule: if one assumes that a neuron integrates its input to infer the posterior probability of a stimulus source being present in its receptive field, then these effects arise naturally.

Anastasio et al.'s model of SC neurons assumes that these neurons receive multiple inputs with Poisson noise and apply Bayes' rule to calculate the posterior probability of a stimulus being in their receptive fields.
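A toy numerical sketch of that idea (the spike rates and prior below are made-up illustrative values, not Anastasio et al.'s parameters): assume a binary target and conditionally independent Poisson spike counts from a visual and an auditory input. The posterior computed via Bayes' rule then exhibits multi-sensory enhancement and super-additivity without any dedicated multiplicative mechanism.

```python
from math import exp, factorial

def poisson(k, lam):
    """Poisson likelihood P(count = k | rate = lam)."""
    return lam ** k * exp(-lam) / factorial(k)

def posterior(v, a, prior=0.1, lam_v=(2.0, 6.0), lam_a=(2.0, 6.0)):
    """P(target in RF | visual count v, auditory count a).

    lam_* = (spontaneous rate, driven rate); counts are assumed
    conditionally independent given target presence. All parameter
    values are illustrative assumptions.
    """
    like1 = poisson(v, lam_v[1]) * poisson(a, lam_a[1]) * prior
    like0 = poisson(v, lam_v[0]) * poisson(a, lam_a[0]) * (1 - prior)
    return like1 / (like1 + like0)

# Unimodal conditions: one input driven, the other at spontaneous level
p_v  = posterior(v=5, a=2)
p_a  = posterior(v=2, a=5)
# Bimodal condition: both inputs driven
p_va = posterior(v=5, a=5)

print(f"visual only: {p_v:.3f}, auditory only: {p_a:.3f}, both: {p_va:.3f}")
```

With these numbers the bimodal posterior (about 0.69) exceeds not just the larger of the two unimodal posteriors (about 0.08 each) but their sum, i.e. super-addition falls out of Bayes' rule directly.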

Functional segregation and integration are complementary principles of organization of the brain.