To work with spatial information from different sensory modalities and use it for motor control, coordinate transformations must happen at some point during information processing. Pouget and Sejnowski state that in many instances such transformations are non-linear. They argue that the functions describing receptive fields and neural activation can be treated as basis functions for approximating non-linear functions such as those occurring in sensory-motor coordinate transformation.⇒
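A minimal sketch of the basis-function idea, not the authors' implementation: each unit is assumed to combine Gaussian tuning to retinal position with a sigmoidal gain for eye position, and a linear readout of these units is fitted by ridge-regularised least squares to a non-linear function of head-centred position (retinal plus eye position). All tuning parameters, the target function, and the names `basis`, `target`, and `fit_readout` are hypothetical and chosen only for illustration.

```python
import math

def gaussian(x, center, width):
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))

def sigmoid(x, threshold, slope):
    return 1.0 / (1.0 + math.exp(-(x - threshold) / slope))

# Hypothetical tuning parameters in degrees (assumptions, not from the source).
RETINAL_CENTERS = [-30.0 + 10.0 * i for i in range(7)]
EYE_THRESHOLDS = [-20.0 + 10.0 * i for i in range(5)]

def basis(r, e):
    """Unit responses: Gaussian retinal tuning times sigmoidal eye-position gain."""
    return [gaussian(r, c, 10.0) * sigmoid(e, t, 8.0)
            for c in RETINAL_CENTERS for t in EYE_THRESHOLDS]

def target(r, e):
    """A non-linear function of head-centred position r + e (bump at 0 degrees)."""
    return gaussian(r + e, 0.0, 10.0)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_readout(samples, ridge=1e-3):
    """Ridge-regularised least-squares weights for a linear readout of the basis."""
    feats = [basis(r, e) for r, e in samples]
    ys = [target(r, e) for r, e in samples]
    n = len(feats[0])
    gram = [[sum(f[i] * f[j] for f in feats) + (ridge if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(n)]
    return solve(gram, rhs)

def predict(weights, r, e):
    return sum(w * b for w, b in zip(weights, basis(r, e)))

def mse(weights, samples):
    return sum((predict(weights, r, e) - target(r, e)) ** 2
               for r, e in samples) / len(samples)

# Grid of (retinal position, eye position) pairs used for fitting.
SAMPLES = [(r, e) for r in range(-30, 31, 3) for e in range(-20, 21, 2)]
WEIGHTS = fit_readout(SAMPLES)
```

Calling `mse(WEIGHTS, SAMPLES)` gives the remaining approximation error of the readout. The same set of basis units could be reused with a different set of weights to approximate a different non-linear function of the same inputs, which is the appeal of the basis-function view.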
The model proposed by Heinrich et al. builds upon the one by Hinoshita et al. It adds visual input and thus shows how learning of language may be grounded not only in perception of verbal utterances, but also in visual perception.⇒
By combining information from different senses, one can sometimes make inferences that are not possible with information from one modality alone.⇒
Some modalities can yield low-latency, unreliable information and others high-latency, reliable information.
Combining both can produce fast information which improves over time.⇒
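One way to make this concrete is inverse-variance weighting, sketched below. This assumes independent Gaussian noise on each cue, which the note itself does not state; the cue latencies, values, and variances are made-up numbers, and `fuse` and `estimate_at` are hypothetical names.

```python
def fuse(estimates):
    """Inverse-variance weighted mean of (value, variance) pairs."""
    total_precision = sum(1.0 / var for _, var in estimates)
    value = sum(val / var for val, var in estimates) / total_precision
    return value, 1.0 / total_precision

def estimate_at(time_ms, cues):
    """Fuse every cue whose latency has elapsed; None if nothing has arrived yet."""
    available = [(val, var) for latency, val, var in cues if latency <= time_ms]
    return fuse(available) if available else None

# Hypothetical cues: (latency in ms, estimated position in degrees, variance).
CUES = [
    (20, 12.0, 16.0),   # fast but unreliable modality
    (120, 10.0, 1.0),   # slow but reliable modality
]
```

With these numbers, `estimate_at(50, CUES)` returns only the fast, high-variance estimate, while `estimate_at(150, CUES)` returns a fused estimate whose variance is lower than that of either cue alone.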
Humans adapt to an auditory scene's reverberation and noise conditions. They use visual scene recognition to recall reverberation and noise conditions of familiar environments.⇒
A system that stores multiple trained speech recognition models for different environments and retrieves them guided by visual scene recognition has improved speech recognition in reverberant and noisy environments.⇒
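The retrieval step of such a system could be organised roughly as below. This is only a structural sketch under my own assumptions: `ModelBank`, `recognize`, and `classify_scene` are hypothetical names, and the models and scene classifier are stand-in callables rather than real recognisers.

```python
class ModelBank:
    """Stores one speech recognition model per known environment."""

    def __init__(self, default_model):
        self._models = {}
        self._default = default_model

    def store(self, scene_label, model):
        self._models[scene_label] = model

    def retrieve(self, scene_label):
        # Fall back to a generic model for unfamiliar environments.
        return self._models.get(scene_label, self._default)


def recognize(audio, image, classify_scene, bank):
    """Pick the model by visual scene label, then apply it to the audio."""
    scene_label = classify_scene(image)
    model = bank.retrieve(scene_label)
    return model(audio)
```

The fallback to a default model is a design choice of this sketch: it keeps the system usable in environments for which no specialised model has been stored.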
The SC receives input from, and represents, all sensory modalities used in phasic orienting: vision, audition, somesthesis (haptic), nociception, infrared, electroception, magnetic, and echolocation.⇒
Sub-threshold multisensory neurons respond directly to only one modality; however, the strength of the response is strongly influenced by input from another modality.⇒
My theory on sub-threshold multisensory neurons: they receive only inhibitory input from the modality to which they do not directly respond, and they receive it when the stimulus in that modality lies outside their receptive field (RF); if the stimulus is inside their RF, they receive no excitatory input from that modality.⇒
The lateral geniculate nucleus (LGN) receives visual, auditory and higher cognitive input. According to Winston, 80% of LGN input is non-visual.⇒
Task-irrelevant cues in one modality can enhance reaction times in others, but they do not always do so. Instances of this effect have been attributed to exogenous attention.⇒
Task-irrelevant auditory cues have been found to enhance reaction times in other modalities. Visual cues, however, which cued visual localization, did not cue auditory localization.⇒
The SC is multisensory: it reacts to visual, auditory, and somatosensory stimuli. It does not only initiate gaze shifts, but also other motor behaviour.⇒
Do the parts of the sensory map in the deeper SC that correspond to peripheral visual space have a better representation than those in the visual superficial SC because they integrate more information? Does auditory or tactile localization play a more important part in multisensory localization there?⇒
There are different, unconnected notions of multimodality.
Zhang et al. propose an unsupervised dimensionality reduction algorithm, which they call 'multi-modal'.
Their notion of multi-modality differs from the one used in my work: it means that a latent, low-dimensional variable is distributed according to a multi-modal PDF.
This can be difficult to handle, depending on the transformation function mapping the high-dimensional data into the low-dimensional space. Linear methods such as PCA, in particular, will suffer from this.
The authors focus on (mostly binary) classification. In that context, multi-modality requires complex decision boundaries.⇒
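A toy illustration of one way a linear method can lose multi-modal structure; this is my own example, not Zhang et al.'s algorithm or data. Two modes are separated along a low-variance axis, while most of the variance lies along an unrelated axis. PCA keeps the high-variance axis, so the one-dimensional projection no longer distinguishes the modes. The data set and the names `leading_axis` and `project` are hypothetical.

```python
import math

def leading_axis(points):
    """Mean and first principal axis of 2-D points (closed form for a 2x2 covariance)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    var_x = sum((x - mx) ** 2 for x, _ in points) / n
    var_y = sum((y - my) ** 2 for _, y in points) / n
    cov = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the largest-variance direction of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * cov, var_x - var_y)
    return (mx, my), (math.cos(theta), math.sin(theta))

def project(points, mean, axis):
    """Coordinates of the centred points along the given axis."""
    return [(x - mean[0]) * axis[0] + (y - mean[1]) * axis[1] for x, y in points]

# Two modes that differ only in the low-variance y coordinate (made-up data).
MODE_A = [(float(x), 1.0) for x in range(-10, 11)]
MODE_B = [(float(x), -1.0) for x in range(-10, 11)]

MEAN, AXIS = leading_axis(MODE_A + MODE_B)
PROJ_A = project(MODE_A, MEAN, AXIS)
PROJ_B = project(MODE_B, MEAN, AXIS)
```

Here `AXIS` points along x, and `PROJ_A` and `PROJ_B` coincide, so the two modes become indistinguishable after the projection. A method that preserves the separating direction would need a criterion other than explained variance.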