# Show Tag: learning


The model of the SC due to Cuppini et al. reproduces development of

1. multi-sensory neurons
2. multi-sensory enhancement
3. intra-modality depression
4. inverse effectiveness

Optimal multi-sensory integration is learned (for many tasks).

Soltani and Wang propose an adaptive neural model of Bayesian inference neglecting any priors and claim that it is consistent with certain observations in biology.

Soltani and Wang propose an adaptive model of Bayesian inference with binary cues.

In their model, a synaptic weight codes for the ratio of synapses in a set which are activated vs. de-activated by the binary cue encoded in their pre-synaptic axon's activity.

The stochastic Hebbian learning rule makes the synaptic weights correctly encode log posterior probabilities and the neurons will encode reward probability correctly.
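Not their stochastic update rule, but the arithmetic behind the claim can be sketched: weights storing log likelihood ratios for each binary cue sum to the log posterior odds, so a weighted sum reproduces exact Bayesian inference. A minimal check (the cue likelihoods below are made-up numbers, not from the paper):

```python
import math

# Hypothetical likelihoods: two binary cues, each with a known probability
# of being "on" under reward (R) vs. no reward (N), and a flat prior.
p_on_given_R = [0.8, 0.6]
p_on_given_N = [0.3, 0.4]
prior_R = 0.5

def posterior_direct(cues):
    """Exact Bayes for P(R | cues)."""
    pR, pN = prior_R, 1.0 - prior_R
    for c, pr, pn in zip(cues, p_on_given_R, p_on_given_N):
        pR *= pr if c else (1.0 - pr)
        pN *= pn if c else (1.0 - pn)
    return pR / (pR + pN)

def posterior_from_weights(cues):
    """Sum log-likelihood-ratio 'weights' in the log-odds domain."""
    log_odds = math.log(prior_R / (1.0 - prior_R))
    for c, pr, pn in zip(cues, p_on_given_R, p_on_given_N):
        w_on = math.log(pr / pn)                    # weight when cue is on
        w_off = math.log((1.0 - pr) / (1.0 - pn))  # weight when cue is off
        log_odds += w_on if c else w_off
    return 1.0 / (1.0 + math.exp(-log_odds))

print(posterior_direct([1, 0]), posterior_from_weights([1, 0]))
```

The two computations agree exactly, which is why weights encoding log likelihood ratios suffice for computing posteriors.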

SOMs can be used as a means of learning principal manifolds.

Behrens et al. modeled learning of reward probabilities using the model of a Bayesian learner.

Behrens et al. found that humans take into account the volatility of reward probabilities in a reinforcement learning task.

The way they took the volatility into account was qualitatively modelled by a Bayesian learner.

The theoretical accounts of multi-sensory integration due to Beck et al. and Ma et al. do not learn and leave little room for learning.

Thus, they fail to explain an important aspect of multi-sensory integration in humans.

Reward mediated learning has been demonstrated in adaptation of orienting behavior.

Possible neurological correlates of reward-mediated learning have been found.

Reward-mediated learning is said to be biologically plausible.

Distributed Adaptive Control (DAC) is a system that can learn sensory-motor contingencies.

Verschure explains that, in his DAC system, the contextual layer overrules the adaptive layer as soon as it is able to predict perception well enough.

One version of DAC uses SOMs.

Humans can learn to use the statistics of their environment to guide their visual attention.

Humans do not need to be aware of the stimuli they perceive to use them to guide their visual attention.

Humans can learn to use stimuli in one modality to guide attention in another.

Landy et al. and Beck et al. seem to imply that optimization to natural stimuli is due to evolution. I'm sure they wouldn't disagree, though, with the idea that optimization is also partly achieved through learning---as in the case of kittens reared in unnatural sensory environments.

One way of evening out the distribution of SOM units in data space is using a `conscience': a value which increases every time a neuron is the BMU and decreases whenever it is not. High conscience values then lead to a lower likelihood of being selected as BMU.
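A 1-D sketch of this mechanism, loosely following DeSieno's formulation in which a running win-frequency estimate biases the BMU competition (all constants here are arbitrary):

```python
import random

def conscience_cl(data, n_units=4, steps=3000, lr=0.1,
                  beta=0.01, gamma=0.3, seed=0):
    """1-D competitive learning with a conscience-style selection bias."""
    rng = random.Random(seed)
    units = [rng.random() for _ in range(n_units)]
    win_freq = [1.0 / n_units] * n_units   # running win-frequency estimate
    for _ in range(steps):
        x = rng.choice(data)
        # frequent winners get a negative bias, rare winners a positive one
        bias = [gamma * (1.0 / n_units - f) for f in win_freq]
        bmu = min(range(n_units), key=lambda i: abs(units[i] - x) - bias[i])
        for i in range(n_units):
            win_freq[i] += beta * ((1.0 if i == bmu else 0.0) - win_freq[i])
        units[bmu] += lr * (x - units[bmu])
    return units

# data heavily concentrated near 0 with a smaller mode near 1
data = [0.0] * 90 + [1.0] * 10
print(sorted(conscience_cl(data)))
```

With the bias, the rarely-winning unit still gets to represent the small mode at 1 instead of all units crowding into the dominant mode.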

Rucci et al. present an algorithm which performs auditory localization and combines auditory and visual localization in a common SC map. The mapping between the representations is learned using value-dependent learning.

Rucci et al.'s neural network learns how to align ICx and SC (OT) maps by means of value-dependent learning: The value signal depends on whether the target was in the fovea after a saccade.

Rucci et al.'s model of learning to combine ICx and SC maps does not take into account the point-to-point projections from SC to ICx reported later by Knudsen et al.

Adams et al. use SOM-like algorithms to model biological sensori-motor control and develop robotic sensori-motor controllers.

A SOM that is to learn continuously cannot continuously decrease neighborhood interaction width and learning rate.

It is helpful if these parameters are self-regulated, like in the PLSOM.

Chalk et al. hypothesize that biological cognitive agents learn a generative model of sensory input and rewards for actions.

Noise can improve convergence in clustering algorithms.

k-means is a special case of the EM algorithm

"Stochastic competitive learning behaves as a form of adaptive quantization", because the centroids being adapted distribute themselves in the data space such that they minimize the quantization error (according to the distance metric being used).
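Both notes can be illustrated with plain k-means, which is EM with `hard' (degenerate) posteriors; the centroids settle where they minimize quantization error. A toy 1-D version:

```python
def kmeans_1d(data, centers, iters=10):
    """k-means as 'hard' EM: the E-step assigns each point to its nearest
    center (a degenerate posterior); the M-step re-estimates each center
    as the mean of its assigned points."""
    for _ in range(iters):
        clusters = {i: [] for i in range(len(centers))}
        for x in data:                        # E-step: hard assignment
            i = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[i].append(x)
        for i, pts in clusters.items():       # M-step: re-estimate centers
            if pts:
                centers[i] = sum(pts) / len(pts)
    return centers

print(kmeans_1d([0.0, 0.1, 0.9, 1.0], [0.2, 0.8]))  # → [0.05, 0.95]
```

Each center ends up at the mean of its cluster, which is exactly the position minimizing the (squared-distance) quantization error for that cluster.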

My algorithms minimize the expected error since they take into account the probability of data points (via noise properties).

The model proposed by Heinrich et al. builds upon the one by Hinoshita et al. It adds visual input and thus shows how learning of language may not only be grounded in perception of verbal utterances, but also in visual perception.

Hinoshita et al. propose a model of natural language acquisition based on a multiple-timescale recurrent artificial neural network (MTRNN).

Fitting barn owls with prisms which induce a shift in where the owls see objects in their environment leads to a shift of the map of auditory space in the optic tectum.

The shift in the auditory space map in the optic tectum of owls whose visual perception was shifted by prisms is much stronger in juvenile than in mature owls.

Letting adult owls with shifted visual spatial perception hunt mice increases the amount by which the auditory space map in the owls' optic tectum is shifted (as compared to feeding them only dead mice).

Bergan et al. offer four factors which might explain the increase in shift of the auditory space maps in owls with shifted visual spatial perception:

• Hunting represents a task in which accurate map alignment is important (owls which do not hunt presumably do not face such tasks),
• more cross-modal experience (visual and auditory stimuli from the mice),
• cross-modal experiences in phases of increased attention and arousal,
• increased importance of accurate map alignment (important for feeding).

If increased importance of accurate map alignment is what causes stronger map alignment in the optic tectum of owls that hunt than in those of owls that do not hunt (with visually displacing prisms), then that could point either

• to value-based learning in the OT
• or to a role of cognitive input to the OT (hunting owls pay more attention/are more interested in audio-visual stimuli than resting or feeding owls).

Bergan et al. show that interaction with the environment can drive multisensory learning. However, Xu et al. show that multisensory learning can also happen if there is no interaction with the multisensory world.

Lawrence et al. train different kinds of recurrent neural networks to classify sentences as grammatical or ungrammatical.

Lawrence et al. manage to train ANNs to learn grammar-like structure without any inbuilt representation of grammar. They argue that this shows that Chomsky's assumption that humans must have inborn linguistic capabilities is unnecessary.

Hinoshita et al. define self-organization as the phenomenon of a global, coherent structure arising in a system through local interaction between its elements as opposed to through some sort of central control.

According to Hinoshita et al., recurrent neural networks are capable of self-organization.

Hinoshita et al. argue that by watching language learning in RNNs, we can learn about how the human brain might self-organize to learn language.

If an MLP fails to approximate a certain function, this can be due to

• an inadequate number of hidden units (not layers),
• noise.

In principle, a three-layer feedforward network should be capable of approximating any (continuous) function.

If natural learning (and information processing) were perfect, psychology would not need to study learning (and information processing), but rather the environment, which would determine what we learn and how we process information.

Natural learning (and information processing) is not optimal and therefore psychology needs to study it and especially its imperfections.

Rucci et al. model learning of audio-visual map alignment in the barn owl SC. In their model, projections from the retina to the SC are fixed (and visual RFs are therefore static) and connections from ICx are adapted through value-dependent learning.

It is interesting that Rucci et al. modeled map alignment in barn owls using value-based learning so long before value based learning was demonstrated in map alignment in barn owls.

In SOM learning, shrinking of the neighborhood size and decreasing update strength usually follow predefined schedules, i.e. they only depend on the update step.

In the PLSOM algorithm, update strength depends on the difference between a data point and the best-matching unit's weight vector, the quantization error. A large distance, indicating a bad representation of that data point in the SOM, leads to a stronger update than a small distance. The distance is scaled relative to the largest quantization error encountered so far.
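A sketch of just this rate-scaling rule, not the full PLSOM (which also derives the neighborhood width from the same scaled error):

```python
def plsom_scaled_rate(distance, state):
    """PLSOM-style update scaling: the effective learning rate is the
    current quantization error divided by the largest error seen so far."""
    state["max_err"] = max(state.get("max_err", 0.0), distance)
    return distance / state["max_err"] if state["max_err"] > 0 else 0.0

state = {}
print(plsom_scaled_rate(0.5, state))  # first error defines the maximum → 1.0
print(plsom_scaled_rate(0.1, state))  # well-represented input → weak update
```

A data point far from its BMU (badly represented) yields a rate near 1; a well-represented point yields a rate near 0, with no predefined schedule.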

PLSOM reduces the number of parameters of the SOM algorithm from four to two.

PLSOM overreacts to outliers: data points which are very unrepresentative of the data in general will change the network more strongly than they should.

PLSOM2 addresses the problem of PLSOM overreacting to outliers.

Yan et al. present a system which uses auditory and visual information to learn an audio-motor map (in a functional sense) and orient a robot towards a speaker. Learning is online.

Yan et al. do not evaluate the accuracy of audio-visual localization.

Yan et al. report an accuracy of auditory localization of $3.4^\circ$ for online learning and $0.9^\circ$ for offline calibration.

Yan et al. perform sound source localization using both ITD and ILD. Some of their auditory processing is bio-inspired.

Weber presents a Helmholtz machine extended by adaptive lateral connections between units and a topological interpretation of the network. A Gaussian prior over the population response (a prior favoring co-activation of close-by units) and training with natural images lead to spatial self-organization and feature-selectivity similar to that in cells in early visual cortex.

Most current visual object detection methods (as of 2012) are `bag-of-visual-words' approaches: features are detected in an image and those features are combined in a `bag of visual words'. Learning algorithms are applied to learn to classify such bags of words.

Mühling et al. present an audio-visual video concept detection system. Their system extracts visual and auditory bags of words from video data. Visual words are based on SIFT features, auditory words are formed by applying the K-Means algorithm to a Mel-Frequency Cepstral Coefficients analysis of the auditory data. Support vector machines are used for classification.

Krizhevsky et al. demonstrate that large, deep convolutional neural networks can have very good object classification performance.

Noise can be beneficial in learning.

Multisensory experience is necessary to develop normal multisensory integration.

Krasne et al. present an ANN model for fear conditioning.

The redundancy provided by multisensory input can facilitate or even enable learning.

Visual feature combinations become more salient if they are learned to be associated with reward.

It is possible that learning of saccade target selection is influenced by reward.

The question is whether this happens on the saliency- or selection side.

Anderson argues that it is not the selection process that is influenced by reward but saliency evaluation (ie. attentional priority of a stimulus).

Xu et al. stress the point that in their cat rearing experiments, multisensory integration arises although there is no reward and no goal-directed behavior connected with the stimuli.

The fact that multi-sensory integration arises without reward connected to stimuli motivates unsupervised learning approaches to SC modeling.

The precise characteristics of multi-sensory integration were shown to depend on the characteristics of the real world as experienced during early life.

It is interesting that multisensory integration arises in cats in experiments in which there is no goal-directed behavior connected with the stimuli as that is somewhat in contradiction to the paradigm of embodied cognition.

Xu et al. raised two groups of cats in darkness and presented one with congruent and the other with random visual and auditory stimuli. They showed that SC neurons in cats from the congruent-stimulus group developed multi-sensory characteristics while those in the other group mostly did not.

In the experiment by Xu et al., SC neurons in cats that were raised with congruent audio-visual stimuli distinguished between disparate combined stimuli, even if these stimuli were both in the neurons' receptive fields. Xu et al. state that this is different in naturally reared cats.

In the experiment by Xu et al., SC neurons in cats that were raised with congruent audio-visual stimuli had a preferred time difference between onset of visual and auditory stimuli of 0s whereas this is around 50-100ms in normal cats.

In the experiment by Xu et al., SC neurons in cats reacted best to auditory and visual stimuli that resembled those they were raised with (small flashing spots, broadband noise bursts); however, they generalized and reacted similarly to other stimuli.

Zhao et al. propose a model which develops perception and behavior in parallel.

Their motivation is the embodiment idea stating that perception and behavior develop in behaving animals.

Disparity-selective neurons in visual cortex have preferred disparities of only a few degrees, whereas disparity in natural environments ranges over tens of degrees.

The possible explanation offered by Zhao et al. assumes that animals actively keep disparity within a small range during development, and that therefore only selectivity for small disparities develops.

Zhao et al. present a model of joint development of disparity selectivity and vergence control.

Zhao et al.'s model develops both disparity selection and vergence control in an effort to minimize reconstruction error.

It uses a form of sparse-coding to learn to approximate its input and a variation of the actor-critic learning algorithm called natural actor critic reinforcement learning algorithm (NACREL).

The teaching signal to the NACREL algorithm is the reconstruction error of the model after the action produced by it.

Mixing Hebbian (unsupervised) learning with feedback can guide the unsupervised learning process in learning interesting, or task-relevant things.

Classical models assume that learning in cortical regions is well described in an unsupervised learning framework while learning in the basal ganglia can be modeled by reinforcement learning.

Representations in the cortex (eg. V1) develop differently depending on the task. This suggests that some sort of feedback signal might be involved and learning in the cortex is not purely unsupervised.

Some task-dependency in representations may arise from embodied learning where actions bias experiences being learned from.

Conversely, the narrow range of disparities reflected in disparity-selective cells in visual cortex might be due to goal-directed feature learning.

Unsupervised learning models have been extended with aspects of reinforcement learning.

The algorithm presented by Weber and Triesch borrows from SARSA.

SOMs can be used for preprocessing in reinforcement learning, simplifying their high-dimensional input via their winner-take-all characteristics.

However, since standard SOMs do not get any goal-dependent input, they focus on globally strongest features (statistically most predictive latent variables) and under-emphasize features which would be relevant for the task.

The model due to Weber and Triesch combines SOM- or K-Means-like learning of features with prediction error feedback as in reinforcement learning. The model is thus able to learn relevant and disregard irrelevant features.

If the goal is predictive of the input, then a purely unsupervised algorithm could take a representation of the goal as just another input.

While it is possible that the goal often is predictive of the input, some error feedback is probably necessary to tune the degree to which the algorithm can be `distracted' by task-irrelevant but interesting stimuli.

Saeb et al. extend their model by a short-term memory which encodes the last action. This action memory is used to make up for noise and missing information.

Human children often react to multi-sensory stimuli faster than they do to uni-sensory stimuli. However, the latencies they exhibit up to a certain age do not violate the race model as they do in adult humans.

Multisensory integration develops after birth in many ways.

Fujita presents a supervised ANN model for learning to either generate a continuous time series from an input signal, or to generate a continuous function of the continuous integral of a time series.

Kohonen states that online learning in SOMs is less safe and slower than batch learning.

Kohonen names normalization of input dimensions as a remedy for differences in scaling between these dimensions. He does not cite another paper of his (with colleagues) in which he presents a SOM that learns this scaling.

My SOM takes care of differences in scaling between input dimensions implicitly and weights input dimensions while Kangas et al.'s SOM only learns scaling.

Kohonen discusses some of the challenges involved in using SOMs for text clustering:

• words have different importance depending on their absolute frequency,
• some words occurring very rarely or very commonly must be discarded.

It would be really interesting to see whether SOMs for text clustering can be designed whose weight vectors code for different sets of words.

The map of auditory space in the external nucleus of the inferior colliculus (ICx) is calibrated by visual experience.

Semantic multisensory congruence can

• shorten reaction times,
• lower detection thresholds,
• facilitate visual perceptual learning.

RNNPB learns sequences of inputs unsupervised (self-organized).

Similar parametric bias vectors are learned by the RNNPB for similar input.

If a SOM is trained on data whose dimensionality is higher than that of the SOM's grid, mapping of data points into the grid becomes more and more non-monotonic, an effect called `zebra stripes' by Kohonen.

Bishop et al.'s goal in introducing generative topographic mapping was not biological plausibility.

Generative Topographic Mapping produces PDFs for latent variables given data points.

Learning rate and neighborhood width schedule have to be chosen arbitrarily for (vanilla) SOM training.

There is no general proof for convergence for SOMs.

There is no cost function that SOM learning follows.

A SOM population code in response to some input vector is not a probability density function.

The SOM algorithm works, but it has some serious theoretical problems.

GTM was developed in part to be a good alternative to the SOM algorithm.

In implementing GTM for some specific use case, one chooses a noise model (at least one per data dimension).

GTM, at least in its original formulation, is a batch algorithm.

GTM uses the EM algorithm to fit adaptive parameters $\mathbf{W}$ and $\beta$ of a constrained mixture of Gaussian model to the data.

The constrained mixture of Gaussian model consists of a set $\{\mathbf{x}_i\}$ of points in latent space which are mapped via a general linear model $\mathbf{W}\phi(x)$ into data space, and the inverse variance $\beta$ of the Gaussian noise model.
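Written out, the density this constrained mixture assigns to a data point $\mathbf{t}$ (with $D$ the dimensionality of data space and $K$ the number of latent-space points), as given by Bishop et al., is

$$p(\mathbf{t} \mid \mathbf{W}, \beta) = \frac{1}{K} \sum_{i=1}^{K} \left(\frac{\beta}{2\pi}\right)^{D/2} \exp\left(-\frac{\beta}{2}\,\bigl\lVert \mathbf{W}\phi(\mathbf{x}_i) - \mathbf{t} \bigr\rVert^2\right),$$

and EM maximizes the data log-likelihood under this model.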

Q-Learning learns the function $\mathcal{Q}$ which maps a state $s$ and an action $a$ to the reward $r$ which is the long-term discounted reward expected for taking action $a$ in state $s$.

`Long-term discounted' means that it is the expected value of $$\sum^I_{i=0} \gamma^{n_i} r_i,$$ where $r_i$ and $n_i$ are rewards and steps to states in which the rewards are received when always taking the most promising action in each step, and $\gamma\leq 1$ is the discount factor.

Q-Learning assumes a world in which one state $s$ can be reached from another stochastically by taking an action $a$. In that world, taking certain actions in certain states stochastically incurs a reward $r$.

Q-learning starts with a random function $\mathcal{Q}$ and repeatedly takes actions and then updates $\mathcal{Q}$ with the observed reward. Actions are taken stochastically. The preference given to actions promising a high reward (according to the current state of $\mathcal{Q}$) is equivalent to the preference of exploitation over exploration. Another parameter of Q-learning is the learning rate which determines how strongly each observed reward changes the $\mathcal{Q}$ function in the next step.

Q-learning is guaranteed to converge to an optimal policy $V^*$ (under certain conditions).

The function $\mathcal{Q}$ induces a strategy $V$ which always takes the action $a$ with the highest expected reward.
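The update described above, in minimal tabular form. The three-state chain world and all constants are made up for illustration:

```python
import random

def q_learning(step, n_states=3, n_actions=2, episodes=2000,
               alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy action selection."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap episode length
            # exploration vs. exploitation
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda a: Q[s][a])
            s2, r, done = step(s, a)
            # move Q(s, a) toward reward plus discounted best future value
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q

# toy chain world: action 1 moves right, action 0 stays;
# reaching state 2 yields reward 1 and ends the episode
def step(s, a):
    if a == 1:
        s += 1
    return s, (1.0 if s == 2 else 0.0), s == 2

Q = q_learning(step)
print(Q[0], Q[1])
```

After training, the `move right' action has the higher Q-value in every non-terminal state, which is the induced strategy $V$.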

In the study due to Xu et al., multi-sensory enhancement in specially-raised cats decreased gradually with distance between uni-sensory stimuli instead of occurring if and only if the stimuli were present in their RFs. This is different from normally-reared cats, in which enhancement occurs regardless of stimulus distance as long as both uni-sensory components are within their RFs.

SOMs learn latent-variable models.

Keogh and Lin claim that what they call time series subsequence clustering is meaningless. Specifically, this means that the clusters found by clustering all subsequences of a time series of a certain length will yield the same (kind of) clusters as clustering random sequences.

Intuitively, clustering time series subsequences is meaningless for two reasons:

• The sum (or average) of all subsequences of a time series is always a straight line, and the cluster centers (in $k$-means and related algorithms) sum to the global mean. Thus, only if the interesting features of a time series together average to a straight line can an algorithm find them. This, however, is not the case very often.
• It should be expected from any meaningful clustering that subsequences which start at near-by time points end up in the same cluster (what Keogh and Lin call `trivial matches'). However, the similarity between close-by subsequences depends highly on the rate of change around a subsequence, and thus a clustering algorithm will find cluster centers close to subsequences with a low rate of change rather than a high rate of change, and therefore typically for the less interesting subsequences.
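The first point is easy to demonstrate numerically: element-wise averaging of all sliding windows of a strongly featured series flattens it almost completely:

```python
import math

# a strongly featured series: one full cycle of a sine wave
n, w = 200, 32
series = [math.sin(2 * math.pi * t / n) for t in range(n)]

# element-wise mean over all length-w sliding windows
num = n - w + 1
mean_window = [sum(series[s + j] for s in range(num)) / num
               for j in range(w)]

spread_series = max(series) - min(series)          # ≈ 2.0
spread_mean = max(mean_window) - min(mean_window)  # much smaller
print(spread_series, spread_mean)
```

The averaged window is nearly flat, so cluster centers (which average their members) cannot retain the sine's shape.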

Keogh and Lin propose focusing on finding motifs in time series instead of clusters.

Early work on layered architectures pre-wired all but the top-most layer and learned only that layer.

Hinton states that in using SVMs, the actual features (of an image, article...) are extracted by some hand-crafted algorithm and only discriminating objects based on these features is learned.

He sees this as a modern version of what he calls a strategy of denial with regard to learning feature extractors and hidden units.

Unsupervised learning extracts regularities in the input. Detected regularities can then be used for actual discrimination. Or unsupervised learning can be used again to detect regularities in these regularities.

Backpropagation was discovered at least four times within one decade.

If we know which kind of output we want to have and if each neuron's output is a smooth function of its input, then the change in weights to get the right output from the input can be computed using calculus.

Following this strategy, we get backpropagation.

If we want to learn classification using backprop, we cannot force our network to create binary output because binary output is not a smooth function of the input.

Instead we can let our network learn to output the log probability for each class given the input.
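One concrete way to do this is a log-softmax output layer, which is a smooth function of the network's raw outputs:

```python
import math

def log_softmax(logits):
    """Smooth mapping from raw network outputs to per-class log
    probabilities (stabilized by subtracting the maximum logit)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_z for l in logits]

log_p = log_softmax([2.0, 1.0, -1.0])
print([math.exp(lp) for lp in log_p])  # probabilities summing to 1
```

Unlike a hard binary decision, this output is differentiable everywhere, so backprop can push the log probability of the correct class upward.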

One problem with backpropagation is that one usually starts with small weights which will be far away from optimal weights. Due to the size of the combinatorial space of weights, learning can therefore take a long time.

Backprop needs a lot of labeled data to learn classification with many classes.

It is unclear how neurons could back-propagate errors in their inputs. Thus, the biological validity of backpropagation is limited.

Hinton argues that backpropagation is such a good idea that nature must have found a way to implement it somehow.

In the wake-sleep algorithm, (at least) two layers of neurons are fully connected to each other.

In the wake phase, the lower layer drives the upper layer through the bottom-up recognition weights. The top-down generative weights are trained such that they will generate the current activity in the lower layer given the current activity in the upper layer.

In the sleep phase, the upper layer drives activity in the lower layer through the generative weights and the recognition weights are learned such that they induce the activity in the upper layer given the activity in the lower layer.
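A minimal sketch of the two phases for one pair of layers. The layer sizes, learning rate, training pattern, and the use of a random top-level state during sleep are all made up for illustration; real wake-sleep networks stack several layers and include bias terms:

```python
import math
import random

rng = random.Random(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample(p):
    return 1.0 if rng.random() < p else 0.0

n_low, n_high, lr = 4, 2, 0.1
R = [[0.0] * n_high for _ in range(n_low)]   # recognition weights (bottom-up)
G = [[0.0] * n_low for _ in range(n_high)]   # generative weights (top-down)

def wake(v):
    # bottom-up pass with the recognition weights ...
    h = [sample(sigmoid(sum(v[i] * R[i][j] for i in range(n_low))))
         for j in range(n_high)]
    # ... then train the generative weights to reproduce v from h
    for i in range(n_low):
        p = sigmoid(sum(h[j] * G[j][i] for j in range(n_high)))
        for j in range(n_high):
            G[j][i] += lr * h[j] * (v[i] - p)

def sleep(h):
    # top-down "dream" with the generative weights ...
    v = [sample(sigmoid(sum(h[j] * G[j][i] for j in range(n_high))))
         for i in range(n_low)]
    # ... then train the recognition weights to recover h from v
    for j in range(n_high):
        p = sigmoid(sum(v[i] * R[i][j] for i in range(n_low)))
        for i in range(n_low):
            R[i][j] += lr * v[i] * (h[j] - p)

for _ in range(200):
    wake([1.0, 1.0, 0.0, 0.0])
    sleep([sample(0.5), sample(0.5)])
```

After training on the fixed pattern, the generative weights favor the active units of that pattern, i.e. the top layer has learned to reproduce what drove it during the wake phase.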

The restricted Boltzmann machine is an unsupervised learning algorithm which is similar to the wake-sleep algorithm. It uses stochastic learning, ie. neural activations are stochastic with continuous probabilities given by the weights.

The weights in a trained RBM implicitly encode a PDF over the training set.

Learning in RBMs is competitive but without explicit inhibition (because the RBM is restricted in that it does not have within-layer connections). Neurons learn different things due to random initialization and stochastic processing.

ANNs implementing DBNs have been around for a long time (they go back at least to Fukushima's Neocognitron).

Hinton proposes building deep belief networks by stacking RBMs and training them unsupervised and in ascending order. After that, the network goes into feed-forward mode and backprop can be used to learn the actual task. Thus, some of the problems of backprop are solved by initializing the weights via unsupervised learning.

A SOM in which each unit computes the probability of some value of a sensory variable produces a probabilistic population code, ie. it computes a population-coded probability density function.

Without an intact association cortex (or LIP), SC neurons cannot develop or maintain cross-modal integration.

(Neither multi-sensory enhancement nor depression.)

There is no depression in the immature SC.

Yu and Dayan argue that uncertainty should suppress top-down, context-dependent factors in inference, and strengthen learning about the situation.

Yu and Dayan propose a model of inference and learning in which expected uncertainty is encoded by high acetylcholine (ACh) levels and unexpected uncertainty is encoded by norepinephrine (NE).

SOMs treat all their input dimensions as observables of some latent variable. It is possible to give data points a dimension containing labels. These labels will not have a greater effect on learning than the other dimensions of the data point. This is especially true if the true labels are not good predictors of the actual latent variable.

Audio-visual map registration has its limits: strong distortions of natural perception can only partially be compensated through adaptation.

Register between sensory maps is necessary for proper integration of multi-sensory stimuli.

Visual localization has much greater precision and reliability than auditory localization. This seems to be one reason for vision guiding hearing (in this particular context) and not the other way around.

It is unclear and disputed whether visual dominance in adaptation is hard-wired or a result of the quality of respective stimuli.

Multisensory integration is present in neonates to some degree depending on species (more in precocial than in altricial species), but it is subject to postnatal development and then influenced by experience.

SOM (along with other competitive learning algorithms, backprop, simulated annealing, neocognitron, svm, and Bayesian models) suffers from catastrophic forgetting when forced to learn too quickly.

ART does not suffer from catastrophic forgetting.

Grossberg states that ART predicts a functional link between consciousness, learning, expectation, attention, resonance, and synchrony and calls this principle the CLEARS principle.

Cuppini et al. expand on their earlier work in modeling cortico-tectal multi-sensory integration.

They present a model which shows how receptive fields and multi-sensory integration can arise through experience.

Optimizing (ie. training) an estimator with input data yields different results depending on the distribution of the data points: wherever there is a high density of data points, the optimizer will reduce the error there, possibly incurring greater error where the density of data points is lower.

Fully supervised learning algorithms are biologically implausible.

In some instances, developing animals lose perceptual capabilities instead of gaining them due to what is called perceptual narrowing or canalization. One example is that human neonates are able to discriminate human and monkey faces at first, but only human faces later in development.

The SOM can be modified to take into account the variance of the input dimensions wrt. each other.

Zhou et al. use an approach similar to that of Bauer et al. They do not use pairwise cross-correlation between input modalities, but simply variances of individual modalities. It is unclear how they handle the case where one modality essentially becomes ground truth to the algorithm.

Without feedback about ground truth, a system learning from a data set of noisy corresponding values must have at least three modalities to learn their reliabilities. One way of doing this is learning pairwise correlation between modalities. It is not enough to take the best hypothesis on the basis of the currently learned reliability model and use that instead of ground truth to learn the variance of the individual modalities: If the algorithm comes to believe that one modality has near-perfect reliability, then that will determine the next best hypotheses. In effect, that modality will be ground truth for the algorithm and it will only learn how well the others predict it.
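The three-modality case can be checked numerically: with independent noise around a shared latent value, $\mathrm{Var}(x_i - x_j) = \sigma_i^2 + \sigma_j^2$, so the three pairwise difference variances determine the three individual variances without any ground truth. The noise levels below are made up:

```python
import random

rng = random.Random(1)
true_sd = [0.5, 1.0, 2.0]   # hidden noise levels of the three modalities

# three modalities observe the same latent value with independent noise
samples = []
for _ in range(50000):
    latent = rng.uniform(-10.0, 10.0)
    samples.append([latent + rng.gauss(0.0, sd) for sd in true_sd])

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def var_diff(a, b):
    return var([s[a] - s[b] for s in samples])

d01, d02, d12 = var_diff(0, 1), var_diff(0, 2), var_diff(1, 2)
# Var(x_i - x_j) = var_i + var_j: solve the three pairwise equations
var0 = (d01 + d02 - d12) / 2
var1 = (d01 + d12 - d02) / 2
var2 = (d02 + d12 - d01) / 2
print(var0, var1, var2)  # close to 0.25, 1.0, 4.0
```

With only two modalities there is one pairwise equation and two unknowns, which is exactly why a third modality (or ground truth) is needed.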

Zhou et al.'s and Bauer et al.'s statistical SOM variants assume Gaussian noise in the input.

Deneve describes how neurons performing Bayesian inference on variables behind Poisson inputs can learn the parameters of the Poisson processes in an online variant of the expectation maximization (EM) algorithm.

Deneve associates her EM-based learning rule in Bayesian spiking neurons with spike-time-dependent plasticity (STDP).

A principal manifold can only be learned correctly using a SOM if

• the SOM's dimensionality is the same as that of the principal manifold,
• the noise does not `smear' the manifold too much, thus making it indistinguishable from a manifold with higher dimensionality, and
• there are enough data points to infer the manifold behind the noise.

The SOM has ancestors in von der Malsburg's "Self-Organization of Orientation Sensitive Cells in the Striate Cortex" and other early models of self-organization.

The SOM is an abstraction of biologically-plausible ANNs.

The SOM is an asymptotically optimal vector quantizer.

There is no cost function that the SOM algorithm follows exactly.

Quality of order in SOMs is a difficult issue because there is no unique definition of `order' for the $n$-dimensional case if $n>2$.

Nevertheless, there have been a number of attempts.

There have been many extensions of the original SOM ANN, like

• (Growing) Neural Gas
• Parameterized SOM (PSOM)
• Stochastic SOM
• recursive and recurrent SOMs

Recursive and Recurrent SOMs have been used for mapping temporal data.

SOMs tend to have greater unit densities for points in data space with high data density. They do not follow the density strictly, however.

Hebbian learning and in particular SOM-like algorithms have been used to model cross-sensory spatial register (eg. in the SC).

Bauer et al. present a SOM variant which learns the variance of different sensory modalities (assuming Gaussian noise) to model multi-sensory integration in the SC.

Bauer and Wermter present an ANN algorithm which takes from the self-organizing map (SOM) algorithm the ability to learn a latent variable model from its input. They extend the SOM algorithm so it learns about the distribution of noise in the input and computes probability density functions over the latent variables. The algorithm represents these probability density functions using population codes. This is done with very few assumptions about the distribution of noise.

Bauer and Wermter use the algorithm they proposed to model multi-sensory integration in the SC. They show that it can learn to near-optimally integrate noisy multi-sensory information and reproduces spatial register of sensory maps, the spatial principle, the principle of inverse effectiveness, and near-optimal audio-visual integration in object localization.