Show Tag: unsupervised-learning


The model of the SC due to Cuppini et al. reproduces development of

  1. multi-sensory neurons
  2. multi-sensory enhancement
  3. intra-modality depression
  4. super-additivity
  5. inverse effectiveness

Through lateral connections, a Hebbian learning rule, and approximate initialization, Cuppini et al. learn registration between sensory maps. This can be seen as an implementation of a SOM.

k-means can be seen as a special case of the EM algorithm: the hard-assignment limit of EM for a mixture of isotropic Gaussians.
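As an illustration, a minimal sketch of k-means written as hard-assignment EM: nearest-centroid assignment plays the role of the E-step, and the centroid update is the M-step. The data and cluster count are made up for the example.

```python
import numpy as np

def kmeans_as_hard_em(X, k, iters=20, seed=0):
    """k-means as EM with hard assignments:
    E-step: assign each point to its nearest centroid (the zero-variance
            limit of Gaussian-mixture responsibilities);
    M-step: move each centroid to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # E-step: hard responsibilities (nearest centroid wins)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # M-step: maximize likelihood given the hard assignments
        for j in range(k):
            pts = X[assign == j]
            if len(pts):
                centroids[j] = pts.mean(axis=0)
    return centroids, assign

# two well-separated toy clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
c, a = kmeans_as_hard_em(X, 2)
```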

Among the advantages of unsupervised learning is that it does not require labeled data, which means that there is usually more data available for learning.

Regular Hebbian learning leads to all neurons responding to the same input. One method to force neurons to specialize is competitive learning.

Competitive learning can be implemented in ANN by strong, constant inhibitory connections between competing neurons.

Simple competitive neural learning with constant inhibitory connections between competing neurons leads to grandmother-type cells.

Simple competitive neural learning with constant inhibitory connections between competing neurons produces a code that facilitates further processing.

A network with Hebbian and anti-Hebbian learning can produce a sparse code. Excitatory connections from input to output neurons are learned Hebbian, while inhibitory connections between output neurons are learned anti-Hebbian.
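A minimal sketch of the anti-Hebbian half of this idea, under simplifying assumptions: a linear recurrent circuit with the feedforward weights kept fixed at identity (in the full scheme they would be learned Hebbian at the same time), and only the lateral inhibitory weights learned anti-Hebbian. All sizes and rates are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated 2-D input: both components share a common source.
s = rng.normal(size=(2000, 1))
X = np.hstack([s + 0.3 * rng.normal(size=(2000, 1)),
               s + 0.3 * rng.normal(size=(2000, 1))])

Q = np.eye(2)            # feedforward weights (Hebbian in the full model; fixed here)
L = np.zeros((2, 2))     # lateral inhibitory weights, learned anti-Hebbian

def settle(x, steps=20):
    """Let the recurrent circuit settle: y = Qx + Ly, iterated to a fixed point."""
    y = Q @ x
    for _ in range(steps):
        y = Q @ x + L @ y
    return y

for t, x in enumerate(X):
    eta = 0.01 / (1 + t / 500)          # decaying anti-Hebbian learning rate
    y = settle(x)
    dL = -eta * np.outer(y, y)          # anti-Hebbian: co-active outputs inhibit each other more
    np.fill_diagonal(dL, 0.0)
    L += dL
    L = np.clip(L, -0.9, 0.0)           # keep weights inhibitory and the loop stable

Y = np.array([settle(x) for x in X])
```

At the anti-Hebbian equilibrium the average output product is driven toward zero, so the settled outputs are far less correlated than the inputs, which is what makes the resulting code sparser and easier to process further.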

Bergan et al. show that interaction with the environment can drive multisensory learning. However, Xu et al. show that multisensory learning can also happen if there is no interaction with the multisensory world.

Hinoshita et al. define self-organization as the phenomenon of a global, coherent structure arising in a system through local interaction between its elements as opposed to through some sort of central control.

According to Hinoshita et al., recurrent neural networks are capable of self-organization.

The SC model presented by Cuppini et al. has a circular topology to prevent the border effect.

Weber presents a Helmholtz machine extended by adaptive lateral connections between units and a topological interpretation of the network. A Gaussian prior over the population response (a prior favoring co-activation of close-by units) and training with natural images lead to spatial self-organization and feature-selectivity similar to that in cells in early visual cortex.

Using a space-coded approach instead of an MLP for learning multi-sensory integration has benefits:

  • learning is unsupervised
  • can work with missing data

The fact that multi-sensory integration arises without reward connected to stimuli motivates unsupervised learning approaches to SC modeling.

The precise characteristics of multi-sensory integration were shown to be sensitive to the statistics of cross-modal stimuli experienced during early life.

Mixing Hebbian (unsupervised) learning with feedback can guide the unsupervised learning process in learning interesting, or task-relevant things.

Is it possible for a system to learn the reliability of its sensory modalities from how well each agrees with the consensus among the modalities, under certain conditions?

Possible conditions:

  • many modalities (what my 2013 model does)
  • similar reliability
  • enough noise
  • enough remaining entropy at the end of learning (worked in early versions of my SOM)

Classical models assume that learning in cortical regions is well described in an unsupervised learning framework while learning in the basal ganglia can be modeled by reinforcement learning.

Representations in the cortex (e.g. V1) develop differently depending on the task. This suggests that some sort of feedback signal might be involved and that learning in the cortex is not purely unsupervised.

Some task-dependency in representations may arise from embodied learning, where actions bias which experiences are learned from.

Conversely, the narrow range of disparities reflected in disparity-selective neurons in visual cortex might be due to goal-directed feature learning.

Unsupervised learning models have been extended with aspects of reinforcement learning.

SOMs can be used for preprocessing in reinforcement learning, simplifying high-dimensional input via their winner-take-all characteristics.

However, since standard SOMs do not get any goal-dependent input, they focus on globally strongest features (statistically most predictive latent variables) and under-emphasize features which would be relevant for the task.

The model due to Weber and Triesch combines SOM- or K-Means-like learning of features with prediction error feedback as in reinforcement learning. The model is thus able to learn relevant and disregard irrelevant features.

RNNPB learns sequences of inputs unsupervised (self-organized).

Similar parametric bias vectors are learned by the RNNPB for similar input.

Bishop et al.'s goal in introducing generative topographic mapping was not biological plausibility.

Generative Topographic Mapping produces PDFs for latent variables given data points.

Learning rate and neighborhood width schedules have to be chosen arbitrarily for (vanilla) SOM training.
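A sketch of vanilla 1-D SOM training that illustrates the point: the initial values and exponential decay constants below are exactly the kind of arbitrary schedule choices the note refers to, with no principled way to pick them.

```python
import numpy as np

def train_som(X, n_units=10, iters=2000, seed=0):
    """Vanilla 1-D SOM with (arbitrarily chosen) exponential decay
    schedules for learning rate and neighborhood width."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(X.min(), X.max(), size=(n_units, X.shape[1]))
    pos = np.arange(n_units)                     # unit positions on the map
    for t in range(iters):
        lr = 0.5 * np.exp(-t / (iters / 4))      # schedule: arbitrary choice
        sigma = 3.0 * np.exp(-t / (iters / 4))   # schedule: arbitrary choice
        x = X[rng.integers(len(X))]
        bmu = np.argmin(np.linalg.norm(W - x, axis=1))   # best-matching unit
        h = np.exp(-((pos - bmu) ** 2) / (2 * max(sigma, 1e-3) ** 2))
        W += lr * h[:, None] * (x - W)           # pull BMU and neighbors toward x
    return W

X = np.random.default_rng(1).uniform(0, 1, size=(500, 1))
W = train_som(X)
```

With different decay constants the map can under- or over-converge; nothing in the algorithm itself says which schedule is right.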

There is no general proof of convergence for SOMs.

There is no cost function that SOM learning follows.

A SOM population code in response to some input vector is not a probability density function.

The SOM algorithm works, but it has some serious theoretical problems.

GTM was developed in part to be a good alternative for the SOM algorithm.

In implementing GTM for some specific use case, one chooses a noise model (at least one per data dimension).

SOMs learn latent-variable models.

Unsupervised learning extracts regularities in the input. Detected regularities can then be used for actual discrimination. Or unsupervised learning can be used again to detect regularities in these regularities.

In the wake-sleep algorithm, (at least) two layers of neurons are fully connected to each other.

In the wake phase, the lower layer drives the upper layer through the bottom-up recognition weights. The top-down generative weights are trained such that they would generate the current activity in the lower layer given the current activity in the upper layer.

In the sleep phase, the upper layer drives activity in the lower layer through the generative weights and the recognition weights are learned such that they induce the activity in the upper layer given the activity in the lower layer.
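The two phases can be sketched for a minimal two-layer network of stochastic binary units. The layer sizes, learning rate, and toy data are made up for the example, and biases of the lower layer are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
sample = lambda p: (rng.random(p.shape) < p).astype(float)

n_vis, n_hid, lr = 6, 3, 0.05
R = 0.01 * rng.normal(size=(n_hid, n_vis))   # bottom-up recognition weights
G = 0.01 * rng.normal(size=(n_vis, n_hid))   # top-down generative weights
g_bias = np.zeros(n_hid)                     # generative bias of the upper layer

# Toy data: each pattern is mostly-on or mostly-off across the visible units.
p_on = np.where(rng.random(2000) < 0.5, 0.9, 0.1)
data = sample(np.tile(p_on[:, None], (1, n_vis)))

for x in data:
    # Wake phase: recognition weights drive the upper layer ...
    h = sample(sigmoid(R @ x))
    # ... and the generative weights/bias learn to reproduce the lower layer.
    G += lr * np.outer(x - sigmoid(G @ h), h)
    g_bias += lr * (h - sigmoid(g_bias))
    # Sleep phase: the generative model produces a "dream" ...
    h_s = sample(sigmoid(g_bias))
    x_s = sample(sigmoid(G @ h_s))
    # ... and the recognition weights learn to recover the dreamed cause.
    R += lr * np.outer(h_s - sigmoid(R @ x_s), x_s)
```

Both updates are simple delta rules; each set of weights is trained to invert what the other set just did, which is the core of the wake-sleep scheme.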

The restricted Boltzmann machine is an unsupervised learning algorithm which is similar to the wake-sleep algorithm. It uses stochastic learning, i.e. neural activations are stochastic, with activation probabilities determined by the weights.

The weights in a trained RBM implicitly encode a PDF over the training set.

Learning in RBMs is competitive, but without explicit inhibition (the RBM is restricted in that it has no lateral connections within a layer). Neurons learn different things due to random initialization and stochastic processing.
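A hedged sketch of RBM training with one-step contrastive divergence (CD-1), the standard approximate learning rule for RBMs; the layer sizes, learning rate, and toy data below are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
sample = lambda p: (rng.random(p.shape) < p).astype(float)

n_vis, n_hid, lr = 6, 4, 0.1
W = 0.01 * rng.normal(size=(n_vis, n_hid))   # visible-to-hidden weights only
b_v = np.zeros(n_vis)
b_h = np.zeros(n_hid)

# Toy data: two binary prototypes with a little flip noise.
protos = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], float)
data = protos[rng.integers(2, size=3000)]
data = np.abs(data - (rng.random(data.shape) < 0.05))

for v0 in data:
    # positive phase: stochastic hidden activations given the data
    ph0 = sigmoid(v0 @ W + b_h)
    h0 = sample(ph0)
    # negative phase: one step of Gibbs sampling (the "1" in CD-1)
    pv1 = sigmoid(h0 @ W.T + b_v)
    v1 = sample(pv1)
    ph1 = sigmoid(v1 @ W + b_h)
    # contrastive divergence update: data statistics minus model statistics
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b_v += lr * (v0 - v1)
    b_h += lr * (ph0 - ph1)
```

After training, a mean-field reconstruction of each prototype lands closer to that prototype than to the other one, reflecting the PDF over the training set that the weights implicitly encode.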

Hinton proposes building deep belief networks by stacking RBMs and training them unsupervised and in ascending order. After that, the network goes into feed-forward mode and backprop can be used to learn the actual task. Thus, some of the problems of backprop are solved by initializing the weights via unsupervised learning.

A SOM in which each unit computes the probability of some value of a sensory variable produces a probabilistic population code, ie. it computes a population-coded probability density function.

According to Zhang et al., ViSOM (and other unsupervised methods) do not take data labels and their intrinsic structure into account if they are present.

SOMs treat all their input dimensions as observables of some latent variable. It is possible to give data points a dimension containing labels. These labels will not have a greater effect on learning than the other dimensions of the data point. This is especially true if the true labels are not good predictors of the actual latent variable.

The SOM can be modified to take into account the variance of the input dimensions with respect to each other.

Zhou et al. use an approach similar to that of Bauer et al. They do not use pairwise cross-correlation between input modalities, but simply variances of individual modalities. It is unclear how they handle the case where one modality essentially becomes ground truth to the algorithm.

Without feedback about ground truth, a system learning from a data set of noisy corresponding values must have at least three modalities to learn their reliabilities. One way of doing this is learning pairwise correlation between modalities. It is not enough to take the best hypothesis on the basis of the currently learned reliability model and use that instead of ground truth to learn the variance of the individual modalities: If the algorithm comes to believe that one modality has near-perfect reliability, then that will determine the next best hypotheses. In effect, that modality will be ground truth for the algorithm and it will only learn how well the others predict it.
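The three-modality case can be sketched with Gaussian noise: variances of pairwise differences cancel the unknown ground truth and leave a linear system in the individual noise variances (the "three-cornered hat" idea). The noise variances below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
true_var = np.array([0.1, 0.4, 0.9])         # per-modality noise variances (assumed)
latent = rng.normal(0.0, 2.0, size=10000)    # common ground-truth signal, never observed directly
obs = latent[:, None] + rng.normal(0.0, np.sqrt(true_var), (10000, 3))

# Pairwise difference variances cancel the ground truth: Var(x_i - x_j) = v_i + v_j
d01 = np.var(obs[:, 0] - obs[:, 1])
d02 = np.var(obs[:, 0] - obs[:, 2])
d12 = np.var(obs[:, 1] - obs[:, 2])

# Solve the three-equation linear system for the individual variances.
v0 = (d01 + d02 - d12) / 2
v1 = (d01 + d12 - d02) / 2
v2 = (d02 + d12 - d01) / 2
```

With only two modalities the system is underdetermined (one equation, two unknowns), which is one way of seeing why at least three modalities are needed.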

The SOM has ancestors in von der Malsburg's "Self-Organization of Orientation Sensitive Cells in the Striate Cortex" and other early models of self-organization.

The SOM is an abstraction of biologically plausible ANNs.

The SOM is an asymptotically optimal vector quantizer.

There is no cost function that the SOM algorithm follows exactly.

Quality of order in SOMs is a difficult issue because there is no unique definition of `order' for the $n$-dimensional case if $n>2$.

Nevertheless, there have been a number of attempts to define and measure it.

There have been many extensions of the original SOM ANN, like

  • (Growing) Neural Gas
  • adaptive subspace SOM (ASSOM)
  • Parameterized SOM (PSOM)
  • Stochastic SOM
  • recursive and recurrent SOMs

Recursive and Recurrent SOMs have been used for mapping temporal data.

Von der Malsburg introduces a simple model of self-organization which explains the organization of orientation-sensitive cells in the striate cortex.

Bauer and Wermter present an ANN algorithm which takes from the self-organizing map (SOM) algorithm the ability to learn a latent variable model from its input. They extend the SOM algorithm so it learns about the distribution of noise in the input and computes probability density functions over the latent variables. The algorithm represents these probability density functions using population codes. This is done with very few assumptions about the distribution of noise.