Show Tag: som

Through lateral connections, a Hebbian learning rule, and approximate initialization, Cuppini et al. manage to learn spatial register between sensory maps. This can be seen as an implementation of a SOM.

Cuppini et al. do not evaluate their model's performance (e.g., comparability to cat or human performance, or optimality).

Ravulakollu et al. argue against SOMs and for radial basis functions (RBF) for combining stimuli (for reasons I don't quite understand).

SOMs can be used as a means of learning principal manifolds.

SOM-based algorithms have been used to model several features of natural visual processing.

Miikkulainen et al. use their SOM-based algorithms to model the visual cortex.

Miikkulainen et al. use a hierarchical version of their SOM-based algorithm to model the natural development of visual capabilities.

One version of DAC uses SOMs.

Self-organization occurs in the physical world as well as in information-processing systems. In neural-network-like systems, SOMs are not the only form of self-organization.

There are border effects in SOM learning: the distribution of neurons in data space differs between the center and the periphery of the network (and of the data space).

One solution to border effects is to use SOMs with cyclic, spherical, hyperspherical, or toroidal topologies.
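A minimal sketch of the toroidal remedy, in Python with only the standard library: grid distances wrap around the edges, so no unit sits at a border and every unit has the same number of neighbors. The function names and the Gaussian neighborhood are illustrative choices, not taken from a specific paper.

```python
import math


def toroidal_grid_distance(a, b, rows, cols):
    """Distance between grid positions a = (row, col) and b on a rows x cols torus."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    # Wrapping around an edge may be shorter than crossing the grid.
    dr = min(dr, rows - dr)
    dc = min(dc, cols - dc)
    return math.hypot(dr, dc)


def neighborhood(a, b, rows, cols, sigma):
    """Gaussian neighborhood strength based on the wrap-around distance."""
    d = toroidal_grid_distance(a, b, rows, cols)
    return math.exp(-d * d / (2.0 * sigma * sigma))


# Opposite corners of a 10 x 10 grid are diagonal neighbors on the torus.
print(toroidal_grid_distance((0, 0), (9, 9), 10, 10))  # about 1.414
```

On a flat grid the same two corners would be about 12.7 apart, and the corner units would have fewer neighbors than units in the center.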

One way of evening out the distribution of SOM units in data space is to use a 'conscience': a value that increases every time a neuron is the BMU and decreases whenever it is not. High conscience values then make a neuron less likely to be selected as BMU.
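The conscience mechanism as described above can be sketched as follows. The increment, decay and penalty constants, the clamping at zero, and adding the conscience value to the Euclidean distance are all assumptions made for illustration; published conscience variants differ in these details.

```python
import math


class ConscienceSelector:
    """BMU selection with a simple conscience term (all constants hypothetical)."""

    def __init__(self, n_units, increment=0.1, decay=0.05, penalty=1.0):
        self.conscience = [0.0] * n_units
        self.increment = increment  # added to a unit's conscience when it wins
        self.decay = decay          # subtracted when it loses (clamped at zero)
        self.penalty = penalty      # how strongly conscience biases selection

    def select(self, x, weights):
        # Biased distance: frequent winners look farther away than they are.
        scores = [math.dist(x, w) + self.penalty * c
                  for w, c in zip(weights, self.conscience)]
        bmu = min(range(len(scores)), key=scores.__getitem__)
        for i in range(len(self.conscience)):
            if i == bmu:
                self.conscience[i] += self.increment
            else:
                self.conscience[i] = max(0.0, self.conscience[i] - self.decay)
        return bmu
```

Fed the same input over and over, plain BMU selection would always pick the closest unit; with the conscience term, a slightly farther unit also gets to win now and then.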

SOMs fail to learn what's interesting if what's not interesting (the noise) better explains the data.

SOMs have been used to model biology.

Adams et al. use SOM-like algorithms to model biological sensori-motor control and develop robotic sensori-motor controllers.

Adams et al. state that others have used SOM-like algorithms before, both for modelling biology and for robotic applications (and list examples).

A SOM that is to learn continuously cannot keep decreasing its neighborhood interaction width and learning rate.

It is helpful if these parameters are self-regulated, as in PLSOM.

Adams et al. present a Spiking Neural Network implementation of a SOM which uses

  • spike-timing-dependent plasticity
  • a method to adapt the learning rate
  • constant neighborhood interaction width

There have been a number of attempts at spiking SOM implementations.

Adams et al. note that there have been a number of attempts at spiking SOM implementations (and list a few).

SOMs and SOM-like algorithms have been used to model natural multi-sensory integration in the superior colliculus (SC).

Anastasio and Patton model the deep SC using SOM learning.

Anastasio and Patton present a model of multi-sensory integration in the superior colliculus which takes into account modulation by uni-sensory projections from cortical areas.

In the model due to Anastasio and Patton, deep SC neurons combine cortical input multiplicatively with primary input.

Anastasio and Patton's model is trained in two steps:

First, connections from primary input to deep SC neurons are adapted in a SOM-like fashion.

Then, connections from uni-sensory, parietal inputs are trained, following an anti-Hebbian regime.

The latter phase enforces the principles of modality matching and cross-modality.

SOM learning produces clusters of neurons with similar modality responsiveness in the SC model due to Anastasio and Patton.

The model due to Anastasio and Patton reproduces multi-sensory enhancement.

Deactivating the modulatory cortical input also abolishes multi-sensory enhancement.

In Anastasio and Patton's SC model, the spatial organization of the SOM is not used to represent the spatial organization of the outside world, but to distribute different sensitivities to the input modalities in different neurons.

It's a bit strange that Anastasio and Patton's and Martin et al.'s SC models do not use the spatial organization of the SOM to represent the spatial organization of the outside world, but to distribute different sensitivities to the input modalities in different neurons.

KNN (or sparse coding) seems to be more appropriate for that.

Anastasio and Patton drop the strong probabilistic interpretation of SC neurons' firing patterns in their learning model.

The SC model presented by Cuppini et al. has a circular topology to prevent the border effect.

Dávila-Chacón evaluated SOMs as a clustering layer on top of the MSO and LSO modules of the Liu et al. sound source localization system. On top of the clustering layer, they tried out a number of neural and statistical classification layers.

The results were clearly inferior to those of the best methods they found.

In SOM learning, the shrinking of the neighborhood size and the decrease of the update strength usually follow predefined schedules, i.e., they depend only on the update step.

In the PLSOM algorithm, update strength depends on the difference between a data point and the best-matching unit's weight vector, the quantization error. A large distance, indicating a bad representation of that data point in the SOM, leads to a stronger update than a small distance. The distance is scaled relative to the largest quantization error encountered so far.
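A sketch of one PLSOM-style update following the description above: the update strength `eps` is the BMU's quantization error divided by the largest quantization error seen so far (`rho`), and the neighborhood width grows with `eps`. Squaring the distance, the linear neighborhood-width rule, and the values of `beta` and `theta_min` are assumptions here; consult the PLSOM paper for the exact formulation.

```python
import math


def plsom_step(weights, positions, x, state, beta=3.0, theta_min=0.5):
    """One PLSOM-style update (sketch; beta and theta_min are illustrative)."""
    # Squared quantization error of every unit; the smallest one is the BMU's.
    errors = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in weights]
    c = min(range(len(weights)), key=errors.__getitem__)
    # rho remembers the largest quantization error encountered so far.
    state["rho"] = max(state.get("rho", 0.0), errors[c])
    eps = errors[c] / state["rho"] if state["rho"] > 0 else 0.0
    # Badly represented inputs (eps near 1) get a wide neighborhood.
    theta = (beta - theta_min) * eps + theta_min
    for i, w in enumerate(weights):
        d2 = sum((p - q) ** 2 for p, q in zip(positions[i], positions[c]))
        h = math.exp(-d2 / (theta ** 2))
        weights[i] = [wi + eps * h * (xi - wi) for wi, xi in zip(w, x)]
    return c, eps
```

An input that is already represented perfectly yields `eps == 0` and leaves the map unchanged, without any externally scheduled decay.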

PLSOM reduces the number of parameters of the SOM algorithm from four to two.

PLSOM overreacts to outliers: data points which are very unrepresentative of the data in general will change the network more strongly than they should.

PLSOM2 addresses the problem of PLSOM overreacting to outliers.

Chen et al. presented a system that uses a SOM to cluster states. After learning, each SOM unit is extended with a histogram that counts, for each of a number of known states $$C=\{c_1,c_2,\dots,c_n\}$$, how often the unit was the BMU while the input belonged to that state.

The system is used in robot soccer. Each class is connected to an action. Actions are chosen by finding the BMU in the net and selecting the action connected to its most likely class.

In an unsupervised, online phase, these histograms are updated in a reinforcement-learning fashion: whenever the selected action leads to success, the bin of the BMU's histogram corresponding to the most likely class is increased; otherwise, it is decreased.
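A sketch of the histogram bookkeeping and the reinforcement step described above. `find_bmu`, the class and action names, and the unit step size are hypothetical; the sketch only illustrates the idea and does not claim to reproduce Chen et al.'s system.

```python
from collections import defaultdict


class HistogramPolicy:
    """BMU histograms over known classes, used to pick actions (hypothetical names)."""

    def __init__(self, find_bmu, class_to_action):
        self.find_bmu = find_bmu                # maps an input to a SOM unit index
        self.class_to_action = class_to_action  # one action per known class
        # hist[unit][label] counts how often `unit` was BMU for inputs of `label`.
        self.hist = defaultdict(lambda: defaultdict(float))

    def record(self, x, label):
        """Offline phase, after SOM training: count labeled BMU hits."""
        self.hist[self.find_bmu(x)][label] += 1.0

    def act(self, x):
        """Pick the action connected to the BMU's most likely class."""
        unit = self.find_bmu(x)
        counts = self.hist[unit]
        if not counts:                          # unit never seen with a label
            return unit, None, None
        likely = max(counts, key=counts.get)
        return unit, likely, self.class_to_action[likely]

    def reinforce(self, unit, likely, success, step=1.0):
        """Online phase: raise the chosen bin on success, lower it otherwise."""
        self.hist[unit][likely] += step if success else -step
```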

Since much of what the visual system does can be seen as compression, since SOMs can do vector quantization (VQ) and since VQ is a compression technique, it makes sense that SOMs have been useful in modeling visual processing.

Using a space-coded approach instead of an MLP for learning multi-sensory integration has benefits:

  • learning is unsupervised
  • can work with missing data

Is it possible for a system to learn the reliability of its sensory modalities from how well they agree with the consensus between the modalities, under certain conditions?

Possible conditions:

  • many modalities (what my 2013 model does)
  • similar reliability
  • enough noise
  • enough remaining entropy at the end of learning (worked in early versions of my SOM)

Kohonen cites von der Malsburg and Amari as among the first to demonstrate input-driven self-organization in machine learning.

Kohonen implies that neighborhood interaction in SOMs is an abstraction of chemical interactions between neurons in natural brain maps, which affect those neurons' plasticity, but not their current response.

Kohonen implies that neighborhood interaction in SOMs is what separates them from earlier, more bio-inspired attempts at input-driven self-organization, and what leads to computational tractability on the one hand and proper self-organization as found in natural brain maps on the other.

Kohonen depicts SOMs as an extension of vector quantization (VQ).
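To make the relation to VQ concrete, here is a minimal online SOM update (a sketch with a Gaussian neighborhood, which is a common but not the only choice). With the neighborhood width `sigma` set to zero, only the winner moves, which is plain online vector quantization; a positive `sigma` adds the neighborhood cooperation that distinguishes the SOM.

```python
import math


def som_step(weights, positions, x, lr, sigma):
    """One online SOM update; with sigma == 0 it reduces to online VQ."""
    dists = [math.dist(x, w) for w in weights]
    c = min(range(len(weights)), key=dists.__getitem__)   # best-matching unit
    for i, w in enumerate(weights):
        if sigma == 0:
            h = 1.0 if i == c else 0.0                     # winner-take-all (VQ)
        else:
            g = math.dist(positions[i], positions[c])      # distance on the grid
            h = math.exp(-g * g / (2 * sigma * sigma))     # neighborhood cooperation
        weights[i] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return c
```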

Kohonen states that online learning in SOMs is less safe and slower than batch learning.
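For comparison, a sketch of one epoch of batch learning: all data points are first assigned to their BMUs using the current models, and every model is then replaced by the neighborhood-weighted mean of the data. Unlike the online rule, the result does not depend on the order in which data are presented. The Gaussian neighborhood and the function name are illustrative assumptions.

```python
import math


def batch_som_epoch(weights, positions, data, sigma):
    """One batch-SOM epoch: each model becomes a neighborhood-weighted data mean."""
    # 1) Assign every data point to its best-matching unit (old weights only).
    bmus = [min(range(len(weights)), key=lambda j: math.dist(x, weights[j]))
            for x in data]
    new_weights = []
    for i in range(len(weights)):
        num = [0.0] * len(weights[i])
        den = 0.0
        for x, c in zip(data, bmus):
            g = math.dist(positions[i], positions[c])
            h = math.exp(-g * g / (2 * sigma * sigma))
            num = [n + h * xi for n, xi in zip(num, x)]
            den += h
        # 2) Replace the model by the weighted mean (keep it if nothing maps near it).
        new_weights.append([n / den for n in num] if den > 0 else list(weights[i]))
    return new_weights
```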

Kohonen names normalization of input dimensions as a remedy for differences in scaling between these dimensions. He does not cite another paper of his (with colleagues) in which he presents a SOM that learns this scaling.
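One common normalization is to rescale each input dimension to zero mean and unit variance, sketched below; Kohonen's text may have a different normalization in mind, so treat this choice as an assumption.

```python
import statistics


def standardize(data):
    """Rescale each input dimension (column) to zero mean and unit variance."""
    columns = list(zip(*data))
    means = [statistics.fmean(col) for col in columns]
    # A constant dimension has zero spread; divide by 1 instead of 0.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in data]
```

After this step, a dimension measured in hundreds no longer dominates the Euclidean distance over one measured in single units.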

My SOM implicitly takes care of differences in scaling between input dimensions and also weights the input dimensions, whereas Kangas et al.'s SOM only learns the scaling.

Kohonen discusses some of the challenges involved in using SOMs for text clustering:

  • words have different importance depending on their absolute frequency,
  • some words occurring very rarely or very commonly must be discarded.

It would be really interesting to see whether SOMs for text clustering can be designed whose weight vectors code for different sets of words.

Kohonen states that early SOMs were meant to model brain maps and how they come to be.

So, I've returned to the roots and found something interesting for applications!

Kohonen says the main virtue of SOMs lies in data visualization.

Kohonen groups applications of SOMs into

  • statistical methods
    • exploratory data analysis
    • statistical analysis in organization of texts
  • industrial analyses, control, telecommunications
  • financial applications

Kohonen advises initializing large SOMs so that their initial organization is already similar to the expected final one, to speed up convergence.

Kohonen proposes a version of the SOM in which the models (SOM units' weight vectors) of the $n$ best-matching units for a given input are combined linearly to optimally describe that input.

The weights for the linear combination are derived from the models' distances to the input.
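A sketch of the idea, assuming inverse-distance weights normalized to sum to one. That weighting is a hypothetical stand-in chosen for simplicity; Kohonen's actual method for deriving the weights may differ.

```python
import math


def blend_best_models(x, weights, n=3, tiny=1e-12):
    """Approximate x by a convex combination of its n best-matching models.

    Inverse-distance weighting is an assumption made for illustration.
    """
    ranked = sorted(range(len(weights)), key=lambda j: math.dist(x, weights[j]))[:n]
    inv = [1.0 / (math.dist(x, weights[j]) + tiny) for j in ranked]  # closer = heavier
    total = sum(inv)
    coeffs = [v / total for v in inv]
    approx = [sum(c * weights[j][k] for c, j in zip(coeffs, ranked))
              for k in range(len(x))]
    return ranked, coeffs, approx
```

An input halfway between two models is then described as an equal mix of both, rather than by whichever model happens to win.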


If a SOM is trained on data whose dimensionality is higher than that of the SOM's grid, the mapping of data points onto the grid becomes more and more non-monotonic, an effect Kohonen calls 'zebra stripes'.

Martin et al. model multisensory integration in the SC using a SOM algorithm.

Input in Martin et al.'s model of multisensory integration in the SC is an $m$-dimensional vector for every data point, where $m$ is the number of modalities. Data points are uni-modal, bi-modal, or tri-modal. Each dimension of the data point codes stochastically for the combination of modalities of the data point. The SOM learns to map different modality combinations to different regions of its two-dimensional grid.

Martin et al.'s model of multisensory integration in the SC replicates enhancement and, through its non-linear transfer function, superadditivity.

Bishop et al.'s goal in introducing generative topographic mapping was not biological plausibility.

The learning-rate and neighborhood-width schedules have to be chosen arbitrarily for (vanilla) SOM training.

There is no general proof of convergence for SOMs.

There is no cost function that SOM learning follows.

A SOM population code in response to some input vector is not a probability density function.

The SOM algorithm works, but it has some serious theoretical problems.

GTM was developed in part to be a good alternative to the SOM algorithm.

In implementing GTM for some specific use case, one chooses a noise model (at least one per data dimension).

SOMs learn latent-variable models.

If the noise in the inputs to my SOM is correlated between input neurons, then the SOM cannot properly learn a latent variable model.

There can be situations where my algorithm is still optimal or near-optimal.

Keogh and Lin claim that what they call time series subsequence clustering is meaningless. Specifically, clustering all subsequences of a certain length extracted from a time series yields the same (kind of) clusters as clustering random sequences.

Intuitively, clustering time series subsequences is meaningless for two reasons:

  • The sum (or average) of all subsequences of a time series is (nearly) a straight line, and the cluster centers (in $k$-means and related algorithms), weighted by cluster size, average to the global mean. Thus, an algorithm can find the interesting features of a time series only if these features together average to a straight line. This, however, is rarely the case.
  • Any meaningful clustering should be expected to put subsequences that start at nearby time points into the same cluster (what Keogh and Lin call 'trivial matches'). However, the similarity between nearby subsequences depends strongly on the rate of change around them. A clustering algorithm will therefore place cluster centers close to subsequences with a low rather than a high rate of change, and thus typically close to the less interesting subsequences.
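The first point can be checked numerically. The sketch below averages all sliding-window subsequences of a long sine wave; because every position of the average covers almost the whole series, the result is nearly flat even though the series itself oscillates between -1 and 1. Series length, period and window size are arbitrary choices.

```python
import math


def mean_subsequence(series, w):
    """Element-wise average of all length-w sliding-window subsequences."""
    n = len(series) - w + 1                     # number of subsequences
    return [sum(series[i + k] for i in range(n)) / n for k in range(w)]


series = [math.sin(2 * math.pi * t / 50) for t in range(5000)]  # oscillates in [-1, 1]
avg = mean_subsequence(series, 32)
print(max(avg) - min(avg))                      # very small: the average is almost flat
```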

Keogh and Lin propose focusing on finding motifs in time series instead of clusters.

A SOM in which each unit computes the probability of some value of a sensory variable produces a probabilistic population code, i.e., it computes a population-coded probability density function.
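A sketch of such a probabilistic population code, assuming units with preferred values of a one-dimensional latent variable, Gaussian input noise with known standard deviation, and a flat prior. Each unit's activity is the likelihood of the observation under its preferred value; normalizing the activities yields a discrete probability distribution over the preferred values.

```python
import math


def population_pdf(observation, preferred, noise_sd):
    """Normalized unit activities = discrete posterior under a flat prior.

    The Gaussian noise model and the flat prior are assumptions.
    """
    # Likelihood of the observation if the latent variable equals each unit's value.
    like = [math.exp(-0.5 * ((observation - p) / noise_sd) ** 2) for p in preferred]
    total = sum(like)
    return [v / total for v in like]
```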

According to Zhang et al., ViSOM (like other unsupervised methods) does not take data labels and their intrinsic structure into account, even if they are present.

SOMs treat all their input dimensions as observables of some latent variable. It is possible to give data points a dimension containing labels. These labels will not have a greater effect on learning than the other dimensions of the data point. This is especially true if the true labels are not good predictors of the actual latent variable.

My SOMs learn competitively, but they do not actually encode errors; they encode latent variables.

If what Friston means by 'models that do not show conditional independence' includes SOMs, then that would explain why I can't find an error signal. Maybe the prior constraint invoked by SOMs is similarity between stimuli?

Possibly, this is a point for future work: model cortico-collicular connections as prediction. But, in Friston's framework, there would have to be ascending connections, too.

SOMs (along with other competitive learning algorithms, backprop, simulated annealing, the neocognitron, SVMs, and Bayesian models) suffer from catastrophic forgetting when forced to learn too quickly.

The SOM can be modified to take into account the variance of the input dimensions with respect to each other.
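One way to account for the dimensions' (co)variances is to find the BMU with a Mahalanobis distance instead of the Euclidean distance. The sketch below hard-codes two-dimensional inputs and a known covariance matrix to stay short; both are assumptions, and this is not claimed to be how any of the cited variants work.

```python
def mahalanobis2(x, w, cov):
    """Squared Mahalanobis distance for 2-D points, cov = [[a, b], [b, d]]."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x[0] - w[0], x[1] - w[1]
    # Quadratic form with the inverse covariance (1 / det) * [[d, -b], [-b, a]].
    return (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det


def bmu(x, weights, cov):
    """Index of the unit whose weight vector is closest to x in Mahalanobis terms."""
    return min(range(len(weights)), key=lambda j: mahalanobis2(x, weights[j], cov))
```

With `cov = [[1.0, 0.0], [0.0, 100.0]]`, a unit 5 away along the noisy second dimension beats a unit 2 away along the reliable first one, whereas plain Euclidean distance would pick the latter.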

Zhou et al. use an approach similar to that of Bauer et al. They do not use pairwise cross-correlation between input modalities, but simply variances of individual modalities. It is unclear how they handle the case where one modality essentially becomes ground truth to the algorithm.

Without feedback about ground truth, a system learning from a data set of noisy corresponding values must have at least three modalities to learn their reliabilities. One way of doing this is learning pairwise correlation between modalities. It is not enough to take the best hypothesis on the basis of the currently learned reliability model and use that instead of ground truth to learn the variance of the individual modalities: If the algorithm comes to believe that one modality has near-perfect reliability, then that will determine the next best hypotheses. In effect, that modality will be ground truth for the algorithm and it will only learn how well the others predict it.
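A sketch of why three modalities suffice, assuming each modality observes the same latent value $z$ with independent, zero-mean noise: $x_i = z + n_i$ with $\mathrm{Var}(n_i) = \sigma_i^2$. Then $\mathrm{Var}(x_i - x_j) = \sigma_i^2 + \sigma_j^2$, so with $v_{ij} = \mathrm{Var}(x_i - x_j)$ one gets $\sigma_1^2 = (v_{12} + v_{13} - v_{23})/2$, and analogously for the others. With only two modalities there is a single equation for two unknowns. Using variances of pairwise differences is a close relative of the pairwise-correlation approach mentioned above, not necessarily the method used in the cited work.

```python
import random
import statistics


def noise_variances(x1, x2, x3):
    """Estimate each modality's noise variance from pairwise differences.

    Assumes x_i = z + n_i with independent, zero-mean noise n_i, so that
    Var(x_i - x_j) = s_i + s_j, where s_i is modality i's noise variance.
    """
    v12 = statistics.pvariance([a - b for a, b in zip(x1, x2)])
    v13 = statistics.pvariance([a - c for a, c in zip(x1, x3)])
    v23 = statistics.pvariance([b - c for b, c in zip(x2, x3)])
    return ((v12 + v13 - v23) / 2,
            (v12 + v23 - v13) / 2,
            (v13 + v23 - v12) / 2)


rng = random.Random(0)
z = [rng.uniform(-10, 10) for _ in range(20000)]   # latent variable, never observed
sds = (0.5, 1.0, 2.0)                              # true noise standard deviations
x1, x2, x3 = ([v + rng.gauss(0, s) for v in z] for s in sds)
print(noise_variances(x1, x2, x3))                 # close to (0.25, 1.0, 4.0)
```

No modality is ever treated as ground truth here, which avoids the self-reinforcing failure mode described above.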

A SOM models a population and each unit has a response to a stimulus; it is therefore possible to read out a population code from a SOM. This population code is not very meaningful in the standard SOM. Given a more statistically motivated distance function, the population code can be made more meaningful.

Zhou et al.'s and Bauer et al.'s statistical SOM variants assume Gaussian noise in the input.

A principal manifold can only be learned correctly using a SOM if

  • the SOM's dimensionality is the same as that of the principal manifold
  • the noise does not 'smear' the manifold too much, thus making it indistinguishable from a manifold with higher dimensionality.
  • there are enough data points to infer the manifold behind the noise.

The SOM has ancestors in von der Malsburg's "Self-Organization of Orientation Sensitive Cells in the Striate Cortex" and other early models of self-organization.

The SOM is an abstraction of biologically plausible ANNs.

The SOM is an asymptotically optimal vector quantizer.

There is no cost function that the SOM algorithm follows exactly.

Quality of order in SOMs is a difficult issue because there is no unique definition of 'order' for the $n$-dimensional case if $n>2$.

Nevertheless, there have been a number of attempts.
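One such attempt is the topographic error, sketched below under its usual definition: the fraction of inputs whose best- and second-best-matching units are not neighbors on the grid. Treating a grid distance of at most 1 as 'neighbors' is an assumption that suits square grids with 4-neighborhoods.

```python
import math


def topographic_error(data, weights, positions):
    """Fraction of inputs whose two best-matching units are not grid neighbors."""
    errors = 0
    for x in data:
        first, second = sorted(range(len(weights)),
                               key=lambda j: math.dist(x, weights[j]))[:2]
        # Grid distance > 1 means the map folds here (assumed 4-neighborhood).
        if math.dist(positions[first], positions[second]) > 1.0:
            errors += 1
    return errors / len(data)
```

A well-ordered one-dimensional chain scores 0; swapping two models so that the chain folds back on itself raises the score.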

There have been many extensions of the original SOM ANN, like

  • (Growing) Neural Gas
  • adaptive subspace SOM (ASSOM)
  • Parameterized SOM (PSOM)
  • Stochastic SOM
  • recursive and recurrent SOMs

Recursive and recurrent SOMs have been used for mapping temporal data.

SOMs tend to have greater unit densities for points in data space with high data density. They do not follow the density strictly, however.

Hebbian learning and in particular SOM-like algorithms have been used to model cross-sensory spatial register (e.g., in the SC).

Bauer et al. present a SOM variant which learns the variance of different sensory modalities (assuming Gaussian noise) to model multi-sensory integration in the SC.

Bauer and Wermter present an ANN algorithm which takes from the self-organizing map (SOM) algorithm the ability to learn a latent variable model from its input. They extend the SOM algorithm so it learns about the distribution of noise in the input and computes probability density functions over the latent variables. The algorithm represents these probability density functions using population codes. This is done with very few assumptions about the distribution of noise.