GTM: The Generative Topographic Mapping *Neural Computation*, Vol. 10, No. 1. (1998), pp. 215-234 by Christopher M. Bishop, Markus Svensen, Christopher K. I. Williams

@article{bishop-et-al-1998,
  author   = {Bishop, Christopher M. and Svensén, Markus and Williams, Christopher K. I.},
  title    = {{GTM}: The Generative Topographic Mapping},
  journal  = {Neural Computation},
  volume   = {10},
  number   = {1},
  pages    = {215--234},
  year     = {1998},
  url      = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.9751},
  keywords = {learning, probabilities, som, unsupervised-learning},
  abstract = {Latent variable models represent the probability density of data in a space of several dimensions in terms of a smaller number of latent, or hidden, variables. A familiar example is factor analysis, which is based on a linear transformation between the latent space and the data space. In this paper we introduce a form of non-linear latent variable model called the Generative Topographic Mapping for which the parameters of the model can be determined using the {EM} algorithm. {GTM} provides a...}
}


Bishop et al.'s goal in introducing the Generative Topographic Mapping was not biological plausibility.⇒

The Generative Topographic Mapping yields a posterior probability density over the latent variables given a data point.⇒
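This posterior follows from Bayes' rule over the grid of latent points. A minimal NumPy sketch, where the latent grid, RBF basis, weights, and noise precision are all made-up illustrative values standing in for a fitted model:

```python
import numpy as np

# Hypothetical fitted GTM pieces (illustrative values only):
# a 1-D latent grid, RBF basis outputs Phi, weights W, noise precision beta.
rng = np.random.default_rng(0)
latent_grid = np.linspace(-1.0, 1.0, 20)           # K latent points x_i
centres = np.linspace(-1.0, 1.0, 5)                # RBF centres
Phi = np.exp(-(latent_grid[:, None] - centres[None, :]) ** 2 / 0.1)  # (K, M)
W = rng.normal(size=(5, 2))                        # maps basis space -> 2-D data space
beta = 10.0                                        # inverse noise variance

Y = Phi @ W                                        # Gaussian centres in data space, (K, 2)

def posterior(t):
    """p(x_i | t): Bayes' rule over the K components (equal priors 1/K)."""
    sq = np.sum((Y - t) ** 2, axis=1)              # squared distances to each centre
    log_p = -0.5 * beta * sq
    log_p -= log_p.max()                           # subtract max for numerical stability
    p = np.exp(log_p)
    return p / p.sum()                             # normalize to a proper distribution

r = posterior(np.array([0.3, -0.2]))
```

Because the prior over latent points is uniform on the grid, the posterior is just the normalized component likelihoods; `r` sums to one, unlike a raw SOM activation pattern.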

The learning-rate and neighborhood-width schedules have to be chosen arbitrarily for (vanilla) SOM training.⇒

There is no general proof of convergence for SOMs.⇒

There is no cost function that SOM learning minimizes.⇒

A SOM population code in response to some input vector is not a probability density function.⇒

The SOM algorithm works, but it has some serious theoretical problems.⇒

GTM was developed in part to be a good alternative to the SOM algorithm.⇒

When implementing GTM for a specific use case, one chooses a noise model (at least one per data dimension).⇒

GTM uses the EM algorithm to fit the adaptive parameters $\mathbf{W}$ and $\beta$ of a constrained mixture of Gaussians to the data.

The constrained mixture of Gaussians consists of a set $\{\mathbf{x}_i\}$ of points in latent space, mapped into data space via the generalized linear model $\mathbf{y}(\mathbf{x};\mathbf{W}) = \mathbf{W}\boldsymbol{\phi}(\mathbf{x})$, together with the inverse variance $\beta$ of the Gaussian noise model.⇒
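The EM fit of $\mathbf{W}$ and $\beta$ can be sketched in a few lines of NumPy. Everything here (toy data, grid sizes, RBF widths, the ridge term) is an illustrative assumption, not the paper's settings; the E-step computes responsibilities, and the M-step solves the standard GTM weighted least-squares update for $\mathbf{W}$ followed by the $\beta$ re-estimate:

```python
import numpy as np

# Toy data and illustrative hyperparameters (not the paper's settings).
rng = np.random.default_rng(1)
T = rng.normal(size=(200, 2))                     # N data points, D = 2 dims
K, M, D = 25, 6, T.shape[1]

latent = np.linspace(-1, 1, K)                    # 1-D latent grid x_i
centres = np.linspace(-1, 1, M)                   # RBF centres
Phi = np.exp(-(latent[:, None] - centres[None, :]) ** 2 / 0.2)  # (K, M)

W = rng.normal(scale=0.1, size=(M, D))            # adaptive parameters
beta = 1.0

for _ in range(30):
    Y = Phi @ W                                   # component centres in data space, (K, D)
    # E-step: responsibilities R[k, n] = p(x_k | t_n), normalized per data point
    sq = ((Y[:, None, :] - T[None, :, :]) ** 2).sum(-1)   # (K, N) squared distances
    log_r = -0.5 * beta * sq
    log_r -= log_r.max(axis=0)                    # stabilize before exponentiating
    R = np.exp(log_r)
    R /= R.sum(axis=0)
    # M-step: solve (Phi^T G Phi) W = Phi^T R T, with a tiny ridge for stability
    G = np.diag(R.sum(axis=1))
    W = np.linalg.solve(Phi.T @ G @ Phi + 1e-6 * np.eye(M), Phi.T @ R @ T)
    # Re-estimate beta from the residuals under the updated W
    sq = ((Phi @ W)[:, None, :] - T[None, :, :]) ** 2
    beta = T.size / (R * sq.sum(-1)).sum()
```

Because all K Gaussians share one $\mathbf{W}$ and one $\beta$, the mixture is "constrained": its centres must lie on the image of the latent grid, which is what preserves the topographic ordering.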