Show Reference: "GTM: The Generative Topographic Mapping"

GTM: The Generative Topographic Mapping Neural Computation, Vol. 10, No. 1. (1998), pp. 215-234 by Christopher M. Bishop, Markus Svensen, Christopher K. I. Williams
@article{bishop-et-al-1998,
    abstract = {Latent variable models represent the probability density of data in a space of several dimensions
in terms of a smaller number of latent, or hidden, variables. A familiar example is factor analysis
which is based on a linear transformation between the latent space and the data space. In this
paper we introduce a form of non-linear latent variable model called the Generative Topographic
Mapping for which the parameters of the model can be determined using the {EM} algorithm. {GTM}
provides a...},
    author = {Bishop, Christopher M. and Svensén, Markus and Williams, Christopher K. I.},
    citeulike-article-id = {362263},
    citeulike-linkout-0 = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.9751},
    journal = {Neural Computation},
    keywords = {learning, probabilities, som, unsupervised-learning},
    number = {1},
    pages = {215--234},
    posted-at = {2014-06-10 14:45:00},
    priority = {2},
    title = {{GTM}: The Generative Topographic Mapping},
    url = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.9751},
    volume = {10},
    year = {1998}
}


Bishop et al.'s goal in introducing generative topographic mapping was not biological plausibility.

Generative Topographic Mapping produces probability density functions (PDFs) over the latent variables given data points.
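
As a hedged illustration of what such a posterior looks like: assuming $K$ latent points $\mathbf{x}_i$ with equal prior weight $1/K$, an isotropic Gaussian noise model with inverse variance $\beta$, and writing $\mathbf{t}_n$ for a data point (this symbol is an assumption borrowed from the paper's notation, not from these notes), Bayes' theorem gives

$$
p(\mathbf{x}_i \mid \mathbf{t}_n, \mathbf{W}, \beta) = \frac{\exp\left(-\frac{\beta}{2}\lVert \mathbf{W}\phi(\mathbf{x}_i) - \mathbf{t}_n \rVert^2\right)}{\sum_{j=1}^{K} \exp\left(-\frac{\beta}{2}\lVert \mathbf{W}\phi(\mathbf{x}_j) - \mathbf{t}_n \rVert^2\right)}
$$

Under these assumptions the posterior is a normalized distribution over the $K$ latent points: the values are non-negative and sum to one for every $\mathbf{t}_n$.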

The learning-rate and neighborhood-width schedules have to be chosen arbitrarily for (vanilla) SOM training.
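
To make the role of these schedules concrete, here is a minimal sketch of online training for a one-dimensional SOM. The function name `train_som`, the exponential decay, and all default values (`lr0`, `lr1`, `sigma0`, `sigma1`) are illustrative assumptions rather than anything prescribed by the paper; the point is only that both schedules are free choices.

```python
import numpy as np

def train_som(data, n_units=10, n_steps=2000, lr0=0.5, lr1=0.01,
              sigma0=3.0, sigma1=0.5, seed=0):
    """Online training of a 1-D SOM; returns the (n_units, dim) weights."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Initialise the prototypes from randomly drawn data points.
    weights = data[rng.integers(0, len(data), size=n_units)].copy()
    grid = np.arange(n_units)  # unit positions on the 1-D map
    for step in range(n_steps):
        frac = step / max(n_steps - 1, 1)
        # Hand-picked schedules (assumption): exponential decay
        # from the start value to the end value.
        lr = lr0 * (lr1 / lr0) ** frac
        sigma = sigma0 * (sigma1 / sigma0) ** frac
        x = data[rng.integers(0, len(data))]
        # Best-matching unit: prototype closest to the input.
        winner = np.argmin(np.sum((weights - x) ** 2, axis=1))
        # Gaussian neighborhood around the winner on the map grid.
        h = np.exp(-0.5 * ((grid - winner) / sigma) ** 2)
        weights += lr * h[:, None] * (x - weights)
    return weights
```

Changing the decay law or any of the four schedule parameters changes the resulting map, and nothing in the algorithm itself says which setting is right.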

There is no general proof of convergence for SOMs.

There is no cost function that SOM learning follows.

A SOM population code in response to some input vector is not a probability density function.

The SOM algorithm works, but it has some serious theoretical problems.

GTM was developed in part to be a good alternative to the SOM algorithm.

When implementing GTM for a specific use case, one chooses a noise model (at least one per data dimension).

GTM, at least in its original formulation, is a batch algorithm.

GTM uses the EM algorithm to fit the adaptive parameters $\mathbf{W}$ and $\beta$ of a constrained mixture-of-Gaussians model to the data.
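
The EM fit can be sketched in NumPy as follows. This is a simplified illustration, not the authors' implementation. The helper names (`rbf_design`, `gtm_em_step`), the toy data, the simplified PCA-style initialisation, the starting value of `beta`, and the small ridge term are all assumptions. Here `W` is stored with shape (basis functions, data dimensions), so the mixture centres are the rows of `Phi @ W`.

```python
import numpy as np

def rbf_design(latent, centres, width):
    """Phi[i, j]: Gaussian basis j evaluated at latent point i, plus a bias column."""
    d2 = ((latent[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.hstack([np.exp(-0.5 * d2 / width ** 2), np.ones((len(latent), 1))])

def gtm_em_step(T, Phi, W, beta):
    """One EM iteration. T: (N, D) data, Phi: (K, M), W: (M, D), beta: scalar."""
    N, D = T.shape
    Y = Phi @ W                                          # (K, D) mixture centres
    d2 = ((Y[:, None, :] - T[None, :, :]) ** 2).sum(-1)  # (K, N) squared distances
    # E-step: responsibilities R[i, n] = p(x_i | t_n), computed as a stable softmax.
    log_r = -0.5 * beta * d2
    log_r -= log_r.max(axis=0, keepdims=True)
    R = np.exp(log_r)
    R /= R.sum(axis=0, keepdims=True)
    # M-step for W: solve (Phi^T G Phi) W = Phi^T R T with G = diag(sum_n R[i, n]).
    G = R.sum(axis=1)
    A = Phi.T @ (G[:, None] * Phi)
    A += 1e-8 * np.eye(A.shape[0])  # tiny ridge as a numerical safeguard (assumption)
    W_new = np.linalg.solve(A, Phi.T @ (R @ T))
    # M-step for beta: inverse of the responsibility-weighted mean squared distance.
    d2_new = (((Phi @ W_new)[:, None, :] - T[None, :, :]) ** 2).sum(-1)
    beta_new = N * D / (R * d2_new).sum()
    return W_new, beta_new, R

# Toy usage: noisy parabola in 2-D, 1-D latent space with 20 grid points.
rng = np.random.default_rng(0)
s = rng.uniform(-1, 1, 200)
T = np.column_stack([s, s ** 2]) + 0.05 * rng.standard_normal((200, 2))
latent = np.linspace(-1, 1, 20)[:, None]
Phi = rbf_design(latent, np.linspace(-1, 1, 5)[:, None], width=0.5)

# Simplified PCA-style initialisation: lay the centres along the first principal axis.
mean = T.mean(axis=0)
_, s_vals, Vt = np.linalg.svd(T - mean, full_matrices=False)
target = mean + latent @ (Vt[:1] * s_vals[0] / np.sqrt(len(T)))
W = np.linalg.lstsq(Phi, target, rcond=None)[0]
beta = 1.0 / T.var()

for _ in range(30):
    W, beta, R = gtm_em_step(T, Phi, W, beta)
```

Each pass uses the whole data matrix `T`, and both M-step updates are closed-form, so unlike SOM training there is no learning rate or neighborhood schedule to tune.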

The constrained mixture-of-Gaussians model consists of a set $\{\mathbf{x}_i\}$ of points in latent space, which are mapped into data space via a generalized linear regression model $\mathbf{W}\phi(\mathbf{x})$, and the inverse variance $\beta$ of the Gaussian noise model.
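
Written out, and assuming $K$ latent points with equal mixing weights, a $D$-dimensional data space, and $\mathbf{t}$ as the symbol for a data point (the symbols $K$, $D$, and $\mathbf{t}$ are assumptions not used elsewhere in these notes), the density this model assigns in data space is

$$
p(\mathbf{t} \mid \mathbf{W}, \beta) = \frac{1}{K} \sum_{i=1}^{K} \left(\frac{\beta}{2\pi}\right)^{D/2} \exp\left(-\frac{\beta}{2} \lVert \mathbf{W}\phi(\mathbf{x}_i) - \mathbf{t} \rVert^2\right)
$$

The mixture is constrained because the centres $\mathbf{W}\phi(\mathbf{x}_i)$ cannot move independently: they are all determined by the single parameter matrix $\mathbf{W}$.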