Show Reference: "To Recognize Shapes, First Learn to Generate Images"

"To Recognize Shapes, First Learn to Generate Images", Progress in Brain Research, Vol. 165 (2007), pp. 535-547, doi:10.1016/s0079-6123(06)65034-6, by Geoffrey E. Hinton
    abstract = {The uniformity of the cortical architecture and the ability of functions to move to different areas of cortex following early damage strongly suggest that there is a single basic learning algorithm for extracting underlying structure from richly structured, high-dimensional sensory data. There have been many attempts to design such an algorithm, but until recently they all suffered from serious computational weaknesses. This chapter describes several of the proposed algorithms and shows how they can be combined to produce hybrid methods that work efficiently in networks with many layers and millions of adaptive connections.},
    author = {Hinton, Geoffrey E.},
    doi = {10.1016/s0079-6123(06)65034-6},
    issn = {0079-6123},
    journal = {Progress in Brain Research},
    keywords = {deep-learning},
    pages = {535--547},
    pmid = {17925269},
    posted-at = {2013-08-15 01:58:43},
    priority = {2},
    title = {To Recognize Shapes, First Learn to Generate Images},
    url = {},
    volume = {165},
    year = {2007}


Selfridge's Pandemonium is a progenitor (possibly one of several) of all hierarchical cognitive architectures. It comprises a hierarchy of layers in which each layer detects patterns in the activity of the more primitive layer below it.

Early work on layered architectures pre-wired every layer except the top-most one and learned only that top layer.

Hinton states that when SVMs are used, the actual features (of an image, an article, ...) are extracted by some hand-crafted algorithm, and only the discrimination between objects based on these features is learned.

He sees this as a modern version of what he calls a strategy of denial toward the problem of learning feature extractors and hidden units.

Unsupervised learning extracts regularities in the input. The detected regularities can then be used for the actual discrimination, or unsupervised learning can be applied again to detect regularities in these regularities.

Backpropagation was discovered at least four times within one decade.

If we know which output we want and if each neuron's output is a smooth function of its input, then the weight changes that move the actual output toward the desired one can be computed using calculus (the chain rule).

Following this strategy, we get backpropagation.
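The calculus step can be sketched for a tiny network. This is only an illustration, not code from the chapter: the layer sizes, the logistic activation, the squared-error loss and all names (`forward`, `backprop`, `W1`, `W2`) are assumptions made for the example. The gradient computed by the chain rule is compared against a finite-difference estimate for one weight.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(W1, W2, x):
    h = sigmoid(W1 @ x)   # hidden activity, a smooth function of its input
    y = sigmoid(W2 @ h)   # output activity
    return h, y

def loss(W1, W2, x, t):
    _, y = forward(W1, W2, x)
    return 0.5 * np.sum((y - t) ** 2)   # assumed squared-error loss

def backprop(W1, W2, x, t):
    h, y = forward(W1, W2, x)
    # Chain rule: error derivative w.r.t. the total input of each output unit
    delta_out = (y - t) * y * (1.0 - y)
    # Error derivative passed back through W2 to the hidden units
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)
    return np.outer(delta_hid, x), np.outer(delta_out, h)

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(3, 4))   # input (4) -> hidden (3)
W2 = rng.normal(scale=0.5, size=(2, 3))   # hidden (3) -> output (2)
x = rng.normal(size=4)
t = np.array([0.0, 1.0])                  # desired output

grad_W1, grad_W2 = backprop(W1, W2, x, t)

# Finite-difference check of a single weight
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (loss(W1_plus, W2, x, t) - loss(W1_minus, W2, x, t)) / (2 * eps)

# One small gradient step (learning rate 0.1 is an arbitrary choice)
loss_before = loss(W1, W2, x, t)
loss_after = loss(W1 - 0.1 * grad_W1, W2 - 0.1 * grad_W2, x, t)
```

If the two gradient values agree, the analytic derivative is consistent with the loss function, which is a common sanity check before training.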

If we want to learn classification using backprop, we cannot force our network to produce binary output, because binary output is not a smooth function of the input.

Instead, we can let our network learn to output the log probability of each class given the input.
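One common way to obtain smooth log probabilities from a network is a log-softmax over its raw output scores. The sketch below assumes that choice; the scores, the target class and the name `log_softmax` are made up for illustration and are not taken from the chapter.

```python
import numpy as np

def log_softmax(scores):
    # Subtracting the maximum keeps exp() from overflowing
    shifted = scores - np.max(scores)
    return shifted - np.log(np.sum(np.exp(shifted)))

scores = np.array([2.0, 0.5, -1.0])   # hypothetical raw outputs for 3 classes
log_probs = log_softmax(scores)

target = 0                             # hypothetical correct class
nll = -log_probs[target]               # loss: negative log probability of the target

# Gradient of the loss w.r.t. the scores: predicted probabilities minus one-hot target
grad_scores = np.exp(log_probs)
grad_scores[target] -= 1.0
```

Because every step is differentiable, `grad_scores` can be fed into backprop in place of the error on a hard binary decision.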

One problem with backpropagation is that one usually starts with small random weights, which are typically far away from the optimal weights. Because the space of possible weight settings is so large, learning can therefore take a long time.

Backprop needs a lot of labeled data to learn classification with many classes.

It is unclear how neurons could propagate error signals backward to their inputs. Thus, the biological plausibility of backpropagation is limited.

Hinton argues that backpropagation is such a good idea that nature must have found a way to implement it somehow.

In the wake-sleep algorithm, (at least) two layers of neurons are fully connected to each other.

In the wake phase, the lower layer drives the upper layer through the bottom-up recognition weights. The top-down generative weights are trained such that they will generate the current activity in the lower layer given the current activity in the upper layer.

In the sleep phase, the upper layer drives activity in the lower layer through the generative weights, and the recognition weights are trained such that they reproduce the activity in the upper layer given the activity in the lower layer.
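The two phases can be sketched for a single pair of layers. This is a simplified illustration under several assumptions that are not in the notes: binary stochastic units, no bias terms, a uniform prior over the upper layer during sleep, a simple delta rule for both weight sets, and made-up sizes and names (`wake_step`, `sleep_step`, `R`, `G`).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    # Binary stochastic units: on with probability p
    return (rng.random(p.shape) < p).astype(float)

def wake_step(R, G, data, lr=0.1):
    # Recognition weights drive the upper layer from real data
    upper = sample(sigmoid(R @ data))
    # Generative weights learn to reproduce the lower-layer activity
    predicted_lower = sigmoid(G @ upper)
    G += lr * np.outer(data - predicted_lower, upper)

def sleep_step(R, G, lr=0.1):
    # Upper layer produces a "fantasy" (uniform prior is an assumption)
    upper = sample(np.full(R.shape[0], 0.5))
    lower = sample(sigmoid(G @ upper))
    # Recognition weights learn to recover the upper activity that caused it
    predicted_upper = sigmoid(R @ lower)
    R += lr * np.outer(upper - predicted_upper, lower)

n_lower, n_upper = 6, 3
R = np.zeros((n_upper, n_lower))   # bottom-up recognition weights
G = np.zeros((n_lower, n_upper))   # top-down generative weights

pattern = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0])   # toy input
for _ in range(100):
    wake_step(R, G, pattern)
    sleep_step(R, G)
```

Note that each weight set is trained with targets produced by the other one, so neither phase needs an externally supplied label.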

The restricted Boltzmann machine (RBM) is an unsupervised learning method which is similar to the wake-sleep algorithm. It uses stochastic learning, i.e. neural activations are stochastic, with activation probabilities determined by the weights and the states of the connected units.

The weights in a trained RBM implicitly encode a probability distribution over input vectors, fitted to the training set.
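A minimal sketch of both points follows. It assumes training by contrastive divergence with a single Gibbs step (CD-1), a rule commonly used for RBMs but not named in these notes, and it omits bias terms for brevity. The toy data and the names `cd1_update` and `visible_distribution` are hypothetical. For a bias-free RBM with binary units, the probability of a visible vector v is proportional to the product over hidden units j of (1 + exp((vW)_j)), which lets the encoded distribution be enumerated exactly for a very small model.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    return (rng.random(p.shape) < p).astype(float)

def cd1_update(W, v0, lr=0.1):
    # Positive phase: hidden units driven by the data
    p_h0 = sigmoid(v0 @ W)
    h0 = sample(p_h0)
    # Negative phase: one step of reconstruction
    v1 = sample(sigmoid(h0 @ W.T))
    p_h1 = sigmoid(v1 @ W)
    # Difference of data-driven and reconstruction-driven correlations
    W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / len(v0)

def visible_distribution(W):
    # Enumerate all binary visible vectors (only feasible for tiny models)
    states = np.array(list(itertools.product([0.0, 1.0], repeat=W.shape[0])))
    unnormalised = np.prod(1.0 + np.exp(states @ W), axis=1)
    return states, unnormalised / unnormalised.sum()

data = np.array([[1, 1, 0, 0],
                 [0, 0, 1, 1]], dtype=float)   # toy training set
W = rng.normal(scale=0.01, size=(4, 2))        # 4 visible, 2 hidden units

for _ in range(500):
    cd1_update(W, data)

states, probs = visible_distribution(W)
```

`probs` holds one probability per possible input vector, computed from `W` alone, which is the sense in which the weights encode a distribution.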

Hinton proposes building deep belief networks by stacking RBMs and training them unsupervised, one at a time from the bottom layer upward, with each RBM learning from the hidden activities of the one below it. After that, the network is used in feed-forward mode and backprop can be used to learn the actual task. Thus, some of the problems of backprop are addressed by initializing the weights via unsupervised learning.
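The stacking procedure can be sketched as greedy layer-wise pre-training. The example makes several assumptions that go beyond the notes: CD-1 training without bias terms, hidden probabilities (rather than sampled states) passed upward as training data for the next RBM, random toy data, and hypothetical names (`train_rbm`, `pretrain_stack`, `feed_forward`). The supervised fine-tuning with backprop is left out; the code only shows how the pre-trained weights would initialise a feed-forward network.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=50, lr=0.1):
    # Small random initial weights; biases omitted for brevity
    W = rng.normal(scale=0.01, size=(data.shape[1], n_hidden))
    for _ in range(epochs):
        p_h0 = sigmoid(data @ W)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        v1 = sigmoid(h0 @ W.T)          # reconstruction probabilities
        p_h1 = sigmoid(v1 @ W)
        W += lr * (data.T @ p_h0 - v1.T @ p_h1) / len(data)
    return W

def pretrain_stack(data, layer_sizes):
    weights, layer_input = [], data
    for n_hidden in layer_sizes:        # bottom layer first
        W = train_rbm(layer_input, n_hidden)
        weights.append(W)
        # Hidden activities become the "data" for the next RBM
        layer_input = sigmoid(layer_input @ W)
    return weights

def feed_forward(weights, x):
    # Deterministic pass using the pre-trained weights as initial values
    for W in weights:
        x = sigmoid(x @ W)
    return x

data = (rng.random((20, 8)) < 0.5).astype(float)   # toy binary inputs
weights = pretrain_stack(data, layer_sizes=[6, 4])
features = feed_forward(weights, data)
```

In a full pipeline, a task-specific output layer would be added on top of `features` and all weights would then be fine-tuned with backprop on labeled data.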