Show Reference: "Multilayer Feedforward Networks are Universal Approximators"

Multilayer Feedforward Networks are Universal Approximators. Neural Networks, Vol. 2, No. 5 (January 1989), pp. 359-366, doi:10.1016/0893-6080(89)90020-8, by Kurt Hornik, Maxwell Stinchcombe, Halbert White
    abstract = {This paper rigorously establishes that standard multilayer feedforward networks with as few as one hidden layer using arbitrary squashing functions are capable of approximating any Borel measurable function from one finite dimensional space to another to any desired degree of accuracy, provided sufficiently many hidden units are available. In this sense, multilayer feedforward networks are a class of universal approximators.},
    author = {Hornik, Kurt and Stinchcombe, Maxwell and White, Halbert},
    citeulike-article-id = {1700245},
    doi = {10.1016/0893-6080(89)90020-8},
    issn = {08936080},
    journal = {Neural Networks},
    keywords = {ann, feedforward, function-approximation, mlp},
    month = jan,
    number = {5},
    pages = {359--366},
    posted-at = {2014-08-20 14:53:32},
    priority = {2},
    title = {Multilayer Feedforward Networks are Universal Approximators},
    volume = {2},
    year = {1989}


Single-layer perceptrons (networks with no hidden layer) cannot approximate every continuous function.
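A short worked argument makes this concrete. It is only a sketch under stated assumptions: one scalar input, and a perceptron output of the form $g(x) = \Psi(wx + b)$ with a non-decreasing activation $\Psi$ (the symbols $g$, $w$, $b$ and the target $f$ are chosen here for illustration and do not come from the paper). Such a $g$ is monotone in $x$. Take the target $f(x) = x^2$ on $[-1, 1]$, so $f(-1) = f(1) = 1$ and $f(0) = 0$. If $w \geq 0$, then $g(-1) \leq g(0)$, and therefore

$$\max\big(|g(-1) - 1|,\ |g(0) - 0|\big) \;\geq\; \tfrac{1}{2}.$$

Indeed, $|g(-1) - 1| < \tfrac{1}{2}$ forces $g(0) \geq g(-1) > \tfrac{1}{2}$. The case $w < 0$ is symmetric, using $g(1) \leq g(0)$. Under these assumptions no choice of $w$ and $b$ brings the worst-case error below $\tfrac{1}{2}$.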

Multilayer perceptrons can approximate any continuous function with only a single hidden layer.

It was known before Hornik et al.'s work that specific classes of multilayer feedforward networks could approximate any continuous function.

Hornik et al. showed that multilayer feedforward networks with a single hidden layer and an arbitrary squashing function can approximate any continuous function to any desired accuracy on a compact set of input patterns, provided sufficiently many hidden units are available.
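To make the statement tangible, here is a minimal sketch for one scalar input. It builds, by hand, a network of the form $f(x) \approx \beta_0 + \sum_j \beta_j \Psi(w_j x + c_j)$ with the logistic function as $\Psi$. This is not the argument used in the paper and involves no training: it is a simple staircase construction chosen for illustration, and all names (`build_staircase_net`, `predict`, `max_error`, the `steepness` heuristic) are hypothetical.

```python
import math


def logistic(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_staircase_net(f, a, b, n_hidden):
    """Hand-build a one-hidden-layer net that approximates f on [a, b].

    Each hidden unit is a steep sigmoid centred midway between two grid
    points; its output weight is the jump of f between those points.
    """
    h = (b - a) / n_hidden
    steepness = 50.0 / h  # illustrative choice: makes each sigmoid nearly a step
    grid = [a + i * h for i in range(n_hidden + 1)]
    bias = f(grid[0])
    units = []
    for i in range(1, n_hidden + 1):
        centre = grid[i] - h / 2.0
        beta = f(grid[i]) - f(grid[i - 1])
        # Hidden unit computes logistic(w * x + c) with w = steepness.
        units.append((beta, steepness, -steepness * centre))
    return bias, units


def predict(net, x):
    """Network output: bias plus the weighted sum of hidden activations."""
    bias, units = net
    return bias + sum(beta * logistic(w * x + c) for beta, w, c in units)


def max_error(f, net, a, b, samples=2001):
    """Largest absolute error on an evenly spaced sample of [a, b]."""
    xs = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - predict(net, x)) for x in xs)


if __name__ == "__main__":
    a, b = 0.0, 2.0 * math.pi
    for n in (10, 100):
        net = build_staircase_net(math.sin, a, b, n)
        print(n, "hidden units, max error:", max_error(math.sin, net, a, b))
```

In this sketch the number of layers never changes; only the number of hidden units does, and the sampled error shrinks as that number grows. The error is measured on a finite sample of the interval, so it is an estimate of the worst case, not a bound.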

If an MLP fails to approximate a certain function, this can be due to

  • inadequate learning procedure,
  • inadequate number of hidden units (not layers),
  • noise (a stochastic rather than deterministic relation between input and target).

In principle, a three-layer feedforward network (input layer, one hidden layer, output layer) is capable of approximating any continuous function.

Hornik et al. define a squashing function as a non-decreasing function $\Psi:\mathbb{R}\rightarrow [0,1]$ whose limit at infinity is 1 and whose limit at negative infinity is 0.
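The definition can be illustrated with a few candidate functions. The sketch below defines three functions that satisfy it (logistic, hard threshold, ramp) and a rough numerical check. The helper `looks_like_squashing` and its grid and tolerance parameters are hypothetical and only test a finite grid, so a `True` result is a plausibility check, not a proof.

```python
import math


def logistic(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def threshold(x):
    """Hard threshold: 0 below zero, 1 from zero upwards."""
    return 1.0 if x >= 0 else 0.0


def ramp(x):
    """Ramp: 0 below 0, linear on [0, 1], 1 above 1."""
    return min(1.0, max(0.0, x))


def looks_like_squashing(psi, lo=-50.0, hi=50.0, steps=2001, tol=1e-9):
    """Heuristic check of the squashing-function conditions on a finite grid.

    Checks that values lie in [0, 1], that psi is non-decreasing along the
    grid, and that psi is close to 0 at `lo` and close to 1 at `hi`.
    """
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    ys = [psi(x) for x in xs]
    in_range = all(0.0 <= y <= 1.0 for y in ys)
    non_decreasing = all(y2 >= y1 - 1e-12 for y1, y2 in zip(ys, ys[1:]))
    limits_ok = ys[0] <= tol and ys[-1] >= 1.0 - tol
    return in_range and non_decreasing and limits_ok


if __name__ == "__main__":
    for name, psi in [("logistic", logistic), ("threshold", threshold),
                      ("ramp", ramp), ("tanh", math.tanh), ("sin", math.sin)]:
        print(name, looks_like_squashing(psi))
```

Note that `math.tanh` fails the check only because its range is $[-1, 1]$; the rescaled function $(\tanh(x) + 1)/2$ maps into $[0, 1]$ and satisfies the definition. Continuity is not required, which is why the hard threshold qualifies.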