# Show Reference: "Optimal Estimation in Sensory Systems"

@incollection{simoncelli-2009,
title = {Optimal Estimation in Sensory Systems},
author = {Simoncelli, Eero P.},
crossref = book-gazzaniga-2009
}

@book{book-gazzaniga-2009,
day = {18},
edition = {4},
editor = {Gazzaniga, Michael S.},
howpublished = {Hardcover},
isbn = {9780262013413},
keywords = {cognitive-neuroscience, neuroscience},
month = sep,
posted-at = {2013-04-03 08:44:07},
priority = {2},
publisher = {MIT Press},
title = {The Cognitive Neurosciences},
year = {2009}
}

Empirical Bayes methods estimate the prior from the data.

More formally, they choose a parametric form for the prior and estimate an optimal set of parameters $\theta_{opt}$ by optimization: $$\theta_{opt} = \mathrm{arg\;max}_\theta\prod_n\int P_\theta(x)P(m_n\mid x)\;dx,$$ for measurements $m_n$ and possible latent variable values $x$.
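As a minimal sketch of this idea, assume (this is an illustrative choice, not taken from the chapter) a Gaussian prior $N(\mu, \tau^2)$ and Gaussian measurement noise with known variance. The integral then has a closed form: each $m_n$ is marginally distributed as $N(\mu, \tau^2 + \sigma^2)$, so the maximization reduces to sample statistics. The names `fit_gaussian_prior`, `posterior_mean` and `noise_var` are hypothetical.

```python
import random
import statistics

def fit_gaussian_prior(measurements, noise_var):
    """Maximum marginal likelihood fit of a Gaussian prior N(mu, tau2).

    Assumes m = x + noise with Gaussian noise of known variance noise_var,
    so that the marginal of m is N(mu, tau2 + noise_var).
    """
    mu_hat = statistics.fmean(measurements)
    # ML estimate of the total variance is the population (biased) variance.
    total_var = statistics.pvariance(measurements, mu_hat)
    # Subtract the known noise variance; clip at zero to stay a valid variance.
    tau2_hat = max(0.0, total_var - noise_var)
    return mu_hat, tau2_hat

def posterior_mean(m, mu, tau2, noise_var):
    """Estimate of x given m under the fitted prior (shrinks m toward mu)."""
    w = tau2 / (tau2 + noise_var)
    return w * m + (1.0 - w) * mu

# Synthetic data: the latent values are never shown to the fitting step.
rng = random.Random(0)
true_mu, true_tau2, noise_var = 2.0, 1.0, 0.25
latents = [rng.gauss(true_mu, true_tau2 ** 0.5) for _ in range(5000)]
measurements = [x + rng.gauss(0.0, noise_var ** 0.5) for x in latents]

mu_hat, tau2_hat = fit_gaussian_prior(measurements, noise_var)
print(mu_hat, tau2_hat)
```

Only the noisy measurements enter the fit; the recovered prior can then be plugged into `posterior_mean` to estimate individual latent values.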

An estimator is a deterministic function which maps from measurements to estimates.

Statistical decision theory and Bayesian estimation are used in the cognitive sciences to describe performance in natural perception.

An estimator is optimal with respect to a loss function if it minimizes the expected value of that loss function.
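A small illustration of how the loss function determines the best estimate. This is a sketch of the simplest case, a constant estimate for a fixed set of hypothetical sample values; the grid search stands in for a proper optimizer.

```python
# Hypothetical sample of latent values; the outlier at 10 separates the two answers.
samples = [0.0, 1.0, 1.0, 2.0, 10.0]

def average_loss(estimate, loss):
    """Average loss of a constant estimate over the sample."""
    return sum(loss(x, estimate) for x in samples) / len(samples)

def squared(x, e):
    return (x - e) ** 2

def absolute(x, e):
    return abs(x - e)

# Brute-force search over candidate estimates 0.00, 0.01, ..., 10.00.
candidates = [i / 100 for i in range(0, 1001)]
best_squared = min(candidates, key=lambda e: average_loss(e, squared))
best_absolute = min(candidates, key=lambda e: average_loss(e, absolute))

print(best_squared)   # coincides with the sample mean
print(best_absolute)  # coincides with the sample median
```

Squared error is minimized by the mean and absolute error by the median, so the same data yield different "best" estimates under different losses.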

Optimizing (ie. training) an estimator on input data gives different results depending on the distribution of the data points: the optimizer reduces the error most where the density of data points is high, possibly at the cost of greater error where the density is lower.
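The effect can be sketched with a deliberately restricted estimator. The setup below is an assumption made for illustration: a straight line is fitted by least squares to the target $x^2$ on $[0, 1]$, once with most training points in the lower half and once with most in the upper half.

```python
import random

def target(x):
    return x * x

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def mse_on(xs, a, b):
    """Mean squared error of the fitted line on the probe points xs."""
    return sum((a * x + b - target(x)) ** 2 for x in xs) / len(xs)

rng = random.Random(1)
low_heavy = ([rng.uniform(0.0, 0.5) for _ in range(900)]
             + [rng.uniform(0.5, 1.0) for _ in range(100)])
high_heavy = ([rng.uniform(0.0, 0.5) for _ in range(100)]
              + [rng.uniform(0.5, 1.0) for _ in range(900)])

fit_low = fit_line(low_heavy, [target(x) for x in low_heavy])
fit_high = fit_line(high_heavy, [target(x) for x in high_heavy])

# Evenly spaced probe points in each half of the input range.
probe_low = [i / 200 for i in range(0, 100)]
probe_high = [i / 200 for i in range(100, 201)]

print(mse_on(probe_low, *fit_low), mse_on(probe_low, *fit_high))
print(mse_on(probe_high, *fit_low), mse_on(probe_high, *fit_high))
```

Each fit does better in the region where its training data were dense and worse in the other half.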

Fully supervised learning algorithms are biologically implausible.

Given probability density functions (PDF) $P(X)$ (the prior) and $P(M\mid X)$ (the measurement density) for a latent variable $X$ and an observable $M$, an optimal estimator for $X$ wrt. the loss function $L$ is given by $$f_{opt} = \mathrm{arg\;min}_f \int P(x) \int P(m\mid x) L(x,f(m))\;dm\;dx$$
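A short derivation, assuming $f$ may be any function of $m$, shows why this becomes a separate problem for each measurement. The symbol $\hat{x}$ is notation introduced only for this sketch. Writing the joint density as $P(x)P(m\mid x) = P(m)P(x\mid m)$, the expected loss is $$\int P(m)\int P(x\mid m)\,L(x,f(m))\;dx\;dm.$$ Since $P(m)\ge 0$ and $f(m)$ can be chosen independently for each $m$, it suffices to minimize the inner integral pointwise: $$f_{opt}(m) = \mathrm{arg\;min}_{\hat{x}} \int P(x\mid m)\,L(x,\hat{x})\;dx.$$ If $f$ is restricted to a family of functions, this pointwise argument no longer applies in general.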

The maximum a posteriori (MAP) estimator arises from a loss function which penalizes all errors equally.
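For a discrete latent variable, "penalizing all errors equally" is the 0-1 loss. The sketch below uses a hypothetical skewed posterior to show that minimizing the expected 0-1 loss picks the posterior mode, while squared error is minimized by the posterior mean.

```python
# Hypothetical posterior P(x | m) over five discrete latent values.
values = [0, 1, 2, 3, 4]
posterior = [0.05, 0.40, 0.25, 0.20, 0.10]

def expected_loss(estimate, loss):
    """Posterior-expected loss of committing to a given estimate."""
    return sum(p * loss(x, estimate) for x, p in zip(values, posterior))

def zero_one(x, e):
    # Every wrong answer costs the same, regardless of how far off it is.
    return 0.0 if x == e else 1.0

def squared(x, e):
    return (x - e) ** 2

map_estimate = min(values, key=lambda e: expected_loss(e, zero_one))
posterior_mean = sum(x * p for x, p in zip(values, posterior))

print(map_estimate)    # the mode of the posterior
print(posterior_mean)  # the minimizer of expected squared error
```

In the continuous case the same idea needs a limit argument (a loss that is zero only within a shrinking window around the true value), which this discrete sketch does not cover.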

A weakness of empirical Bayes is that the prior which explains the data best is "not necessarily the one that leads to the best estimator".

Von Helmholtz had already formulated the idea that prior knowledge (or expectations) is fused with sensory information into perception.

This idea is at the core of Bayesian theory.

Optimality of an estimator is relative to

• loss function,
• measurement probability,
• prior,
• (depending on the setting) a family of functions.

We do not know the types of functions computable by neurons.

A representation of probabilities is not necessary for optimal estimation.

A neural population may encode a probability density function if each neuron's response represents the probability (or log probability) of some concrete value of a latent variable.
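A minimal sketch of such a code, assuming each neuron has a preferred value of the latent variable and its response is an unnormalized log probability of that value. The numbers and the names `preferred_values`, `responses` and `decode_density` are hypothetical.

```python
import math

# One neuron per candidate value of the latent variable.
preferred_values = [-2.0, -1.0, 0.0, 1.0, 2.0]
# Hypothetical responses, read as log probabilities up to an additive constant.
responses = [-4.0, -1.5, -0.5, -1.5, -4.0]

def decode_density(log_responses):
    """Turn log-probability responses into a normalized discrete density."""
    peak = max(log_responses)  # subtract the peak for numerical stability
    weights = [math.exp(r - peak) for r in log_responses]
    total = sum(weights)
    return [w / total for w in weights]

density = decode_density(responses)

# Two possible point estimates read out from the same population.
map_estimate = preferred_values[density.index(max(density))]
mean_estimate = sum(v * p for v, p in zip(preferred_values, density))

print(density)
print(map_estimate, mean_estimate)
```

Under a log-probability code, multiplying densities from independent sources corresponds to adding responses, which is one reason this encoding is often considered convenient.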

Early visual neurons (eg. in V1) do not seem to encode probabilities.

The later an estimate is made explicit from a (probabilistic) neural population code, the less information is lost in the conversion.
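One way to see this is cue combination. The sketch below assumes two hypothetical populations that encode Gaussian-shaped densities over the same latent variable, one narrow (reliable) and one broad (unreliable), and compares a late readout with an early one.

```python
import math

grid = [i / 10 for i in range(-100, 101)]  # candidate latent values

def gaussian_density(center, width):
    """Discrete, normalized Gaussian-shaped density on the grid."""
    weights = [math.exp(-0.5 * ((x - center) / width) ** 2) for x in grid]
    total = sum(weights)
    return [w / total for w in weights]

def density_mean(density):
    return sum(x * p for x, p in zip(grid, density))

reliable = gaussian_density(center=1.0, width=0.5)
unreliable = gaussian_density(center=-1.0, width=2.0)

# Late readout: combine the full densities first, then estimate.
product = [a * b for a, b in zip(reliable, unreliable)]
total = sum(product)
combined = [p / total for p in product]
late_estimate = density_mean(combined)

# Early readout: reduce each density to a point estimate, then average.
early_estimate = 0.5 * (density_mean(reliable) + density_mean(unreliable))

print(late_estimate)   # pulled strongly toward the reliable cue
print(early_estimate)  # the widths (reliabilities) no longer play a role
```

Once each density has been collapsed to a single number, the information about its width is gone, so the early readout cannot weight the cues by reliability.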