Bayesian information processing does not represent and manipulate single, definite values of variables but PDFs over those variables.

According to Knill and Pouget, being an optimal Bayesian observer only means taking into account the uncertainty of the information available in the system, that is, the information as it stands after the lossy transformation from physical stimuli to neural representations.

Freeman and Dale discuss three measures for detecting bimodality in an observed probability distribution:

  • The bimodality coefficient (BC),
  • Hartigan's dip statistic (HDS), and
  • Akaike's information criterion between one-component and two-component distribution models (AID).

Measures for detecting bimodality can be used to detect whether psychometric measurements include cases in which behavior was caused by different cognitive processes (like intuitive and rational processing).

According to Freeman and Dale, Hartigan's dip statistic is more robust against skew than either the bimodality coefficient or Akaike's information criterion.

The bimodality coefficient can be unstable with small sample sizes (n<10).

Bimodality measures for probability distributions are affected by

  • distance between modes,
  • proportion (relative gain) of modes, and
  • proportion of skew.

Of the three, Freeman and Dale found distance between modes to have the greatest impact on the measures they chose.

In Freeman and Dale's simulations, Hartigan's dip statistic was the most sensitive in detecting bimodality.

In Freeman and Dale's simulations, Hartigan's dip statistic was strongly influenced by proportion between modes.

In Freeman and Dale's simulations, the bimodality coefficient suffered from interactions between skew and proportion between modes.

According to Freeman and Dale, the bimodality coefficient relies on the heuristic that bimodal distributions are often asymmetric, which would lead to high skew and low kurtosis.

It therefore makes sense that it may produce false positives for unimodal distributions with high skew and low kurtosis.
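
A minimal sketch in Python of how the bimodality coefficient can be computed from sample skewness and excess kurtosis. The bias-corrected sample formula and the 0.555 benchmark (the BC of a uniform distribution) are the commonly used conventions; whether they match Freeman and Dale's exact implementation is an assumption here.

```python
# Sketch: bimodality coefficient from sample skewness and excess kurtosis.
# Assumes the common bias-corrected sample formula; 0.555 is the BC of a
# uniform distribution and is the usual benchmark for suspecting bimodality.
import numpy as np
from scipy.stats import skew, kurtosis

def bimodality_coefficient(x):
    x = np.asarray(x, dtype=float)
    n = x.size
    s = skew(x, bias=False)                    # bias-corrected sample skewness
    k = kurtosis(x, fisher=True, bias=False)   # bias-corrected excess kurtosis
    return (s**2 + 1) / (k + 3 * (n - 1)**2 / ((n - 2) * (n - 3)))

rng = np.random.default_rng(0)
bimodal = np.concatenate([rng.normal(-2, 1, 500), rng.normal(2, 1, 500)])
skewed_unimodal = rng.chisquare(df=1, size=1000)   # heavily skewed, unimodal

print(bimodality_coefficient(bimodal))          # above the 0.555 benchmark
print(bimodality_coefficient(skewed_unimodal))  # may also exceed 0.555: the false-positive risk noted above
```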

Freeman and Dale `are inclined to recommend' Hartigan's dip statistic to detect bimodality.

Intuitively, Akaike's information criterion between one-component and two-component distribution models (AID) tests whether one model or the other describes the data better, with a penalty for model complexity.

Freeman and Dale found Akaike's information criterion between one-component and two-component distribution models (AID) to be very sensitive to bimodality but also highly biased towards detecting it.
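
To make the AID comparison concrete, here is a small sketch assuming Gaussian components and scikit-learn's GaussianMixture; the model family and fitting details are illustrative, not necessarily those used by Freeman and Dale.

```python
# Sketch of the AID idea: compare the AIC of one- vs. two-component Gaussian
# mixtures fitted to the same data. Positive differences favour the
# two-component (bimodal) model. The AIC penalty of 2 per extra parameter is
# mild, which is consistent with AID's reported bias towards bimodality.
import numpy as np
from sklearn.mixture import GaussianMixture

def aic_difference(x):
    X = np.asarray(x, dtype=float).reshape(-1, 1)
    aic1 = GaussianMixture(n_components=1, random_state=0).fit(X).aic(X)
    aic2 = GaussianMixture(n_components=2, random_state=0).fit(X).aic(X)
    return aic1 - aic2

rng = np.random.default_rng(0)
bimodal = np.concatenate([rng.normal(-2, 1, 500), rng.normal(2, 1, 500)])
unimodal = rng.normal(0, 1, 1000)

print(aic_difference(bimodal))   # clearly positive: two components preferred
print(aic_difference(unimodal))  # close to zero or negative
```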

Pfister et al. recommend using Hartigan's dip statistic and the bimodality coefficient plus visual inspection to detect bimodality.
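
A hedged sketch of that combined check. It assumes the third-party `diptest` package for Hartigan's dip statistic (the `diptest.diptest` call and its return values are an assumption about that package's API) and combines it with the BC formula and a histogram for visual inspection.

```python
# Sketch of the combined check: Hartigan's dip test (via the third-party
# `diptest` package, API assumed), the bimodality coefficient, and a
# histogram for visual inspection.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import skew, kurtosis
import diptest  # assumed: diptest.diptest(x) -> (dip statistic, p-value)

def bimodality_report(x, name):
    x = np.asarray(x, dtype=float)
    n = x.size
    s, k = skew(x, bias=False), kurtosis(x, bias=False)
    bc = (s**2 + 1) / (k + 3 * (n - 1)**2 / ((n - 2) * (n - 3)))
    dip, pval = diptest.diptest(x)
    print(f"{name}: dip={dip:.3f} (p={pval:.3f}), BC={bc:.3f}")
    plt.hist(x, bins=50, alpha=0.5, label=name)   # visual inspection

rng = np.random.default_rng(0)
bimodality_report(np.concatenate([rng.normal(-2, 1, 500),
                                  rng.normal(2, 1, 500)]), "bimodal")
bimodality_report(rng.normal(0, 1, 1000), "unimodal")
plt.legend()
plt.show()
```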

Generative Topographic Mapping produces PDFs for latent variables given data points.
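
A short sketch of how a trained GTM yields a PDF over its latent grid for a single data point. Here `Y` (the projected grid centres) and `beta` (the noise precision) are assumed to come from an already trained model, and the toy mapping below is purely illustrative.

```python
# Sketch: posterior over GTM latent grid points for one data point t.
# Assumes a trained GTM summarized by projected grid centres Y (K x D,
# one row per latent grid point, i.e. W @ phi(x_k)) and noise precision beta.
# With a uniform prior over grid points and isotropic Gaussian noise,
# p(x_k | t) is proportional to exp(-beta/2 * ||t - y_k||^2).
import numpy as np

def gtm_latent_posterior(t, Y, beta):
    sq_dist = np.sum((Y - t) ** 2, axis=1)   # ||t - y_k||^2 for every grid point
    log_p = -0.5 * beta * sq_dist
    log_p -= log_p.max()                     # numerical stability
    p = np.exp(log_p)
    return p / p.sum()                       # discrete PDF over the latent grid

# Toy usage with an illustrative 1-D latent grid mapped into 2-D data space.
latent_grid = np.linspace(-1.0, 1.0, 20)
Y = np.stack([latent_grid, latent_grid ** 2], axis=1)   # stand-in for W @ phi(x_k)
posterior = gtm_latent_posterior(np.array([0.1, 0.0]), Y, beta=10.0)
print(posterior.round(3))
```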

A SOM population code in response to some input vector is not a probability density function.

The SOM algorithm works, but it has some serious theoretical problems.

A SOM in which each unit computes the probability of some value of a sensory variable produces a probabilistic population code, i.e. it computes a population-coded probability density function.
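
A tiny illustration of that reading (all numbers hypothetical): if each unit's activation is interpreted as the probability of its preferred stimulus value, normalizing the population activity gives a discrete, population-coded approximation of the PDF.

```python
# Tiny illustration (hypothetical numbers): units with preferred stimulus
# values whose activations are read as unnormalized probabilities of those
# values; normalizing turns the population activity into a discrete,
# population-coded approximation of a PDF.
import numpy as np

preferred_values = np.linspace(0.0, 1.0, 11)                 # one preferred value per unit
activations = np.exp(-(preferred_values - 0.3) ** 2 / 0.02)  # illustrative population response

pdf = activations / activations.sum()                        # probability mass per unit
for s, p in zip(preferred_values, pdf):
    print(f"p(s = {s:.1f}) = {p:.3f}")
```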

Jazayeri and Movshon present an ANN model for computing likelihood functions ($\approx$ probability density functions with uniform priors) from input population responses with arbitrary tuning functions.

Their assumptions are:

  • restricted types of noise characteristics (e.g. Poisson noise), and
  • statistically independent noise.

Since they work with log likelihoods, products of likelihoods become sums, so they can circumvent the problem of requiring neural multiplication.
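
A minimal sketch of that computation under the stated assumptions (independent Poisson noise; tuning curves that tile the stimulus space evenly, so stimulus-independent Poisson terms drop out). The tuning-curve shapes and all parameter values below are illustrative.

```python
# Sketch: likelihood decoding from a population response with independent
# Poisson noise. Up to stimulus-independent terms, log L(s) = sum_i n_i log f_i(s),
# a weighted sum of log tuning curves: addition instead of multiplication.
import numpy as np

stimuli = np.linspace(-np.pi, np.pi, 181)                     # candidate stimulus values
preferred = np.linspace(-np.pi, np.pi, 32, endpoint=False)    # preferred stimuli of 32 neurons
# f[i, j]: mean firing rate of neuron i at stimulus j (von Mises-like tuning)
f = 1.0 + 10.0 * np.exp(2.0 * (np.cos(stimuli[None, :] - preferred[:, None]) - 1.0))

rng = np.random.default_rng(0)
true_idx = 90                                                 # true stimulus near 0
counts = rng.poisson(f[:, true_idx])                          # observed spike counts

log_like = counts @ np.log(f)                                 # sum_i n_i log f_i(s) for every s
like = np.exp(log_like - log_like.max())
like /= like.sum()                                            # normalized likelihood function
print(stimuli[np.argmax(like)])                               # peak near the true stimulus
```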

According to Barber et al., `the original Hopfield net implements Bayesian inference on analogue quantities in terms of PDFs'.

Neural populations can compute and encode probability density functions for external variables.

Bauer and Wermter present an ANN algorithm that inherits from the self-organizing map (SOM) algorithm the ability to learn a latent variable model of its input. They extend the SOM algorithm so that it learns about the distribution of noise in the input and computes probability density functions over the latent variables. The algorithm represents these probability density functions using population codes. This is done with very few assumptions about the distribution of noise.