Show Tag: statistics

"Probably the shortest true statement that can be made about causality and causation is "Empirically observed covariation is a necessary but not sufficient condition for causality". Or perhaps "Correlation is not causation, but it sure is a hint"."

The Hartigans' dip statistic measures departure from unimodality in a sample: specifically, it is the greatest difference between the empirical cumulative distribution function and the unimodal cumulative distribution function that minimizes that greatest difference.
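
In practice it can be computed numerically; a minimal sketch, assuming the third-party Python package `diptest` is installed (its API may differ between versions):

```python
import numpy as np
import diptest  # third-party package: pip install diptest

rng = np.random.default_rng(0)

# A clearly bimodal sample: mixture of two well-separated Gaussians.
sample = np.concatenate([rng.normal(-2.0, 1.0, 500),
                         rng.normal(2.0, 1.0, 500)])

# The dip statistic plus a p-value for the null hypothesis of unimodality.
dip, pval = diptest.diptest(sample)
print(f"dip = {dip:.4f}, p = {pval:.4f}")
```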

Akaike's information criterion is strongly linked to information theory and the maximum likelihood principle.
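
For a model with $k$ estimated parameters and maximized likelihood $\hat{L}$, it is defined as $$ \mathrm{AIC} = 2k - 2\ln\hat{L}, $$ so lower values indicate a better trade-off between goodness of fit and model complexity.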

Is it possible for a system to learn the reliability of its sensory modalities from how well they agree with the consensus between the modalities, under certain conditions?

Possible conditions:

  • many modalities (what my 2013 model does)
  • similar reliability
  • enough noise
  • enough remaining entropy at the end of learning (worked in early versions of my SOM)

Freeman and Dale discuss three measures for detecting bimodality in an observed probability distribution:

  • The bimodality coefficient (BC),
  • Hartigan's dip statistic (HDS), and
  • Akaike's information criterion between one-component and two-component distribution models (AID).

Measures for detecting bimodality can be used to detect whether psychometric measurements include cases in which behavior was caused by different cognitive processes (like intuitive and rational processing).

According to Freeman and Dale, Hartigan's dip statistic is more robust against skew than either the bimodality coefficient or Akaike's information criterion.

The bimodality coefficient can be unstable with small sample sizes (n<10).

Bimodality measures for probability distributions are affected by

  • distance between modes,
  • proportion (relative gain) of modes, and
  • proportion of skew.

Of the three, Freeman and Dale found distance between modes to have the greatest impact on the measures they chose.

In Freeman and Dale's simulations, Hartigan's dip statistic was the most sensitive in detecting bimodality.

In Freeman and Dale's simulations, Hartigan's dip statistic was strongly influenced by proportion between modes.

In Freeman and Dale's simulations, the bimodality coefficient suffered from interactions between skew and proportion between modes.

According to Freeman and Dale, the bimodality coefficient relies on the heuristic that bimodal distributions are often asymmetric, which leads to high skew and low kurtosis.

It is therefore unsurprising that it can produce false positives for unimodal distributions with high skew and low kurtosis.
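
This heuristic can be made concrete. A minimal sketch of the sample bimodality coefficient (the formula used, e.g., by Pfister et al.; details such as bias correction of the moment estimates vary between sources):

```python
from scipy.stats import kurtosis, skew

def bimodality_coefficient(x):
    """Sample bimodality coefficient: (skew^2 + 1) / (kurtosis + correction).

    Values above 5/9 ~ 0.555 are commonly taken to suggest bimodality.
    Requires n > 3 for the finite-sample correction term to be defined.
    """
    n = len(x)
    g = skew(x, bias=False)       # bias-corrected sample skewness
    k = kurtosis(x, bias=False)   # bias-corrected excess kurtosis
    return (g**2 + 1) / (k + 3 * (n - 1)**2 / ((n - 2) * (n - 3)))
```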

Freeman and Dale "are inclined to recommend" Hartigan's dip statistic to detect bimodality.

Intuitively, Akaike's information criterion between one-component and two-component distribution models (AID) tests whether one model or the other describes the data better, with a penalty for model complexity.
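
A sketch of this idea using scikit-learn's GaussianMixture (an assumption for illustration; Freeman and Dale's exact AID implementation may differ):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 1.0, 500),
                    rng.normal(2.0, 1.0, 500)]).reshape(-1, 1)

# Fit one- and two-component Gaussian mixtures and compare their AIC values;
# the AIC already penalizes the extra parameters of the second component.
aic1 = GaussianMixture(n_components=1).fit(x).aic(x)
aic2 = GaussianMixture(n_components=2).fit(x).aic(x)

# A clearly positive difference favors the two-component (bimodal) model.
print(f"AIC(1 component) - AIC(2 components) = {aic1 - aic2:.1f}")
```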

Freeman and Dale found Akaike's information criterion between one-component and two-component distribution models (AID) to be very sensitive to bimodality, but also highly biased towards detecting it.

Pfister et al. recommend using Hartigan's dip statistic and the bimodality coefficient plus visual inspection to detect bimodality.

The Kullback-Leibler divergence is not symmetric, does not satisfy the triangle inequality, and is therefore not a metric.

Kullback-Leibler divergence can be used heuristically as a distance between probability distributions.

Kullback-Leibler divergence $D_{KL}(P,Q)$ between probability distributions $P$ and $Q$ can be interpreted as the information lost when approximating $P$ by $Q$.
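
For discrete distributions, $$ D_{KL}(P,Q) = \sum_x P(x) \log\frac{P(x)}{Q(x)}, $$ i.e. the expected number of extra bits (for log base 2) needed to encode samples from $P$ using a code optimized for $Q$.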

Computer simulations have been used to study problems in statistics since at least the 1960s.

A latent variable model is a model of the relationship between observable and hidden (latent) variables such that

  • the values of the observed variables are generated by the latent variables, and
  • the observed variables are conditionally independent given the latent variables.
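
A minimal generative sketch of this structure (all parameters hypothetical): a discrete latent variable generates two observables that are correlated marginally but independent given the latent.

```python
import numpy as np

rng = np.random.default_rng(2)

# Discrete latent variable: which of two hidden states produced each sample.
z = rng.integers(0, 2, size=1000)

# Hypothetical per-state means; both observables depend only on z, and their
# noise terms are independent, so they are conditionally independent given z.
means = np.array([-1.0, 1.0])
x1 = means[z] + rng.normal(0.0, 0.5, size=1000)
x2 = means[z] + rng.normal(0.0, 0.5, size=1000)

# Marginally, x1 and x2 are strongly correlated; given z, they are not.
print("marginal corr:    ", np.corrcoef(x1, x2)[0, 1])
print("corr given z == 0:", np.corrcoef(x1[z == 0], x2[z == 0])[0, 1])
```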

If the noise in the inputs to my SOM is correlated between input neurons, then the SOM cannot properly learn a latent variable model.

There can be situations where my algorithm is still optimal or near-optimal.

A Deep Belief Network is a multi-layered, feed-forward network in which each successive layer infers latent variables of its input from the output of the preceding layer.

An image is highly salient where

  • there is high contrast,
  • there is high variance,
  • it has distinctive higher-order statistics,
  • there is high local symmetry.
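
As a toy illustration of the first two points (a sketch only; actual saliency models are considerably more involved), local contrast/variance can be computed with a sliding window:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(image, size=9):
    """Local variance in a size x size window: E[x^2] - (E[x])^2."""
    img = image.astype(float)
    mean = uniform_filter(img, size)
    mean_of_sq = uniform_filter(img**2, size)
    return mean_of_sq - mean**2
```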

"Natural images are statistically redundant."

It seems as though the primate trichromatic visual system is well-suited to capturing the distribution of colors in natural scenes.

LGN cells' responses to natural images are whitened, i.e. efficient, whereas their responses to, e.g., white noise are non-white. From the efficient-coding point of view, they are thus well-adapted to natural images.

A best estimator wrt. some loss function is an estimator that minimizes the average value of that loss function.

Given probability density functions (PDFs) $P(M)$ and $P(X\mid M)$ for a latent variable $X$ and an observable $M$, an optimal estimator for $X$ wrt. the loss function $L$ is given by $$ f_{opt} = \mathrm{arg\,min}_f \int P(m) \int P(x\mid m)\, L(x,f(m))\;dx\;dm $$
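
For the common special case of squared loss, $L(x,f(m)) = (x - f(m))^2$, the minimizer is the posterior mean: $$ f_{opt}(m) = E[X \mid M = m] = \int x\, P(x\mid m)\;dx $$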

The SOM can be modified to take into account the variance of the input dimensions wrt. each other.

Zhou et al. use an approach similar to that of Bauer et al., but instead of pairwise cross-correlations between input modalities they use only the variances of the individual modalities. It is unclear how they handle the case in which one modality essentially becomes ground truth for the algorithm.

Without feedback about ground truth, a system learning from a data set of noisy corresponding values must have at least three modalities to learn their reliabilities. One way of doing this is learning pairwise correlation between modalities. It is not enough to take the best hypothesis on the basis of the currently learned reliability model and use that instead of ground truth to learn the variance of the individual modalities: If the algorithm comes to believe that one modality has near-perfect reliability, then that will determine the next best hypotheses. In effect, that modality will be ground truth for the algorithm and it will only learn how well the others predict it.
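
A sketch of why three modalities suffice, assuming independent additive noise (this is essentially the classic "three-cornered hat" estimate, not necessarily the algorithm used by any of the models discussed here): the expected squared difference between modalities $i$ and $j$ is $\sigma_i^2 + \sigma_j^2$, giving three equations for the three unknown noise variances.

```python
import numpy as np

rng = np.random.default_rng(3)
truth = rng.normal(0.0, 2.0, size=100_000)

# Three modalities observing the same ground truth with independent additive
# Gaussian noise; the noise standard deviations are unknown to the estimator.
sigmas = [0.5, 1.0, 2.0]
m = [truth + rng.normal(0.0, s, truth.shape) for s in sigmas]

# Pairwise mean squared differences: d[i, j] ~= sigma_i^2 + sigma_j^2,
# since the ground-truth component cancels out in the difference.
d = {(i, j): np.mean((m[i] - m[j]) ** 2)
     for i in range(3) for j in range(3) if i < j}

# Three equations, three unknowns: solve for the individual noise variances.
var0 = (d[0, 1] + d[0, 2] - d[1, 2]) / 2
var1 = (d[0, 1] + d[1, 2] - d[0, 2]) / 2
var2 = (d[0, 2] + d[1, 2] - d[0, 1]) / 2
print("estimated noise sigmas:", np.sqrt([var0, var1, var2]))
```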

A SOM models a population and each unit has a response to a stimulus; it is therefore possible to read out a population code from a SOM. This population code is not very meaningful in the standard SOM. Given a more statistically motivated distance function, the population code can be made more meaningful.
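
One way to make this concrete (a sketch; the Gaussian tuning width $\sigma$ is an assumption, not part of the standard SOM): treat each unit's weight vector as the center of a Gaussian tuning curve and normalize the responses.

```python
import numpy as np

def population_code(weights, stimulus, sigma=1.0):
    """Normalized responses of SOM units to a stimulus.

    Each unit's weight vector is treated as the center of a Gaussian tuning
    curve of width sigma (an assumption; not part of the standard SOM).
    """
    sq_dist = np.sum((weights - stimulus) ** 2, axis=1)
    response = np.exp(-sq_dist / (2 * sigma ** 2))
    return response / response.sum()
```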

Zhou et al.'s and Bauer et al.'s statistical SOM variants assume Gaussian noise in the input.

Judging by the abstract, von der Malsburg's Democratic Integration does what I believe is impossible: it lets a system learn the reliability of its sensory modalities from how well they agree with the consensus between the modalities.