
In some audio-visual time discrimination tasks, humans do not integrate information optimally.

Soltani and Wang propose an adaptive neural model of Bayesian inference that neglects priors, and they claim that it is consistent with certain observations in biology.

Soltani and Wang propose an adaptive model of Bayesian inference with binary cues.

In their model, a synaptic weight codes for the ratio of synapses in a set which are activated vs. de-activated by the binary cue encoded in their pre-synaptic axon's activity.

Their stochastic Hebbian learning rule drives the synaptic weights to encode log posterior probabilities, so that the neurons correctly encode reward probabilities.
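
How weights encoding log-likelihood ratios suffice for exact inference with binary cues can be seen in a minimal sketch (not the Soltani–Wang model; all probabilities invented): the log posterior odds of reward are the log prior odds plus a sum of per-cue "weights".

```python
import math

# Hypothetical per-cue likelihoods P(cue_i = 1 | reward) and P(cue_i = 1 | no reward).
p_cue_given_r = [0.9, 0.6, 0.7]
p_cue_given_nr = [0.2, 0.5, 0.4]
prior_r = 0.5

def log_odds(cues):
    """Log posterior odds of reward = log prior odds + sum of per-cue 'weights'."""
    lo = math.log(prior_r / (1 - prior_r))
    for c, pr, pn in zip(cues, p_cue_given_r, p_cue_given_nr):
        # Each binary cue contributes the log-likelihood ratio of its observed value.
        lo += math.log(pr / pn) if c else math.log((1 - pr) / (1 - pn))
    return lo

def posterior_direct(cues):
    """Brute-force Bayes for comparison."""
    lr, lnr = prior_r, 1 - prior_r
    for c, pr, pn in zip(cues, p_cue_given_r, p_cue_given_nr):
        lr *= pr if c else 1 - pr
        lnr *= pn if c else 1 - pn
    return lr / (lr + lnr)

cues = [1, 0, 1]
p = 1 / (1 + math.exp(-log_odds(cues)))  # sigmoid of the summed weights
assert abs(p - posterior_direct(cues)) < 1e-12
```

The sigmoid of the summed weights reproduces the brute-force posterior exactly, which is the sense in which "counting" weighted synapses can implement Bayes' rule.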

Bayesian models have been used to model natural cognition.

Behrens et al. modeled learning of reward probabilities using the model of a Bayesian learner.

Behrens et al. found that humans take into account the volatility of reward probabilities in a reinforcement learning task.

The way they took the volatility into account was qualitatively modelled by a Bayesian learner.

The theoretical accounts of multi-sensory integration due to Beck et al. and Ma et al. do not learn and leave little room for learning.

Thus, they fail to explain an important aspect of multi-sensory integration in humans.

The Kalman filter assumes linear dynamics (state update) and Gaussian noise.

The extended Kalman filter results from local linearization of the update dynamics.

Particle filters are a numeric Monte-Carlo solution to recursive Bayesian filtering which address problems with non-Gaussian posteriors.
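
As a toy illustration (a generic bootstrap filter for a 1-D Gaussian random walk, not tied to any cited model; all parameters invented), the predict–weight–resample loop of recursive Bayesian filtering:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: latent random walk x_t = x_{t-1} + process noise,
# observation y_t = x_t + measurement noise.
T, N = 60, 2000
q, r = 0.5, 1.0  # process / observation noise std

x = np.cumsum(rng.normal(0.0, q, T))   # simulated latent trajectory
y = x + rng.normal(0.0, r, T)          # noisy observations

particles = rng.normal(0.0, 1.0, N)
estimates = []
for t in range(T):
    particles += rng.normal(0.0, q, N)                # predict step
    w = np.exp(-0.5 * ((y[t] - particles) / r) ** 2)  # weight by likelihood
    w /= w.sum()
    estimates.append(np.sum(w * particles))           # posterior-mean estimate
    idx = rng.choice(N, size=N, p=w)                  # resample
    particles = particles[idx]

rmse = np.sqrt(np.mean((np.array(estimates) - x) ** 2))
```

Resampling after each step combats weight degeneracy (a few particles hogging all the weight); because the weighted particle cloud can represent arbitrary shapes, nothing here requires the posterior to be Gaussian.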

Should models be informed by normative theories like Bayesian or decision theory?

Deneve et al. propose a recurrent network which is able to fit a template to (Poisson-)noisy input activity, implementing an estimator of the original input. The authors show analytically and in simulations that the network is able to approximate a maximum likelihood estimator. The network’s dynamics are governed by divisive normalization and the neural input tuning curves are hard-wired.

Chalk et al. hypothesize that biological cognitive agents learn a generative model of sensory input and rewards for actions.

Soltani and Wang propose a learning algorithm in which neurons predict rewards for actions based on individual cues. The winning neuron stochastically gets reward depending on the action taken.

One of the benefits of Soltani and Wang's model is that it does not require their neurons to perform complex computations. By simply counting active synapses, they calculate log probabilities of reward. The learning rule is what makes sure the correct number of neurons are active given the input.

Cuijpers and Erlhagen use neural fields to implement Bayes' rule for combining the activities of neural populations spatially encoding probability distributions.

Beck et al. argue that simply adding the responses of a population code time point by time point will integrate the information optimally if the noise in the input is what they call "Poisson-like".

That is somewhat expected: in a Poisson distribution with mean $\lambda$, the variance is $\lambda$ and the standard deviation is $\sqrt{\lambda}$, and adding population responses is equivalent to counting spikes over a longer period of time, thus increasing the mean of the distribution.
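
The scaling can be spelled out with a trivial arithmetic check (no modeling content): the relative spread of a Poisson count is $1/\sqrt{\lambda}$, so pooling responses shrinks it.

```python
import math

def relative_spread(lam):
    """Std / mean of a Poisson(lam) count: sqrt(lam) / lam = 1 / sqrt(lam)."""
    return math.sqrt(lam) / lam

# Pooling k population responses multiplies the expected spike count by k;
# with k = 4 the relative spread is halved.
single = relative_spread(10.0)
pooled = relative_spread(4 * 10.0)
assert abs(pooled - single / 2) < 1e-12
```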

Many models of Bayesian integration of neural responses rely on hand-crafted connectivity.

Lee and Mumford interpret the visual pathway in terms of Bayesian belief propagation: each stage in the processing uses output from the one further up as contextual information and output from the one further down as evidence to update its belief and corresponding output.

Each layer thus calculates probabilities of features of the visual display given noisy and ambiguous input.

Lee and Mumford state that their dynamic, recurrent Bayesian model of the visual pathway in its simple form is prone to running into local maxima (states in which small changes in belief in any of the processing stages decrease the joint probability, although greater changes would increase it).

They propose particle filtering as a solution which they describe as maintaining a number of concurrent high-likelihood hypotheses instead of going for the maximum likelihood one.

Is MLE just a particle filter with a number of particles of one?

Lee and Mumford link their theory to resonance and predictive coding.

Ursino et al. divide models of multisensory integration into three categories:

  1. Bayesian models (optimal integration etc.),
  2. neuron and network models,
  3. models on the semantic level (symbolic models).

Roach et al. present a Bayesian model of multisensory integration which takes into account the fact that information from different modalities is only integrated up to a certain amount of incongruence. That model incorporates a Gaussian prior on distances between actual components in cross-sensory stimuli.

With appropriate parameterization, the model proposed by Roach et al. should produce results much like model selection. It is mathematically a little simpler because no explicit decision needs to be made. However, the motivation of a Gaussian function for modeling the actual distance between components in cross-sensory stimuli is a bit unclear: Either the two components belong to a common source or they do not. Why should independent auditory and visual stimuli have a tendency to be close together?

A deep SC neuron which receives enough information from one modality to reliably determine whether a stimulus is in its receptive field does not improve its performance much by integrating information from another modality.

Patton et al. use this insight to explain the diversity of uni-sensory and multisensory neurons in the deep SC.

Sanchez-Riera et al. use the Bayesian information criterion to choose the number of speakers in their audio-visual active speaker localization system.

Chen et al. presented a system which uses a SOM to cluster states. After learning, each SOM unit is extended with a histogram counting how often that unit was BMU while the input belonged to each of a number of known states $C=\{c_1,c_2,\dots,c_n\}$.

The system is used in robot soccer. Each class is connected to an action. Actions are chosen by finding the BMU in the net and selecting the action connected to its most likely class.

In an unsupervised, online phase, these histograms are updated in a reinforcement-learning fashion: whenever the selected action led to success, the bin of the BMU's histogram corresponding to its most likely class is incremented; otherwise it is decremented.
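
The mechanics can be sketched as follows (a nearest-neighbor stub stands in for the trained SOM; prototypes, states, and step sizes are all invented):

```python
import numpy as np

# Stub for a trained SOM: fixed unit prototypes, BMU = nearest prototype.
prototypes = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
n_states = 3                                    # known states c_1 .. c_n
hist = np.ones((len(prototypes), n_states))     # per-unit class histograms
action_for_state = [0, 1, 2]                    # each class is tied to an action

def bmu(x):
    """Index of the best-matching unit for input x."""
    return int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))

def select_action(x):
    """Pick the action tied to the BMU's most likely class."""
    u = bmu(x)
    c = int(np.argmax(hist[u]))
    return u, c, action_for_state[c]

def update(u, c, success, step=1.0):
    """Reinforcement-style histogram update: reward the winning bin on
    success, punish it otherwise (clipped at zero)."""
    hist[u, c] = max(hist[u, c] + (step if success else -step), 0.0)

u, c, a = select_action(np.array([0.1, -0.1]))
update(u, c, success=True)
```

Over many trials the normalized histogram of a unit approximates the probability of each state given that the unit is BMU, which is what makes the argmax a reasonable action rule.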

Ma, Beck, Latham and Pouget argue that optimal integration of population-coded probabilistic information can be achieved by simply adding the activities of neurons with identical receptive fields. The preconditions for this to hold are

  • independent Poisson (or other "Poisson-like") noise in the input
  • identically-shaped tuning curves in input neurons
  • a point-to-point connection from neurons in different populations with identical receptive fields to the same output neuron.
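
A sketch of why the summed activity preserves the information (not from the paper; tuning shapes, counts, and gains are invented): with independent Poisson noise and dense identical tuning curves whose sum over neurons is approximately constant in the stimulus, the posterior decoded from the summed activity equals the product of the two single-population posteriors.

```python
import numpy as np

rng = np.random.default_rng(1)

s_grid = np.linspace(0, 2 * np.pi, 200, endpoint=False)  # candidate stimuli
phi = np.linspace(0, 2 * np.pi, 50, endpoint=False)      # preferred stimuli

def tuning(s):
    # Von-Mises-shaped tuning curves; their sum over neurons is ~constant in s.
    return 5.0 * np.exp(2.0 * (np.cos(s - phi[:, None]) - 1)) + 0.1

F = tuning(s_grid)                  # (neurons, stimuli) expected counts
rate = tuning(np.array([1.3]))[:, 0]
r1 = rng.poisson(rate)              # population 1 response
r2 = rng.poisson(rate)              # population 2 response

def posterior(r):
    logp = r @ np.log(F) - F.sum(axis=0)  # Poisson log likelihood, flat prior
    p = np.exp(logp - logp.max())
    return p / p.sum()

p_sum = posterior(r1 + r2)              # decode the summed activity
p_prod = posterior(r1) * posterior(r2)  # product of individual posteriors
p_prod /= p_prod.sum()
assert np.allclose(p_sum, p_prod, atol=1e-6)
```

The stimulus-dependent $-\sum_i f_i(s)$ term is what would break the identity; it drops out exactly when the tuning curves tile the stimulus space densely enough that their sum is flat, which is one reading of the "Poisson-like" condition.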

There's a difference between showing that an instance of sensorimotor processing behaves like a Bayesian model and saying it is optimal:

The Bayesian model uses the information it has optimally, but this does not mean that it uses the right kind of information.

Bayesian information processing does not represent and manipulate unitary variables but PDFs over variables.

According to Knill and Pouget, being an optimal Bayesian observer only means to take into account the uncertainty of the available information (in the system—that's after lossy transformation from physical stimuli to neural representations).

In Anastasio et al.'s model of multi-sensory integration in the SC, an SC neuron is connected to one neuron from each modality whose spiking behavior is a (Poisson) probabilistic function of whether there is a target in that modality or not.

Their single SC neuron then computes the posterior probability of there being a target given its inputs (evidence) and the prior.

Anastasio et al. use their model to explain enhancement and the principle of inverse effectiveness.

Körding and Wolpert showed that their subjects correctly learned the distribution of displacement of the visual feedback wrt. the actual position of their hand and used it in the task consistent with a Bayesian cue integration model.

The model due to Ma et al. is simple and it requires no learning.

Weisswange et al. distinguish between two strategies for Bayesian multisensory integration: model averaging and model selection.

The model averaging strategy computes the posterior probability for the position of the signal source, taking into account the possibility that the stimuli had the same source and the possibility that they had two distinct sources.

The model selection strategy computes the most likely of these two possibilities. This has been called causal inference.
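
The two strategies can be contrasted in a toy Gaussian version of causal inference (all parameters invented; marginal likelihoods are computed by brute-force numerical integration rather than closed form): compute the posterior probability of a common cause, then either mix the one-source and two-source estimates (averaging) or commit to the more probable model (selection).

```python
import numpy as np

# Invented parameters: sensory noise, spatial prior, prior prob. of a common cause.
sig_v, sig_a, sig_p, p_common = 1.0, 3.0, 10.0, 0.5
x_v, x_a = 2.0, 6.0            # observed visual / auditory locations

grid = np.linspace(-40, 40, 4001)
dx = grid[1] - grid[0]

def norm(z, mu, sig):
    return np.exp(-0.5 * ((z - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))

prior = norm(grid, 0.0, sig_p)
lik_v = norm(x_v, grid, sig_v)
lik_a = norm(x_a, grid, sig_a)

# Marginal likelihoods of the data under one common source vs. two sources.
L1 = np.sum(prior * lik_v * lik_a) * dx
L2 = (np.sum(prior * lik_v) * dx) * (np.sum(prior * lik_a) * dx)
post_c1 = p_common * L1 / (p_common * L1 + (1 - p_common) * L2)

# Posterior-mean location of the visual source under each model.
s1 = np.sum(grid * prior * lik_v * lik_a) / np.sum(prior * lik_v * lik_a)
s2 = np.sum(grid * prior * lik_v) / np.sum(prior * lik_v)

s_avg = post_c1 * s1 + (1 - post_c1) * s2   # model averaging
s_sel = s1 if post_c1 > 0.5 else s2         # model selection
```

Averaging always lands between the two per-model estimates; selection jumps discontinuously when the common-cause posterior crosses one half, which is what makes the two strategies behaviorally distinguishable.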

Weisswange et al. model learning of multisensory integration using reward-mediated / reward-dependent learning in an ANN, a form of reinforcement learning.

They model a situation similar to the experiments due to Neil et al. and Körding et al. in which a learner is presented with visual, auditory, or audio-visual stimuli.

In each trial, the learner is given reward depending on the accuracy of its response.

In an experiment where stimuli could be caused by the same or different sources, Weisswange et al. found that their model behaves similarly to both model averaging and model selection, though slightly more like the former.

Colonius and Diederich argue that deep-SC neurons' spiking behavior can be interpreted as a vote for a target rather than a non-target being in their receptive field.

This is similar to Anastasio et al.'s previous approach.

There are a number of problems with Colonius' and Diederich's idea that deep-SC neurons' binary spiking behavior can be interpreted as a vote for a target rather than a non-target being in their RF. First, these neurons' RFs can be very broad, and the strength of their response is a function of how far away the stimulus is from the center of their RFs. Second, the response strength is also a function of stimulus strength. It needs some arguing, but to me it seems more likely that the response encodes the probability of a stimulus being in the center of the RF.

Colonius and Diederich argue that, given their Bayesian, normative model of neurons' response behavior, neurons responding to only one sensory modality outperform neurons responding to multiple sensory modalities.

Colonius' and Diederich's explanation for uni-sensory neurons in the deep SC has a few weaknesses: First, they model the input spiking activity for both the target and the non-target case as Poisson distributed. This is a problem, because the input spiking activity is really a function of the target distance from the center of the RF. Second, they explicitly model the probability of the visibility of a target to be independent of the probability of its audibility.

If SC neurons' spiking behavior can be interpreted as a vote for a target rather than a non-target being in their receptive field, then decisions must be made somewhere else, because such votes do not take utility into account.

Wozny et al. distinguish between three strategies for multisensory integration: model averaging, model selection, and probability matching.

Wozny et al. found in an audio-visual localization experiment that a majority of their participants' performance was best explained by the statistically sub-optimal probability matching strategy.

Weisswange et al.'s results seem at odds with those of Wozny et al. However, Wozny et al. state that different strategies may be used in different settings.

Sato et al. modeled multisensory integration with adaptation purely computationally. In their model, two localizations (one from each modality) were bound or not bound and localized according to a maximum a-posteriori decision rule.

The unity assumption can be interpreted as a prior (if interpreted as an expectation of a forthcoming uni- or cross-sensory stimulus) or a mediator variable in a Bayesian inference model of multisensory integration.

Antonelli et al. use Bayesian and Monte Carlo methods to integrate optic flow and proprioceptive cues to estimate distances between a robot and objects in its visual field.

Bayesian models cannot explain why natural cognition is not always optimal or predict behavior in the cases where it is not.

Purely computational, Bayesian accounts of cognition are underconstrained.

Without constraints from ecological and biological (mechanistic) knowledge, computational and evolutionary accounts of natural cognition run the risk of finding optimality wherever they look, as there will always be some combination of model and assumptions to match the data.

Bounded rationality, the idea that an organism may be as rational as possible given its limitations, can be useful, but it is prone to producing tautologies: Any organism is as rational as it can be given its limitations if those limitations are taken to be everything that limits its rationality.

Jones and Love propose three ways of `Bayesian Enlightenment'.

Bayesian theory can be used to describe hypotheses and prior beliefs. These two can then be tested against actual behavior.

In contrast with `Bayesian Fundamentalism', this approach views the prior and hypotheses as the scientific theory to be tested, rather than as the only (if handcrafted) way to describe the situation, used to check whether once again optimality can be demonstrated.

There can be situations where my algorithm is still optimal or near-optimal.

Some models view attentional changes of neural responses as the result of Bayesian inference about the world based on changing priors.

Chalk et al. argue that changing the task should not change expectations—change the prior—about the state of the world. Rather, they might change the model of how reward depends on the state of the world.

Anastasio et al. present a model of the response properties of multi-sensory SC neurons which explains enhancement, depression, and super-additivity using Bayes' rule: If one assumes that a neuron integrates its input to infer the posterior probability of a stimulus source being present in its receptive field, then these effects arise naturally.

Anastasio et al.'s model of SC neurons assumes that these neurons receive multiple inputs with Poisson noise and apply Bayes' rule to calculate the posterior probability of a stimulus being in their receptive fields.

Anastasio et al. point out that, given their model of SC neurons computing the probability of a stimulus being in their RF with Poisson-noised input, a sigmoid response function arises for uni-sensory input.

Alais and Burr found in an audio-visual localization experiment that the ventriloquism effect can be interpreted by a simple cue weighting model of human multi-sensory integration:

Their subjects weighted visual and auditory cues depending on their reliability. The weights they used were consistent with MLE. In most situations, visual cues are much more reliable for localization than are auditory cues. Therefore, a visual cue is given so much greater weight that it captures the auditory cue.
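
The MLE-consistent weighting is plain inverse-variance weighting; a sketch with invented noise levels shows why vision usually captures audition:

```python
# Reliability-weighted (maximum-likelihood) combination of two location cues.
sigma_v, sigma_a = 0.5, 4.0          # invented visual / auditory noise std (deg)
x_v, x_a = 0.0, 8.0                  # conflicting cue locations (deg)

w_v = (1 / sigma_v**2) / (1 / sigma_v**2 + 1 / sigma_a**2)
x_hat = w_v * x_v + (1 - w_v) * x_a  # combined estimate
var_hat = 1 / (1 / sigma_v**2 + 1 / sigma_a**2)

# Vision dominates: the estimate sits close to the visual cue, and the
# combined variance is below either single-cue variance.
assert abs(x_hat - x_v) < abs(x_hat - x_a)
assert var_hat < min(sigma_v**2, sigma_a**2)
```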

Human performance in combining slant and disparity cues for slant estimation can be explained by (optimal) maximum-likelihood estimation.

Deneve describes neurons as integrating probabilities based on single incoming spikes. Spikes are seen as outcomes of Poisson processes and neurons are to infer the hidden value of those processes' parameter(s). She uses the leaky integrate-and-fire neuron as the basis for her model.

Deneve models a changing world; hidden variables may change according to a Markov chain. Her neural model deals with that. Wow.

Hidden variables in Deneve's model seem to be binary. Differences in synapses (actually, their input) are due to weights describing how `informative' of the hidden variable they are.

Leakiness of neurons in Deneve's model is due to changing world conditions.

Neurons in Deneve's model actually generate Poisson-like output themselves (though deterministically).

The process a neuron generates is described as predictive: a neuron $n_1$ fires if the probability $P_1(t)$ estimated by $n_1$ based on its input is greater than the probability $P_2(t)$ estimated by another neuron $n_2$ based on $n_1$'s input.

The components of `Bayesian Fundamentalist's' psychological models critically are not assumed to correspond to anything in the subject's mind.

Yu and Dayan distinguish between two kinds of uncertainty in perceptual processing: expected uncertainty and unexpected uncertainty.

Expected uncertainty is due to known unreliability in information sources.

Unexpected uncertainty is due to information sources being unreliable unexpectedly.

Yu and Dayan argue that uncertainty should suppress top-down, context-dependent factors in inference, and strengthen learning about the situation.

Yu and Dayan interpret experiments showing that the level of acetylcholine (ACh) increases with learned stochasticity of cues as supporting their theory that ACh signals expected uncertainty.

Yu and Dayan interpret experiments showing that increased levels of norepinephrine (NE) accelerate the detection of changes in cue predictivity as supporting their theory that NE signals unexpected uncertainty.

Yu and Dayan propose a model of inference and learning in which expected uncertainty is encoded by high acetylcholine (ACh) levels and unexpected uncertainty is encoded by norepinephrine (NE) levels.

In many instances of multi-sensory perception, humans integrate information optimally.

Empirical Bayes methods estimate the prior from the data.

More formally, they choose some parametric form for the prior, and estimate an optimal set of parameters $\theta_{opt}$ by optimization: $$\theta_{opt} = \mathrm{arg\;max}_\theta\prod_n\int P_\theta(x)P(m_n\mid x)\;dx,$$ for measurements $m_n$ and possible latent variable values $x$.
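
A minimal worked instance (an invented Gaussian setting, not from the text): for a latent $x \sim N(\mu, \tau^2)$ and measurement $m = x + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$ with $\sigma$ known, the marginal of $m$ is $N(\mu, \tau^2 + \sigma^2)$, so maximizing the product above over $\theta = (\mu, \tau)$ reduces to moment matching:

```python
import numpy as np

rng = np.random.default_rng(2)

mu_true, tau_true, sigma = 3.0, 2.0, 1.0
x = rng.normal(mu_true, tau_true, 50_000)  # latent draws from the unknown prior
m = x + rng.normal(0.0, sigma, x.size)     # noisy measurements

# Empirical-Bayes fit of a Gaussian prior N(mu, tau^2): the marginal
# likelihood of m is N(mu, tau^2 + sigma^2), maximized by moment matching.
mu_hat = m.mean()
tau_hat = np.sqrt(max(m.var() - sigma**2, 0.0))

assert abs(mu_hat - mu_true) < 0.05
assert abs(tau_hat - tau_true) < 0.05
```

The fitted prior can then drive shrinkage estimates, e.g. the posterior mean $(\sigma^2\hat\mu + \hat\tau^2 m)/(\sigma^2 + \hat\tau^2)$ for each measurement.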

In predictive coding, a model iterates the following steps:

  • assume values for latent variables,
  • predict sensory input (through a generative model),
  • observe prediction error,
  • adapt assumptions to minimize the error.
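
The four steps can be sketched with a linear generative model (everything here invented; plain gradient steps on the latent estimate stand in for the "adapt" step):

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented linear generative model: sensory input m = G @ z + noise.
G = rng.normal(size=(8, 3))
z_true = np.array([1.0, -2.0, 0.5])
m = G @ z_true + rng.normal(0.0, 0.01, 8)

z = np.zeros(3)            # 1. assume values for the latent variables
lr = 0.02
for _ in range(4000):
    pred = G @ z           # 2. predict the sensory input (generative model)
    err = m - pred         # 3. observe the prediction error
    z += lr * G.T @ err    # 4. adapt the assumption to reduce the error

# Gradient descent on the squared error converges to the least-squares fit.
z_ls = np.linalg.lstsq(G, m, rcond=None)[0]
assert np.allclose(z, z_ls, atol=1e-3)
```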

The EM algorithm is an iterative algorithm that solves a simplified version of Empirical Bayes.

Friston's predictive coding model predicts a hierarchical cortical system.

Statistical decision theory and Bayesian estimation are used in the cognitive sciences to describe performance in natural perception.

A best estimator wrt. some loss function is an estimator that minimizes the average value of that loss function.

Given a prior $P(X)$ and a likelihood $P(M\mid X)$ for a latent variable $X$ and an observable $M$, an optimal estimator for $X$ wrt. a loss function $L$ is given by $$ f_{opt} = \mathrm{arg\;min}_f \int P(x) \int P(m\mid x)\, L(x,f(m))\;dm\;dx. $$
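
For squared-error loss the optimal estimator has a closed form. Writing $P(x)P(m\mid x) = P(m)P(x\mid m)$, the minimization decouples over $m$; a short derivation:

```latex
\text{For } L(x,a) = (x-a)^2, \text{ minimize pointwise in } m:\quad
\frac{\partial}{\partial a}\int P(x\mid m)\,(x-a)^2\;dx
  = -2\int P(x\mid m)\,(x-a)\;dx = 0
\;\Rightarrow\;
f_{opt}(m) = \int x\,P(x\mid m)\;dx = \mathrm{E}[X\mid m],
```

i.e., under squared loss the optimal estimator is the posterior mean.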

A weakness of empirical Bayes is that the prior which explains the data best is "not necessarily the one that leads to the best estimator".

Already von Helmholtz formulated the idea that prior knowledge---or expectations---is fused with sensory information into perception.

This idea is at the core of Bayesian theory.

Although predecessors existed, Bayesian theory became popular in perceptual science in the 1980s and 1990s.

A representation of probabilities is not necessary for optimal estimation.

Weisswange et al. apply the idea of Bayesian inference to multi-modal integration and action selection. They show that online reinforcement learning can effectively train a neural network to approximate a Q-function predicting the reward in a multi-modal cue integration task.

Yamashita et al. argue that, since whether or not two stimuli in different modalities with a certain disparity are integrated depends on the weight profiles in their network, a Bayesian prior is somehow encoded in these weights.

Love and Jones accuse `Bayesian Fundamentalism' of focussing too much on the computational theory and neglecting more biologically constrained levels of understanding cognition.

`Bayesian Fundamentalism', like Behaviorism and evolutionary psychology, explains behavior purely from the point of view of the environment---it completely ignores the inner workings of the organism.

In many Bayesian models, the prior and hypothesis space are solely chosen for the convenience of the modeler, not for their plausibility.

`Fundamentalist Bayesians' posit that they can predict behavior purely on the basis of optimality.

Jones and Love talk about Bayesian theory as psychological theories---not so much as neuroscientific theories... I guess?

Nature has had millions of years to optimize the performance of cognitive systems. It is therefore reasonable to assume that they perform optimally wrt. natural tasks and natural conditions.

Bayesian theory provides a framework to determine optimal strategies. Therefore, it makes sense to operate under the assumption that the processes we observe in nature can be understood as implementations of Bayes-optimal strategies.

Anastasio et al. have come up with a Bayesian interpretation of neural responses to multi-sensory stimuli in the SC. According to their view, enhancement, depression and inverse effectiveness phenomena are due to neurons integrating uncertain information from different sensory modalities.

Deneve describes how neurons performing Bayesian inference on variables behind Poisson inputs can learn the parameters of the Poisson processes in an online variant of the expectation maximization (EM) algorithm.

Deneve associates her EM-based learning rule in Bayesian spiking neurons with spike-timing-dependent plasticity (STDP).

According to Barber et al., `the original Hopfield net implements Bayesian inference on analogue quantities in terms of PDFs'.

Zhang examines conditions under which Naïve Bayes classifiers can be optimal despite their inherent limitations, and argues that these conditions are common enough to explain some of their success.

Deneve et al.'s model (2001) does not compute a population code; it mainly recovers a clean population code from a noisy one.