Show Tag: reward-mediated-learning


Reward-mediated learning has been demonstrated in the adaptation of orienting behavior.

Possible neurological correlates of reward-mediated learning have been found.

Reward-mediated learning is said to be biologically plausible.

Rucci et al. present an algorithm which performs auditory localization and combines auditory and visual localization in a common SC map. The mapping between the representations is learned using value-dependent learning.
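The following is a minimal sketch of what value-dependent learning of such a mapping could look like: a reward-modulated Hebbian rule that strengthens connections from auditory inputs to SC map units whenever the resulting orienting movement is rewarded. The architecture, sizes, and the Gaussian input bump are illustrative assumptions, not Rucci et al.'s actual model.

```python
import numpy as np

rng = np.random.default_rng(1)

n_auditory = 20     # hypothetical auditory input channels (e.g. binaural-cue detectors)
n_sc = 20           # units of a one-dimensional SC motor map

# Weights mapping auditory activity onto the SC map, learned from scratch.
W = rng.random((n_sc, n_auditory)) * 0.1

def orient(auditory_input):
    """Activate the SC map and orient toward the most active unit."""
    activity = W @ auditory_input
    return int(np.argmax(activity)), activity

def value_dependent_update(auditory_input, activity, chosen, reward, lr=0.05):
    """Reward-modulated Hebbian step: strengthen weights between coactive
    pre- and postsynaptic units only when the orienting movement was rewarded
    (e.g. it brought the sound source onto the fovea)."""
    post = np.zeros(n_sc)
    post[chosen] = 1.0
    W[:] += lr * reward * np.outer(post * activity, auditory_input)
    W[:] = np.clip(W, 0.0, 1.0)

# One trial: a source at map position 12 produces a noisy auditory activity bump.
target = 12
auditory = np.exp(-0.5 * ((np.arange(n_auditory) - target) / 2.0) ** 2)
choice, act = orient(auditory)
reward = 1.0 if abs(choice - target) <= 1 else 0.0   # successful foveation is rewarded
value_dependent_update(auditory, act, choice, reward)
```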

Chalk et al. hypothesize that biological cognitive agents learn a generative model of sensory input and rewards for actions.

Soltani and Wang propose a learning algorithm in which neurons predict rewards for actions based on individual cues. The winning neuron determines the action taken, and reward is then delivered stochastically depending on that action.

One of the benefits of Soltani and Wang's model is that it does not require the neurons to perform complex computations: by simply counting active synapses, they compute log probabilities of reward. The learning rule ensures that the correct number of neurons is active for a given input.
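A rough sketch of that scheme under strong simplifying assumptions (binary synapses, a softmax winner, arbitrary potentiation and depression rates); it is meant to illustrate the counting idea, not to reproduce Soltani and Wang's model.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cues = 4                # number of binary cues (assumption)
n_syn_per_cue = 100       # synapses from each cue population to each action neuron

# Binary synaptic states (0 = depressed, 1 = potentiated), one bank per action.
synapses = {a: rng.integers(0, 2, size=(n_cues, n_syn_per_cue)) for a in ("A", "B")}

def predicted_value(action, cues):
    """Count active (potentiated) synapses driven by the present cues.
    In this sketch the count stands in for the scaled log reward probability."""
    return synapses[action][cues.astype(bool)].sum()

def choose(cues, beta=0.05):
    """Stochastic winner: softmax over the two counts (assumed selection rule)."""
    va, vb = predicted_value("A", cues), predicted_value("B", cues)
    p_a = 1.0 / (1.0 + np.exp(-beta * (va - vb)))
    return "A" if rng.random() < p_a else "B"

def update(action, cues, rewarded, p_plus=0.1, p_minus=0.1):
    """Reward-dependent stochastic plasticity on synapses driven by the cues:
    potentiate a random fraction after reward, depress a random fraction otherwise."""
    mask = cues.astype(bool)
    bank = synapses[action][mask]
    flip = rng.random(bank.shape) < (p_plus if rewarded else p_minus)
    bank[flip] = 1 if rewarded else 0
    synapses[action][mask] = bank

# Example trial: cues 0 and 2 are present, the chosen action happens to be rewarded.
cues = np.array([1, 0, 1, 0])
a = choose(cues)
update(a, cues, rewarded=True)
```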

In Chalk et al.'s model, low-level sensory neurons compute the probabilities of high-level hidden variables given which features are present or absent. Other neurons are then responsible for predicting the rewards of different actions depending on the presumed state of those hidden variables.

In Chalk et al.'s model, neurons update their parameters online, i.e. during the task. In one condition of their experiments, only the neurons predicting reward are updated; in others, the perceptual neurons are updated as well. Reward prediction was better when the perceptual responses were also tuned.
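A hedged sketch of this two-stage arrangement: perceptual weights yield a posterior over hidden states, reward weights predict action values under that posterior, and both can be updated online. The dimensions and the specific delta-rule updates are assumptions for illustration, not Chalk et al.'s equations.

```python
import numpy as np

rng = np.random.default_rng(2)

n_features, n_states, n_actions = 5, 2, 2

# Perceptual parameters: log-likelihood weights for P(feature | state), plus a prior.
log_lik = rng.normal(0, 0.1, size=(n_states, n_features))
log_prior = np.zeros(n_states)

# Reward-prediction weights: expected reward of each action in each hidden state.
R = np.zeros((n_actions, n_states))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def posterior(features):
    """'Sensory neurons': posterior over hidden states given the present features."""
    return softmax(log_prior + log_lik @ features)

def predicted_reward(features):
    """'Reward neurons': expected reward of each action under the state posterior."""
    return R @ posterior(features)

def online_update(features, action, reward, lr=0.1, tune_perception=True):
    """Online learning during the task. Reward weights always move toward the
    observed reward; perceptual weights are only nudged in the condition where
    they are also tuned (this particular rule is an ad hoc assumption)."""
    p = posterior(features)
    err = reward - R[action] @ p
    R[action] += lr * err * p
    if tune_perception:
        # Shift likelihood weights of the present features toward states whose
        # reward prediction for the chosen action is above average.
        log_lik[:] += lr * err * np.outer(R[action] - R[action].mean(), features)

# One trial: features 0 and 3 are present, the greedy action is taken and rewarded.
x = np.array([1, 0, 0, 1, 0], dtype=float)
a = int(np.argmax(predicted_reward(x)))
online_update(x, a, reward=1.0)
```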

Chen et al. present a system which uses a SOM to cluster states. After learning, each SOM unit is extended with a histogram recording the number of times the unit was the BMU while the input belonged to each of a number of known states $$C=\{c_1,c_2,\dots,c_n\}$$.

The system is used in robot soccer. Each class is connected to an action. Actions are chosen by finding the BMU in the net and selecting the action connected to its most likely class.

In an unsupervised, online phase, these histograms are updated in a reinforcement-learning fashion: whenever the selected action led to success, the bin of the BMU's histogram corresponding to the most likely class is increased; otherwise it is decreased.
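A compact sketch of the mechanism as described above; the SOM codebook is assumed to be already trained, and the class-to-action wiring, action names, and step size are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

n_units, dim, n_classes = 25, 4, 3           # small SOM, 4-d state vectors, 3 known classes
weights = rng.random((n_units, dim))          # SOM codebook (assumed already trained)
histograms = np.ones((n_units, n_classes))    # per-unit class counts, initialized uniform
actions = ["pass", "dribble", "shoot"]        # one action per class (illustrative names)

def bmu(state):
    """Best-matching unit: the codebook vector closest to the input state."""
    return int(np.argmin(np.linalg.norm(weights - state, axis=1)))

def select_action(state):
    """Pick the action connected to the BMU's most likely class."""
    u = bmu(state)
    return u, int(np.argmax(histograms[u]))

def reinforce(unit, chosen_class, success, step=1.0):
    """Online update of the BMU's histogram: grow the winning class's bin after
    success, shrink it otherwise (never below zero)."""
    histograms[unit, chosen_class] += step if success else -step
    histograms[unit, chosen_class] = max(histograms[unit, chosen_class], 0.0)

# One decision: observe a state, act, then update from the outcome.
state = rng.random(dim)
unit, cls = select_action(state)
print("action:", actions[cls])
reinforce(unit, cls, success=True)
```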

Weisswange et al. model learning of multisensory integration using reward-mediated / reward-dependent learning in an ANN, a form of reinforcement learning.

They model a situation similar to the experiments of Neil et al. and Körding et al., in which a learner is presented with visual, auditory, or audio-visual stimuli.

In each trial, the learner is given reward depending on the accuracy of its response.

In an experiment where stimuli could be caused by the same source or by different sources, Weisswange et al. found that their model behaves similarly to both model averaging and model selection, although slightly more like the former.
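For comparison, a numerical sketch of the two normative read-outs, model averaging and model selection, in the style of Körding et al.'s causal-inference model; the noise and prior parameters are illustrative, not values from the papers.

```python
import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def causal_inference(x_v, x_a, sigma_v=1.0, sigma_a=3.0, sigma_p=15.0, p_common=0.5):
    """Audio-visual localization with common-cause inference; integrals are taken
    numerically on a grid of candidate source positions (in degrees)."""
    s = np.linspace(-60.0, 60.0, 2401)
    ds = s[1] - s[0]
    prior = gauss(s, 0.0, sigma_p)

    # Marginal likelihood of the two cues under one common cause vs. two causes.
    like_common = np.sum(gauss(x_v, s, sigma_v) * gauss(x_a, s, sigma_a) * prior) * ds
    like_separate = (np.sum(gauss(x_v, s, sigma_v) * prior) * ds *
                     np.sum(gauss(x_a, s, sigma_a) * prior) * ds)
    p_c = (like_common * p_common /
           (like_common * p_common + like_separate * (1 - p_common)))

    # Visual-position estimates conditioned on each causal structure.
    post_fused = gauss(x_v, s, sigma_v) * gauss(x_a, s, sigma_a) * prior
    s_fused = np.sum(s * post_fused) / np.sum(post_fused)
    post_vis = gauss(x_v, s, sigma_v) * prior
    s_vis = np.sum(s * post_vis) / np.sum(post_vis)

    averaging = p_c * s_fused + (1 - p_c) * s_vis   # weight estimates by the posterior
    selection = s_fused if p_c > 0.5 else s_vis     # commit to the more likely model
    return p_c, averaging, selection

# Nearby cues favor a common cause; far-apart cues favor separate sources.
print(causal_inference(5.0, 8.0))
print(causal_inference(5.0, 30.0))
```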

Chalk et al. argue that changing the task should not change expectations about the state of the world, that is, the prior. Rather, it might change the model of how reward depends on the state of the world.