Show Tag: visual-processing

Li et al. present a purely engineering-based approach to active speaker localization. Their system uses Viola and Jones' object detection algorithm for face detection and cross-correlation for auditory speaker localization.

Humans can orient towards emotional human faces faster than towards neutral human faces.

The time it takes to elicit a visual cortical response plus the time to elicit a saccade from cortex (FEF) is longer than the time it takes for humans to orient towards faces.

Nakano et al. take this as further evidence for a sub-cortical (retinotectal) route of face detection.

Patients with lesions in V1 (striate cortex) were found to still be able to discriminate the gender and expression of faces.

Neurons in the monkey pulvinar react extremely fast to visually perceived faces (50ms).

The superior colliculus does not receive any signals from short-wavelength cones (S-cones) in the retina.

Nakano et al. presented an image of either a butterfly or a neutral or emotional face to their participants. The stimuli were either grayscale or color-scale images, where color-scale images were isoluminant and only varied in their yellow-green color values. Since information from S-cones does not reach the superior colliculus, these faces were presumably only processed in visual cortex.

Nakano et al. found that their participants reacted to gray-scale emotional faces faster than to gray-scale neutral faces and to gray-scale faces faster than to gray-scale butterflies. Their participants reacted somewhat faster to color-scale faces than to color-scale butterflies, but this effect was much smaller than for gray-scale images. Also, the difference in reaction time to color-scale emotional faces was not significantly different from that to color-scale neutral faces.

Nakano et al. take this as further evidence of sub-cortical face detection and in particular of emotional sub-cortical face detection.

Humans can orient towards human faces faster than towards other visual stimuli (within 100ms).

Humans' (and other mammals') brains are devoted in large part to visual processing.

Vision is an important, if not the most important, source of sensory input for humans (and other mammals).

Visual receptive fields in the superficial hamster SC do not vary substantially in RF size with RF eccentricity.

Visual receptive field sizes change with eccentricity in the deep SC; they do not in the superficial hamster SC.

Masking visual face stimuli---ie. presenting faces too briefly for conscious detection, then presenting a masking stimulus---can evoke measurable changes in skin conductance.

Masked visual face stimuli can evoke responses in SC, pulvinar, and amygdala.

The right amygdala responded differently to masked than to unmasked stimuli, while the left did not in Morris et al.'s experiments.

SOM-based algorithms have been used to model several features of natural visual processing.

Miikkulainen et al. use their SOM-based algorithms to model the visual cortex.

Miikkulainen et al. use a hierarchical version of their SOM-based algorithm to model the natural development of visual capabilities.

Retinal waves of spontaneous activity in the retina occur before photoreceptors develop.

They are thought to be involved in setting up the spatial organization of the visual pathway.

LGN and layer 4 of V1 have distinct regions for each eye (eye-specific laminae in LGN, ocular dominance stripes in cortex).

These distinct eye-specific regions in LGN and V1 only arise after the initial projections from the retina are made, but, in higher mammals, before birth.

Most neurons in the visual cortex (except those in layer 4) are binocular.

Usually, input from one eye is dominant, however.

The distribution of monocular dominance in visual cortex neurons is drastically affected by monocular stimulus deprivation during early development.

Competition appears to be a major factor in organizing the visual system.

Visual capture is weaker for stimuli in the periphery, where visual localization is less reliable relative to auditory localization than at the center of the visual field.

Many visual person detection methods use a single feature to detect people: they create a histogram of that feature's strength across the image and then compute a likelihood for a pixel or region by assuming a Gaussian distribution of the distances of pixels or histograms belonging to a face. This assumption has been validated in practice (for certain cases).
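
A minimal sketch of the likelihood step under that Gaussian assumption; the feature, the histogram distances, and the parameters below are all hypothetical:

```python
import numpy as np

def face_likelihood(feature_distance, mu=0.0, sigma=0.8):
    """Gaussian likelihood of a region belonging to a face, given the
    distance of its feature histogram to a reference face histogram.
    mu and sigma are placeholders; in practice they would be estimated
    from labeled data."""
    z = (feature_distance - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))

# Toy usage: histogram distances for three candidate image regions.
print(face_likelihood(np.array([0.2, 1.5, 3.0])))
```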

Feature-based and spatial attention may be based on similar mechanisms.

Spatial attention does not seem to affect the selectivity of visual neurons—just the vigour of their response.

Spatial visual attention increases the activity of neurons in the visual cortex whose receptive fields overlap the attended region.

Feature-based visual attention increases the activity of neurons in the visual cortex which respond to the attended feature.

Spatial and feature-based visual attention are additive: together, they particularly enhance the activity of any neuron whose receptive field encompasses the attended region, contains a stimulus with the attended feature, and prefers that feature.

Response properties in mouse superficial SC neurons are not strongly influenced by experience.

How strongly SC neurons' development depends on experience (and how well developed they are at birth) differs from species to species; the fact that the superficial mouse SC is developed at birth therefore does not mean it is in other species (and I believe responsiveness in cats develops with experience).

Response properties of superficial SC neurons are different from those found in mouse V1 neurons.

Response properties of superficial SC neurons are different in different animals.

Search targets which share few features with mutually similar distractors surrounding them are said to `pop out': it seems to require hardly any effort to identify them and search for them is very fast.

Search targets that share most features with their surroundings, on the other hand, require much more time to be identified.

Gottlieb et al. found that the most salient and the most task-relevant visual stimuli evoke the greatest response in LIP.

A traditional model of visual processing for perception and action proposes that the two tasks rely on different visual representations. This model explains the weak effect of visual illusions like the Müller-Lyer illusion on performance in grasping tasks.

Foster et al. challenge the methodology used in a previous study by Dewar and Carey which supports the perception and action model of visual processing due to Goodale and Milner.

They do that by changing the closed visual-action loop in Dewar and Carey's study into an open one by removing visual feedback at motion onset. The result is that the effect of the illusion is there for grasping (which it wasn't in the closed-loop condition) but not (as strongly) for manual object size estimation.

Foster et al. argue that this suggests that the effect found in Dewar and Carey's study is due to continuous visual feedback.

O'Regan and Noë highlight the importance of understanding seeing as an active process, as an exploratory activity.

O'Regan and Noë speak of the geometric laws that govern the relationship between moving the eyes and body and the change of an image in the retina.

The geometry of the changes—straight lines becoming curves on the retina when an object moves in front of the eyes—is not accessible to the visual system, initially, because nothing tells the brain about the spatial relations between photoreceptors in the retina.

Saccades evoked by electric stimulation of the deep SC can be deviated towards the target of visual spatial attention. This is the case even if the task forbids a saccade towards the target of visual spatial attention.

Activation build-up in build-up neurons is modulated by spatial attention.

O'Regan and Noë argue that there is no illusion of a "stable, high-resolution, full field representation of a visual scene" in the brain; rather, people simply have the impression of being aware of everything in the scene.

The difference is that even if we were aware of all the details, that awareness would not require a photograph-like representation in the brain.

Localization of audiovisual targets is usually determined more by the location of the visual sub-target than by that of the auditory sub-target.

Especially in situations where visual stimuli are seen clearly and thus localized very easily, this can lead to the so-called ventriloquism effect (aka `visual capture') in which a sound source seems to be localized at the location of the visual target although it is in fact a few degrees away from it.

Lee and Mumford interpret the visual pathway in terms of Bayesian belief propagation: each stage in the processing uses output from the one further up as contextual information and output from the one further down as evidence to update its belief and corresponding output.

Each layer thus calculates probabilities of features of the visual display given noisy and ambiguous input.
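
A toy version of one such update over a discrete set of feature hypotheses; the factorization into a top-down prior and a bottom-up likelihood is the point, the numbers are made up:

```python
import numpy as np

# Hypotheses about a feature at this stage (eg. three edge orientations).
prior_from_above = np.array([0.5, 0.3, 0.2])       # context from the stage above
likelihood_from_below = np.array([0.1, 0.6, 0.3])  # evidence from the stage below

# Bayes: belief is proportional to prior times likelihood, then normalized.
belief = prior_from_above * likelihood_from_below
belief /= belief.sum()
print(belief)   # passed up as evidence and down as context to neighboring stages
```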

Lee and Mumford state that their dynamic, recurrent Bayesian model of the visual pathway in its simple form is prone to running into local maxima (states in which small changes in belief at any of the processing stages decrease the joint probability, although a greater change would increase it).

They propose particle filtering as a solution which they describe as maintaining a number of concurrent high-likelihood hypotheses instead of going for the maximum likelihood one.
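
A generic particle filter sketch of that idea (not Lee and Mumford's actual implementation): a population of weighted hypotheses is propagated and reweighted by incoming evidence, rather than keeping only the single maximum-likelihood state.

```python
import numpy as np

rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, 100)   # concurrent hypotheses about a scalar state
weights = np.full(100, 0.01)

def step(particles, weights, observation, obs_noise=0.5, proc_noise=0.1):
    # Propagate hypotheses through a (here trivial) dynamics model ...
    particles = particles + rng.normal(0.0, proc_noise, particles.size)
    # ... reweight them by the likelihood of the new evidence ...
    weights = weights * np.exp(-0.5 * ((observation - particles) / obs_noise) ** 2)
    weights /= weights.sum()
    # ... and resample so particles concentrate on high-likelihood hypotheses.
    idx = rng.choice(particles.size, particles.size, p=weights)
    return particles[idx], np.full(particles.size, 1.0 / particles.size)

for observation in (0.8, 1.0, 1.1):
    particles, weights = step(particles, weights, observation)
print(particles.mean())   # the estimate carried by the whole particle cloud
```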

There are cells in the rabbit retina which are selective for the direction of motion.

Some visual processing occurs already in the retina.

Neurons at later stages in the hierarchy of visual processing extract very complex features (like faces).

Unilateral lesions in brain areas associated with attention can lead to visuospatial neglect: the failure to consider anything within a certain region of the visual field. In extreme cases, this can mean that patients e.g. only read from one side of a book.

Kastner and Ungerleider propose that the top-down signals which lead to the effects of visual attention originate from brain regions outside the visual cortex.

Regions whose lesions can induce visuospatial neglect include

  • the parietal lobe, in particular the inferior part,
  • temporo-parietal junction,
  • the anterior cingulate cortex,
  • basal ganglia,
  • thalamus,
  • the pulvinar nucleus.

Cells in MST respond to and are selective for optic flow.

Viola and Jones presented a fast and robust object detection system based on

  1. a computationally fast way to extract features from images,
  2. the AdaBoost machine learning algorithm,
  3. a cascade of increasingly complex (boosted) classifiers.
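
The first of these ingredients is the integral image, which makes the sum over any rectangle, and hence any Haar-like feature, an O(1) operation. A sketch:

```python
import numpy as np

def integral_image(img):
    """Entry (y, x) holds the sum of all pixels above and to the
    left of (y, x), inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, height, width):
    """Sum over a rectangle via four lookups in the integral image."""
    br = ii[top + height - 1, left + width - 1]
    tr = ii[top - 1, left + width - 1] if top > 0 else 0.0
    bl = ii[top + height - 1, left - 1] if left > 0 else 0.0
    tl = ii[top - 1, left - 1] if top > 0 and left > 0 else 0.0
    return br - tr - bl + tl

img = np.arange(36, dtype=float).reshape(6, 6)   # toy image
ii = integral_image(img)
# A two-rectangle (left minus right) Haar-like feature:
print(rect_sum(ii, 1, 1, 4, 2) - rect_sum(ii, 1, 3, 4, 2))
```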

Voges et al. use a difference image to detect and localize moving objects (humans).

Voges et al. use the strength of the visual detection signal (the peak value of the column-wise sum of the difference image) as a proxy for the confidence of visual detection.

They use visual localization whenever this signal strength is above a certain threshold, and auditory localization if it is below that threshold.
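
A sketch of that scheme; the frame contents and the threshold value are made up:

```python
import numpy as np

def visual_localization(prev_frame, frame):
    """Peak of the column-wise sum of the difference image: the peak
    column is the horizontal position estimate, the peak value serves
    as a proxy for detection confidence."""
    column_activity = np.abs(frame - prev_frame).sum(axis=0)
    return int(column_activity.argmax()), float(column_activity.max())

def localize(prev_frame, frame, auditory_estimate, threshold=50.0):
    # threshold is a made-up value; it would be tuned on real data
    column, confidence = visual_localization(prev_frame, frame)
    return column if confidence >= threshold else auditory_estimate

prev_frame = np.zeros((48, 64))
frame = prev_frame.copy()
frame[:, 30:34] = 2.0                       # a "moving person" in columns 30-33
print(localize(prev_frame, frame, auditory_estimate=10))   # -> 30
```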

Aarabi chooses (adaptive) difference images for visual localization to avoid relying on domain knowledge.

Aarabi uses ITD (computed using cross-correlation) and ILD in an array of 3 microphones for auditory localization.
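
A sketch of the cross-correlation step for one microphone pair; the signal and delay are synthetic:

```python
import numpy as np

def estimate_itd(left, right, fs):
    """Time difference via the lag maximizing the cross-correlation;
    positive when the right channel lags the left."""
    corr = np.correlate(left, right, mode="full")
    lag = (len(right) - 1) - corr.argmax()   # lag 0 sits at index len(right)-1
    return lag / fs

fs = 16000
signal = np.random.default_rng(0).normal(size=800)
delay = 8                                    # samples of inter-microphone delay
left = signal
right = np.concatenate([np.zeros(delay), signal[:-delay]])
print(estimate_itd(left, right, fs) * 1e6, "microseconds")   # -> 500.0
```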

There are specialized and general approaches to object detection. General approaches are more popular nowadays because it is infeasible to design specialized approaches for the number of visual categories of objects that one may want to detect.

Most current visual object detection methods (as of 2012) are `bag-of-visual-words' approaches: features are detected in an image and combined into a `bag of visual words'. Learning algorithms are applied to learn to classify such bags of words.

Mühling et al. present an audio-visual video concept detection system. Their system extracts visual and auditory bags of words from video data. Visual words are based on SIFT features, auditory words are formed by applying the K-Means algorithm to a Mel-Frequency Cepstral Coefficients analysis of the auditory data. Support vector machines are used for classification.
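
A sketch of the auditory half of such a pipeline, assuming librosa for MFCC extraction and scikit-learn for K-Means (the bundled example clip is just a stand-in for real video audio; the SIFT-based visual words would be built analogously):

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans

# MFCC frames of an audio clip (librosa ships this example recording).
y, sr = librosa.load(librosa.example("trumpet"))
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T     # (frames, 13)

# Learn an "audio vocabulary" by clustering the MFCC frames.
kmeans = KMeans(n_clusters=32, n_init=10, random_state=0).fit(mfcc)

# Bag of audio words for the clip: a normalized histogram of word counts.
bag = np.bincount(kmeans.predict(mfcc), minlength=32).astype(float)
bag /= bag.sum()
print(bag)   # this vector would go into an SVM for concept classification
```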

Visual attention is the facilitation of visual processing of some stimuli over others.

The number of neurons in the lower stages of the visual processing hierarchy (V1) is much lower than in the higher stages (IT).

Since much of what the visual system does can be seen as compression, and since SOMs perform vector quantization (VQ), which is a compression technique, it makes sense that SOMs have been useful in modeling visual processing.
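
A minimal SOM quantizing toy two-dimensional data (a stand-in for visual feature vectors); this is the standard algorithm, not a model of any particular brain area:

```python
import numpy as np

rng = np.random.default_rng(0)
n_units = 16                                    # a 1-D map of 16 units
weights = rng.random((n_units, 2))
data = rng.random((2000, 2))                    # toy "visual" input vectors

for t, x in enumerate(data):
    frac = 1.0 - t / len(data)
    lr, radius = 0.5 * frac, max(0.5, 4.0 * frac)   # decaying rate/neighborhood
    winner = np.argmin(((weights - x) ** 2).sum(axis=1))
    dist = np.abs(np.arange(n_units) - winner)      # distance on the map
    h = np.exp(-dist**2 / (2 * radius**2))          # neighborhood function
    weights += lr * h[:, None] * (x - weights)

# Quantization: an input is replaced by its best-matching unit's weight
# vector, ie. the trained weights form a learned codebook.
x = np.array([0.3, 0.7])
print(weights[np.argmin(((weights - x) ** 2).sum(axis=1))])
```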

Different types of retinal ganglion cells project to different lamina in the zebrafish optic tectum.

The lamina a retinal ganglion cell projects to in the zebrafish optic tectum does not change in the fish's early development. This is in contrast with other animals.

However, the position within the lamina does change.

Weir and Suver review experiments on the visual system of flies, specifically into the dendritic and network properties of the VS and HS systems which respond to apparent motion in the vertical and horizontal planes, respectively.

Stroop presented color words that were written either in the color they named (congruent) or in a different color (incongruent). He asked participants to name the color in which the words were written and observed that they were faster when the color was congruent with the word's meaning than when it was incongruent.

The Stroop test has been used to argue that reading is an automatic task for proficient readers.

Greene and Fei-Fei show in a Stroop-like task that scene categorization is automatic and obligatory for simple (`entry-level') categories but not for more complex categories.

People with lesions of the parieto-occipital junction (POJ) are impaired in reaching for and grasping objects in their peripheral visual field (an effect called 'optic ataxia').

Himmelbach et al. studied one patient in a visuo-haptic grasping task and found that she adapted her grip online to changes of object size like a healthy subject when the object was in the central visual field. This indicates that the problem for patients with POJ lesions is not an inability to adapt online, but more likely lies in the connection between visuomotor pathways and the pathways necessary for grasping.

Disparity-selective cells in visual cortical neurons have preferred disparities of only a few degrees whereas disparity in natural environments ranges over tens of degrees.

The possible explanation offered by Zhao et al. assumes that animals actively keep disparity within a small range during development, and that therefore only selectivity for small disparities develops.

Zhao et al. present a model of joint development of disparity selectivity and vergence control.

Zhao et al.'s model develops both disparity selection and vergence control in an effort to minimize reconstruction error.

It uses a form of sparse coding to learn to approximate its input and a variation of the actor-critic learning algorithm called the natural actor-critic reinforcement learning algorithm (NACREL).

The teaching signal to the NACREL algorithm is the reconstruction error of the model after the action produced by it.
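
A schematic of that learning signal only: a plain softmax actor-critic (not the natural-gradient NACREL variant) choosing among toy vergence commands, with the negative reconstruction error of a stand-in model as reward. Everything below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
actions = np.array([-1.0, 0.0, 1.0])   # toy vergence commands
prefs = np.zeros(3)                    # actor: action preferences
baseline = 0.0                         # critic: running value estimate
alpha_actor = alpha_critic = 0.1

def reconstruction_error(vergence):
    # Stand-in for the sparse-coding model: smallest error at the
    # (hypothetical) correct vergence of 1.0.
    return (vergence - 1.0) ** 2 + 0.01 * rng.normal()

for _ in range(2000):
    policy = np.exp(prefs) / np.exp(prefs).sum()   # softmax policy
    a = rng.choice(3, p=policy)
    reward = -reconstruction_error(actions[a])     # the teaching signal
    td_error = reward - baseline
    baseline += alpha_critic * td_error            # critic update
    grad = -policy
    grad[a] += 1.0                                 # d log pi / d prefs
    prefs += alpha_actor * td_error * grad         # actor update

print(actions[prefs.argmax()])   # converges to the vergence minimizing error
```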

Task-irrelevant auditory cues have been found in other experiments to enhance reaction times to visual stimuli. Visual cues which cued visual localization, however, did not cue auditory localization.

Traditionally, visual attention is subdivided into feature-based attention and spatial attention. However, location is arguably only one cue among a number of possible cues, and spatial attention possibly only a special case of feature-based attention.

There is a distinction between two kinds of bats: megabats and microbats. Megabats differ from microbats in size (generally), but also in the organization of their visual system. In particular, their retinotectal projections are different: while all retinotectal projections in microbats are contralateral, retinotectal projections in megabats are divided such that projections from the nasal part of the retina go to the ipsilateral SC and those from the peripheral part go to the contralateral SC. This is similar to primate vision.

In primates, retinotectal projections to each SC are such that each visual hemifield is mapped to one (contralateral) SC. This is in contrast with retinotectal projections in most other vertebrates, where all projections from one retina project to the contralateral SC.

The part of the visual map in the superficial SC corresponding to the center of the visual field has the highest spatial resolution.

Overt visual function emerges in cats only 2-3 weeks postnatally.

De Kamps and van der Velde argue for combinatorial productivity and systematicity as fundamental concepts for cognitive representations. They introduce a neural blackboard architecture which implements these principles for visual processing and in particular for object-based attention.

Deco and Rolls introduce a system that uses a trace learning rule to learn recognition of more and more complex visual features in successive layers of a neural architecture. In each layer, the specificity of the features increases together with the receptive fields of neurons until the receptive fields span most of the visual range and the features actually code for objects. This model thus is a model of the development of object-based attention.
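
At the core of such models is a trace learning rule (here in a Földiák-style form; Deco and Rolls' exact formulation may differ): the postsynaptic activity in the Hebbian update is replaced by a temporally low-passed trace, so inputs that occur close together in time, such as successive views of one object, strengthen the same weights.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.random(10) * 0.1
trace, eta, alpha = 0.0, 0.8, 0.01        # eta controls trace persistence

def trace_step(x, w, trace):
    y = float(w @ x)                       # postsynaptic activity
    trace = eta * trace + (1.0 - eta) * y  # low-passed activity trace
    w = w + alpha * trace * x              # Hebbian update using the trace
    return w / np.linalg.norm(w), trace    # normalization keeps w bounded

# Two "views" of the same object, presented in temporal succession:
view_a = np.zeros(10); view_a[:3] = 1.0
view_b = np.zeros(10); view_b[3:6] = 1.0
for _ in range(100):
    for x in (view_a, view_b):
        w, trace = trace_step(x, w, trace)
print(w.round(2))   # both views end up driving the same strengthened weights
```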

Using multiple layers each of which learns with a trace rule with successively larger time scales is similar to the CTRNNs Stefan Heinrich uses to learn the structure of language. Could there be a combined model of learning of sentence structure and language processing on the one hand and object-based visual or multi-modal attention on the other?

Visual receptive fields in the superficial monkey SC do vary substantially in RF size with RF eccentricity.

In some animals receptive field sizes change substantially with RF eccentricity, and in some they do not.

The neurons in the superficial (rhesus) monkey SC do not exhibit strong selectivity for specific shapes, stimulus orientation, or moving directions. Some of them do show selectivity to stimuli of specific sizes.

The activity profiles for stimuli moving through superficial SC neuron RFs shown in Cynader and Berman's work look similar to Poisson-noisy Gaussians; however, the authors state that the strength of a response to a stimulus was the same regardless of where in the activating region it was shown.

The neurons in the superficial (rhesus) monkey SC largely prefer moving stimuli over non-moving stimuli.

In the intermediate layers of the monkey SC, neurons have a tendency to reduce or otherwise alter their reaction to repeated presentations of the same stimulus over time.

There are marked differences in the receptive field properties of superficial cat and monkey SC neurons.

Ocular dominance stripes are stripes in visual brain regions in which retinal projections of one eye or the other terminate alternatingly.

Visual spatial attention

  • lowers the stimulus detection threshold,
  • improves stimulus discrimination.

With two stimuli in the receptive field, one with the features of a visual search target and one with different features, attention

  • increases average neural activity in cortex (compared to the same two objects without attending to any features) when the target's features are attended,
  • decreases average neural activity when spatial attention is on the location of the non-target compared to when it is on the target.

The fact that average neural activity in cortex is decreased when, of a target and a non-target, spatial attention is on the location of the non-target rather than the target supports the notion that inhibition plays an important role in stimulus selection.

Two superimposed visual stimuli of different orientation, one optimal for a given simple cell in visual cortex, the other sub-optimal but excitatory, can elicit a weaker response than just the optimal stimulus.

Seeing someone say 'ba' and hearing them say 'ga' can make one perceive them as saying 'da'. This is called the `McGurk effect'.

Divisive normalization models describe neural responses well in cases of

  • olfactory perception in drosophila,
  • visual processing in retina and V1,
  • possibly in other cortical areas,
  • modulation of responses through attention in visual cortex.
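
The canonical form of these models (following Carandini and Heeger; the exponents and constants below are illustrative): each unit's driving input is raised to a power and divided by a pooled signal from the whole population.

```python
import numpy as np

def divisive_normalization(drives, sigma=1.0, n=2.0, gamma=1.0):
    """Response of each unit: its own drive, normalized by the
    summed drive of the population."""
    d = drives ** n
    return gamma * d / (sigma ** n + d.sum())

drives = np.array([1.0, 2.0, 4.0])
print(divisive_normalization(drives))
print(divisive_normalization(drives * 10.0))   # 10x the drive, responses barely
                                               # change: relative drive is coded
```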

Visual receptive fields in the SC usually consist of an excitatory central region and an inhibitory surround.

(Auditory receptive fields also often seem to show this antagonism.)

Moving eyes, ears, or body changes the receptive field (in external space) in SC neurons wrt. stimuli in the respective modality.

Yang and Shadlen show that neurons in LIP (in monkeys) encode the log probability of reward given artificial visual stimuli in a weather prediction task experiment.
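
In that task each shape carries a fixed weight of evidence, and evidence combines additively in log space; LIP activity scaled with the running sum. A toy version with made-up weights:

```python
# Hypothetical log10 likelihood ratios P(shape | reward at A) / P(shape | reward at B):
weights = {"square": 0.9, "circle": 0.3, "star": -0.5, "cross": -0.7}

shapes_shown = ["square", "circle", "star"]
log_lr = sum(weights[s] for s in shapes_shown)   # evidence adds up additively
p_reward_at_a = 1.0 / (1.0 + 10.0 ** (-log_lr))  # implied posterior for target A
print(log_lr, p_reward_at_a)
```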

Before a saccade is made, the region that will be the target of that saccade is perceived with higher contrast.

Newborn children prefer to look at faces and face-like visual stimuli.

Visual cortex is not fully developed at birth in primates.

The fact that visual cortex is not fully developed at birth, but newborn children prefer face-like visual stimuli to other visual stimuli could be explained by the presence of a subcortical face-detector.

Looking behavior in newborns may be dominated by non-cortical processes.

Different parts of the visual field feed into the cortical and subcortical visual pathways more or less strongly in humans.

The nasal part of the visual field feeds more into the cortical pathway while the peripheral part feeds more into the sub-cortical pathway.

In one experiment, newborns reacted to faces only if they were (exclusively) visible in their peripheral visual field, supporting the theory that the sub-cortical pathway of visual processing plays a major role in orienting towards faces in newborns.

It makes sense that sub-cortical visual processing uses peripheral information more than cortical processing:

  • sub-cortical processing is concerned with latent monitoring of the environment for potential dangers (or conspecifics),
  • sub-cortical processing is concerned with watching the environment and guiding attention in cortical processing.

Visual processing of potentially affective stimuli seems to be partially innate in primates.

The pulvinar receives direct retinal input.

Backward connections in the visual cortex show less topographical organization (they `show abundant axonal bifurcation') and are more abundant than forward connections.

The visual cortex is hierarchically organized.

It seems a bit unclear to me what determines the hierarchy of the visual cortex if backward connections are predominant.

Feedforward connections in the visual cortex seem to be driving while feedback connections seem to be modulatory.

Some authors see the lower stages of visual processing as implementing an inverse model of optics—a model deriving causes from sensations and higher stages as implementing a forward model—a model generating expected sensations from assumed causes.

There is an illusion that there is a "stable, high-resolution, full field representation of a visual scene" in the brain.

Could the illusion that there is a "stable, high-resolution, full field representation of a visual scene" in the brain be the result of the availability heuristic? Whenever we are interested in some point in a visual scene, it is either at the center of our vision anyway, or we saccade to it. In both cases, detailed information of that scene is available almost instantly.

This seems to be what O'Regan and Noë imply (although they do not talk about the availability heuristic).

Feldman gives a functional explanation of the stable world illusion, but he does not seem to explain "Subjective Unity of Perception".

Feldman states that enough is known about what he calls "Visual Feature Binding", so as not to call it a problem anymore.

Feldman explains Visual Feature Binding by the fact that all the features detected in the fovea usually belong together (because it is so small), and through attention. He cites Chikkerur et al.'s Bayesian model of the role of spatial and object attention in visual feature binding.

Feldman states that "Neural realization of variable binding is completely unsolved".

Neurons at low stages in the hierarchy of visual processing extract simple, localized features.

Color opponency and center-surround opponency arise first in LGN.

The visual system (of primates) contains a number of channels for different types of visual information:

  • color
  • shape
  • motion
  • texture
  • 3D

Separating visual processing into channels by the kind of feature it is based on is beneficial for efficient coding: feature combinations can be coded combinatorially.

There are very successful solutions to isolated problems in computer vision (CV). These solutions are flat, however, in the sense that they are implemented in a single process from feature extraction to information interpretation. A CV system based on such solutions can suffer from redundant computation and coding.

Nearly all projections from the retinae go through LGN.

All visual areas from V1 to V2 and MT are retinotopic.

The ventral pathway of visual processing is weakly retinotopically organized.

The complexity of the features (or combinations of features) that neurons in the ventral pathway react to increases up to the object level. Most neurons react to feature combinations below the object level, however.

The dorsal pathway of visual processing consists of area MST (a motion area) and visual areas in the posterior parietal cortex (PPC).

The complexity of motion patterns neurons in the dorsal pathway are responsive to increases along the pathway. This is similar to neurons in the ventral pathway which are responsive to progressively more complex feature combinations.

Receptive fields in the dorsal pathway of visual processing are less retinotopic and more head-centered.

Parvocellular ganglion cells are color sensitive, have small receptive fields and are focused on foveal vision.

Magnocellular ganglion cells have lower spatial and higher temporal resolution than parvocellular cells.

There are shortcuts between the levels of visual processing in the visual cortex.

Certain neurons in V1 are sensitive to simple features:

  • edges,
  • gratings,
  • line endings,
  • motion,
  • color,
  • disparity.

Certain receptive fields in the cat striate cortex can be modeled reasonably well using linear filters, more specifically Gabor filters.

Simple cells are sensitive to the phase of gratings, whereas complex cells are not and have larger receptive fields.

Some cells in V1 are sensitive to binocular disparity.

The receptive fields of LGN cells can be described as either an excitatory area inside an inhibitory area or the reverse.

The receptive field properties of neurons in the cat striate cortex have been modeled as linear filters. In particular, three types of linear filters have been proposed (sketched below):

  • Gabor filters,
  • filters based on second differentials of Gaussian functions,
  • difference-of-Gaussians filters.
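
One-dimensional versions of the three filter types listed above, with arbitrary parameters, for illustration:

```python
import numpy as np

x = np.linspace(-3, 3, 201)

def gabor(x, sigma=0.8, freq=1.0):
    # Gaussian envelope times a sinusoidal carrier.
    return np.exp(-x**2 / (2 * sigma**2)) * np.cos(2 * np.pi * freq * x)

def d2_gaussian(x, sigma=0.8):
    # Second differential of a Gaussian.
    g = np.exp(-x**2 / (2 * sigma**2))
    return (x**2 / sigma**4 - 1 / sigma**2) * g

def dog(x, mu_c=0.0, mu_s=0.4, sigma=0.6, gain=0.9):
    # Difference of Gaussians; offset peaks (mu_c != mu_s) as reported for
    # striate cells, coinciding peaks would give the LGN-style profile.
    return (np.exp(-(x - mu_c)**2 / (2 * sigma**2))
            - gain * np.exp(-(x - mu_s)**2 / (2 * sigma**2)))

# All three are band-pass: eg. the Gabor filter responds most strongly
# to a grating near its preferred spatial frequency.
for f in (0.2, 1.0, 5.0):
    print(f, round(float(gabor(x) @ np.cos(2 * np.pi * f * x)), 2))
```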

Hawken and Parker studied the response patterns of a large number of cells in the cat striate cortex and found that Gabor filters, filters which are second differential of Gaussian functions, and difference-of-Gaussians filters all model these response patterns well, quantitatively.

They found, however, that difference-of-Gaussians filters strongly outperformed the other models.

Difference-of-Gaussians filters are parsimonious candidates for modeling the receptive fields of striate cortex cells, because the kind of differences of Gaussians used in striate cortex (differences of Gaussians with different peak locations) can themselves be computed linearly from differences of Gaussians which model receptive fields of LGN cells (where the peaks coincide), which provide the input to the striate cortex.

Both simple and complex cells' receptive fields can be described using difference-of-Gaussians filters.

"Natural images are statistically redundant."

It seems as though the primates' trichromatic visual system is well-suited to capture the distribution of colors in natural scenes.

By optimizing sparseness (or coding efficiency) of functions for representing natural images, one can arrive at tuning functions similar to those found in simple cells. They are

  • spatially localized
  • oriented
  • band-pass filters with different spatial frequencies.
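
A toy-scale sketch of that optimization, in the spirit of Olshausen and Field's sparse coding (sizes, step sizes, and the noise stand-in for image patches are all arbitrary): infer sparse coefficients for each patch, then nudge the basis functions toward the residual.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix, n_basis, lam = 64, 32, 0.1
Phi = rng.normal(size=(n_pix, n_basis))          # basis functions (columns)
Phi /= np.linalg.norm(Phi, axis=0)
patches = rng.normal(size=(500, n_pix))          # stand-in for whitened patches

for img in patches:
    a = np.zeros(n_basis)
    for _ in range(50):                          # infer sparse coefficients
        a -= 0.01 * (Phi.T @ (Phi @ a - img) + lam * np.sign(a))
    residual = img - Phi @ a
    Phi += 0.01 * np.outer(residual, a)          # learn basis on the residual
    Phi /= np.linalg.norm(Phi, axis=0)

# With real whitened natural-image patches instead of noise, the columns of
# Phi become localized, oriented, band-pass filters as described above.
print(Phi.shape)
```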

LGN cells respond in a whitened---ie. efficient---fashion to natural images, but respond non-white to other inputs, eg. white noise. They are thus well-adapted to natural images from the efficient-coding point of view.

One hypothesis about early visual processing is that it tries to preserve (and enhance) as much information about the visual stimuli as possible, with as little effort as possible. Findings about efficiency in visual processing seem to support this hypothesis.

A complete theory of early visual processing would need to address more aspects than coding efficiency, optimal representation and cleanup. Tasks and implementation would have to be taken into account.

Pulvinar neurons seem to receive input from and project to different layers in visual cortex: they receive input from layer 5 and project to layers 1 and 3.

Connectivity between pulvinar and MT is similar to connectivity between pulvinar and visual cortex.

Saccade targets tend to be the centers of objects.

When reading, preferred viewing locations (PVL)---the centers of the distributions of fixation targets---are typically located slightly left of the center of words.

When reading, the standard deviation of the distribution of fixation targets within a word increases with the distance between the start and end of a saccade.

Pajak and Nuthmann found that saccade targets are typically at the center of objects. This effect is strongest for large objects.

Early visual neurons (eg. in V1) do not seem to encode probabilities.

I'm not so sure that early visual neurons don't encode probabilities. The question is: which probabilities do they encode? That of a line being there?

The eye suffers from

  • chromatic aberration,
  • optical imperfections,
  • the fact that photoreceptors lie behind ganglion cells and blood vessels.

The optic nerve does not have the bandwidth to transmit all the light receptors' activities. Some compression occurs already in the eye.

Short-range inhibition happens in the horseshoe crab compound eye: neighbouring receptor units inhibit each other.
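
A sketch of such lateral inhibition on a 1-D array of receptor units; the inhibition weight is made up:

```python
import numpy as np

excitation = np.array([1, 1, 1, 5, 5, 5, 1, 1, 1], dtype=float)  # a luminance step
k = 0.2                       # inhibition received from each immediate neighbour

padded = np.pad(excitation, 1, mode="edge")
response = excitation - k * (padded[:-2] + padded[2:])
print(response.round(2))      # the step's edges are enhanced (Mach-band-like)
```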

Ganglion cells in the retina connect the brain to a small, localized population of photoreceptors. This population—or the region in space from which it receives incoming light—is called the ganglion cell's receptive field. Ganglion cells respond best either to patterns of high luminance in the center of that population and low luminance at its periphery, or to the opposite pattern. Cells with the former characteristics are called "on-center" cells, the others "off-center" cells.

Contrast sensitivity is an important feature of early visual processing.

More visual processing tends to occur in the retina the more important the result is (like detecting bugs for frogs or detecting foxes for rabbits) and the less complex the organism is (like frogs and rabbits).

Spatial frequency carries a lot of information about a visual image.

Magnocellular ganglion cells have large receptive fields.

The M-stream of visual processing is formed by magnocellular ganglion cells, the P-stream by parvocellular ganglion cells.

The M-stream is thought to deal with motion detection and analysis, while the P-stream seems to be involved in processing color and form.

The part of the visual cortex dedicated to processing signals from the fovea is much greater than that dealing with peripheral signals.

LGN receives more feedback projections from V1 than forward connections from the retina.

Cells in inferotemporal cortex are highly selective to the point where they approach being grandmother cells.

There are cells in inferotemporal cortex which respond to (specific views on / specific parts of) faces, hands, walking humans and others.

Certain ganglion cells in the frog retina, dubbed `bug detectors', react exclusively to bug-like stimuli and their activity provokes bug-catching behavior in the frog.

The underlying task in vision is to "reliably derive properties of the world from images of it".

In the pop-out condition of a visual search task, Buschman and Miller found that neurons in the posterior parietal cortex region LIP found the search target earlier than neurons in frontal cortex regions FEF and LPFC.

In the pure visual search condition of a visual search task, Buschman and Miller found that neurons in frontal cortex regions FEF and LPFC found the search target earlier than neurons in the posterior parietal cortex region LIP.

Low-level (dis)similarity is important for top-down visual search.

The biased competition theory of visual attention explains attention as the effect of low-level stimuli competing with each other for resources—representation and processing. According to this theory, higher-level processes/brain regions bias this competition.

Predictive coding and biased competition are closely related concepts. Spratling combines them in his model and uses it to explain visual saliency.

Desimone and Duncan argue that spatial information about a search target can be part of the attentional template fitted against all potential targets in the visual display, just like any other object feature.

Mishkin et al. proposed a theory suggesting that visual processing runs in two pathways: the `what' and the `where' pathway.

The `what' pathway runs ventrally from the striate and prestriate to the inferior temporal cortex. This pathway is supposed to deal with the identification of objects.

The `where' pathway runs dorsally from striate and prestriate to the inferior parietal cortex. This pathway is supposed to deal with the localization of objects.

Mishkin et al. already recognized the question of how and where the information carried in the different pathways could be integrated. They speculated that some of the targets of projections from the pathways, eg. in the limbic system or the frontal lobe, could be convergence sites. Mishkin et al. stated that some preliminary results suggested the hippocampal formation might play an important role.

The ventriloquism aftereffect occurs when an auditory stimulus is initially presented together with a visual stimulus with a certain spatial offset.

The auditory stimulus is typically localized by subjects at the same position as the visual stimulus, and this mislocalization persists even after the visual stimulus disappears.

The SOM has ancestors in von der Malsburg's "Self-Organization of Orientation Sensitive Cells in the Striate Cortex" and other early models of self-organization.

Von der Malsburg introduces a simple model of self-organization which explains the organization of orientation-sensitive cells in the striate cortex.

Pitti et al. use a Hebbian learning algorithm to learn somato-visual register.