Most current visual object detection methods (as of 2012) follow a common scheme:
features are detected in an image and those features are combined into a `bag of visual words'.
Learning algorithms are then applied to learn to classify such bags of words.⇒
Mühling et al. present an audio-visual video concept detection system. Their system extracts visual and auditory bags of words from video data. Visual words are based on SIFT features; auditory words are formed by applying the K-Means algorithm to a Mel-Frequency Cepstral Coefficients (MFCC) analysis of the auditory data. Support vector machines are used for classification.⇒
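The auditory bag-of-words construction can be sketched as follows (my own simplified illustration, not the authors' code): MFCC frames, stood in for here by random vectors, are clustered with k-means; each frame is then assigned to its nearest cluster centre (its `auditory word'), and a clip is represented by the histogram of word counts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an MFCC analysis: 200 frames of 13 coefficients each.
frames = rng.normal(size=(200, 13))

def kmeans(data, k, iters=20):
    """Plain Lloyd's algorithm: returns k cluster centres ('auditory words')."""
    centres = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        # Assign each frame to its nearest centre.
        labels = np.linalg.norm(data[:, None] - centres[None], axis=2).argmin(axis=1)
        # Move each centre to the mean of its assigned frames.
        for j in range(k):
            if (labels == j).any():
                centres[j] = data[labels == j].mean(axis=0)
    return centres

def bag_of_words(data, centres):
    """Histogram of nearest-centre assignments: the clip's feature vector."""
    labels = np.linalg.norm(data[:, None] - centres[None], axis=2).argmin(axis=1)
    return np.bincount(labels, minlength=len(centres))

words = kmeans(frames, k=8)
bag = bag_of_words(frames, words)
# One count per frame, distributed over the 8 auditory words.
assert bag.sum() == len(frames)
```

Such histograms (together with the analogous SIFT-based visual histograms) would then be fed to a support vector machine for concept classification.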
Stroop presented color words that were written either in the color they named (congruent) or in a different (incongruent) color. He asked participants to name the color in which the words were written and observed that participants were faster when the color was congruent with the meaning of the word than when it was incongruent.⇒
The Stroop test has been used to argue that reading is an automatic task for proficient readers.⇒
Greene and Fei-Fei show in a Stroop-like task that scene categorization is automatic and obligatory for simple (`entry-level') categories but not for more complex categories.⇒
Kleesiek et al. use a recurrent neural network with parametric bias (RNNPB) to classify objects from the multisensory percepts induced by interacting with them.⇒
If we want to learn classification using backpropagation, we cannot force our network to produce binary output: a hard threshold is not a smooth function of the input, so its gradient is zero almost everywhere.
Instead, we can let our network learn to output the log probability of each class given the input.⇒
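A minimal sketch of this idea (my own illustration): the network outputs unnormalized scores, a log-softmax turns them into log class probabilities, and the negative log probability of the true class gives a smooth loss through which gradients can flow.

```python
import numpy as np

def log_softmax(scores):
    """Turn unnormalized scores into log class probabilities.

    Subtracting the max first keeps the exponentials numerically stable.
    """
    shifted = scores - scores.max()
    return shifted - np.log(np.exp(shifted).sum())

def nll_loss(scores, true_class):
    """Negative log probability of the true class: smooth in the scores,
    unlike a hard binary decision."""
    return -log_softmax(scores)[true_class]

# Toy example: a 3-class output layer.
scores = np.array([2.0, 0.5, -1.0])
log_probs = log_softmax(scores)

# The log probabilities exponentiate to a proper distribution.
assert np.isclose(np.exp(log_probs).sum(), 1.0)

# The loss decreases smoothly as the true class's score increases.
assert nll_loss(np.array([3.0, 0.5, -1.0]), 0) < nll_loss(scores, 0)
```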
Zhang et al. propose an unsupervised dimensionality reduction algorithm, which they call 'multi-modal'.
Their notion of multi-modality differs from the one used in my work: it means that a latent, low-dimensional variable is distributed according to a multi-modal PDF.
Recovering such a variable can be difficult, depending on the transformation function mapping the high-dimensional data into the low-dimensional space. Linear methods like PCA, in particular, will suffer from this.
The authors focus on (mostly binary) classification. In that context, multi-modality requires complex decision boundaries.⇒
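This failure mode of linear methods can be illustrated with a toy example (my own, not from the paper): two modes separated along a low-variance direction. PCA to one dimension keeps the high-variance axis and discards exactly the direction that separates the modes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two clusters ('modes') separated along the y axis, but with much larger
# nuisance variance along the x axis.
n = 2000
x = rng.normal(scale=10.0, size=(2 * n, 1))              # high-variance direction
y = np.concatenate([rng.normal(-1.0, 0.1, size=(n, 1)),   # mode 1
                    rng.normal(+1.0, 0.1, size=(n, 1))])  # mode 2
data = np.hstack([x, y])
labels = np.repeat([0, 1], n)

# PCA to one dimension via SVD of the centred data.
centred = data - data.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
projected = centred @ vt[0]

# The first principal component is essentially the x axis ...
assert abs(vt[0, 0]) > 0.99
# ... so the 1-D projection collapses the two modes onto each other:
# the projected class means are far closer than the true separation of 2.
sep = abs(projected[labels == 0].mean() - projected[labels == 1].mean())
assert sep < 1.0
```

A non-linear (or supervision-aware) reduction would be needed to keep the two modes apart in the low-dimensional space, which is the kind of problem the authors address.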