Frassinetti et al. showed that humans detect near-threshold visual stimuli more reliably when these stimuli are paired with spatially congruent auditory stimuli (and vice versa).⇒
Integrating information from multiple stimuli can have advantages:
Viola and Jones presented a fast and robust object detection system based on
There are specialized and general approaches to object detection. General approaches are more popular nowadays because it is infeasible to design a specialized approach for each of the large number of visual object categories that one may want to detect.⇒
Most current visual object detection methods (as of 2012) are
features are detected in an image and those features are combined in a 'bag of visual words'.
Learning algorithms are then trained to classify such bags of words.⇒
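The bag-of-words step can be illustrated with a minimal sketch. It assumes a codebook of visual words already exists and that each detected feature is assigned to its nearest word by Euclidean distance; the function names, the toy 2-D descriptors, and the normalization to relative frequencies are illustrative assumptions, not taken from any of the cited systems.

```python
from math import dist

def nearest_word(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor."""
    return min(range(len(codebook)), key=lambda i: dist(descriptor, codebook[i]))

def bag_of_words(descriptors, codebook):
    """Count how often each visual word occurs and normalize the counts."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    # An image without any detected features yields an all-zero histogram.
    return [c / total if total else 0.0 for c in counts]

# Hypothetical 2-D descriptors; real descriptors have many more dimensions.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.8, 5.3)]

histogram = bag_of_words(descriptors, codebook)
print(histogram)  # [0.25, 0.5, 0.25]
```

The resulting fixed-length histogram discards where in the image each feature occurred, which is what makes it usable as input to a standard classifier.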
Mühling et al. present an audio-visual video concept detection system. Their system extracts visual and auditory bags of words from video data. Visual words are based on SIFT features; auditory words are formed by applying the k-means algorithm to the results of a mel-frequency cepstral coefficient (MFCC) analysis of the audio data. Support vector machines are used for classification.⇒
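How k-means can turn frame-level features into a vocabulary of words is sketched below. This is a plain Lloyd's-algorithm implementation under simplifying assumptions, not Mühling et al.'s implementation: the centroids are seeded with the first k points, and the 2-D `frames` are hypothetical stand-ins for MFCC vectors.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # a centroid with an empty cluster stays where it is.
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids

# Hypothetical 2-D stand-ins for MFCC frames, forming two obvious groups.
frames = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
          (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]

auditory_words = kmeans(frames, k=2)
print(auditory_words)
```

Each centroid plays the role of one auditory word. Assigning every frame of a clip to its nearest centroid and counting the assignments gives the clip's auditory bag of words, which a classifier such as a support vector machine can then take as input.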