Show Reference: "Multimodal Video Concept Detection via Bag of Auditory Words and Multiple Kernel Learning"

Multimodal Video Concept Detection via Bag of Auditory Words and Multiple Kernel Learning. In Advances in Multimedia Modeling, Vol. 7131 (2012), pp. 40-50, doi:10.1007/978-3-642-27355-1_7, by Markus Mühling, Ralph Ewerth, Jun Zhou, Bernd Freisleben; edited by Klaus Schoeffmann, Bernard Merialdo, Alexander G. Hauptmann, et al.
@incollection{muehling-et-al-2012,
abstract = {State-of-the-art systems for video concept detection mainly rely on visual features. Some previous approaches have also included audio features, either using low-level features such as mel-frequency cepstral coefficients ({MFCC}) or exploiting the detection of specific audio concepts. In this paper, we investigate a bag of auditory words ({BoAW}) approach that models {MFCC} features in an auditory vocabulary. The resulting {BoAW} features are combined with state-of-the-art visual features via multiple kernel learning ({MKL}). Experiments on a large set of 101 video concepts from the {MediaMill} Challenge show the effectiveness of using {BoAW} features: The system using {BoAW} features and a support vector machine with a $\chi^2$-kernel is superior to a state-of-the-art audio approach relying on probabilistic latent semantic indexing. Furthermore, it is shown that an early fusion approach degrades detection performance, whereas the combination of auditory and visual bag of words features via {MKL} yields a relative performance improvement of 9\%.},
author = {M\"{u}hling, Markus and Ewerth, Ralph and Zhou, Jun and Freisleben, Bernd},
booktitle = {Advances in Multimedia Modeling},
citeulike-article-id = {13550715},
doi = {10.1007/978-3-642-27355-1\_7},
editor = {Schoeffmann, Klaus and Merialdo, Bernard and Hauptmann, Alexander G. and Ngo, Chong-Wah and Andreopoulos, Yiannis and Breiteneder, Christian},
keywords = {bag-of-words, learning, multi-modality, objects},
pages = {40--50},
posted-at = {2015-03-17 08:11:15},
priority = {2},
publisher = {Springer Berlin Heidelberg},
series = {Lecture Notes in Computer Science},
title = {Multimodal Video Concept Detection via Bag of Auditory Words and Multiple Kernel Learning},
url = {http://dx.doi.org/10.1007/978-3-642-27355-1\_7},
volume = {7131},
year = {2012}
}
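The abstract pairs an SVM using a $\chi^2$-kernel with multiple kernel learning over audio and visual bag-of-words features. The sketch below illustrates only the kernel side, under stated assumptions: it uses one common exponential form of the $\chi^2$ kernel, $K(x, y) = \exp(-\gamma \sum_i (x_i - y_i)^2 / (x_i + y_i))$, and combines per-modality kernels as a weighted sum $\sum_m \beta_m K_m$. In MKL the weights $\beta_m$ would be learned together with the SVM; here they are fixed inputs. The function names and the `gamma` parameter are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence


def chi2_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Chi-square distance between two non-negative histograms.

    Bins where both entries are zero contribute nothing and are skipped,
    which avoids a division by zero.
    """
    d = 0.0
    for xi, yi in zip(x, y):
        s = xi + yi
        if s > 0:
            d += (xi - yi) ** 2 / s
    return d


def chi2_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Exponential chi-square kernel exp(-gamma * chi2(x, y)) (one common form)."""
    return math.exp(-gamma * chi2_distance(x, y))


def combined_kernel(views_x, views_y, weights, gammas) -> float:
    """Weighted sum of per-modality chi-square kernels.

    Each "view" is one modality's histogram (e.g. audio BoAW, visual BoW).
    Assumption: the non-negative weights are given; real MKL would learn them.
    """
    if not (len(views_x) == len(views_y) == len(weights) == len(gammas)):
        raise ValueError("need one histogram pair, weight and gamma per modality")
    return sum(
        w * chi2_kernel(vx, vy, g)
        for w, vx, vy, g in zip(weights, views_x, views_y, gammas)
    )


if __name__ == "__main__":
    audio = [0.5, 0.5, 0.0]   # toy auditory-word histogram
    visual = [0.2, 0.3, 0.5]  # toy visual-word histogram
    # Identical inputs give K = 1 per modality, so the sum equals the weight total.
    print(combined_kernel([audio, visual], [audio, visual], [0.4, 0.6], [1.0, 1.0]))
```

Because each per-modality kernel is a valid kernel and the weights are non-negative, the weighted sum is again a valid kernel. That property is what lets an SVM use the combined audio-visual kernel directly.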



As of 2012, most visual object detection methods are "bag-of-visual-words" approaches: features are detected in an image and combined into a "bag of visual words". Learning algorithms are then trained to classify such bags of words.
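The bag-of-words step can be sketched as follows, assuming hard assignment of each local descriptor to its nearest codeword and an L1-normalized histogram. The vocabulary is passed in as a toy input; in practice it is usually learned by clustering (for example k-means) over training descriptors. Per the abstract, the same idea applies to audio by treating MFCC frames as the descriptors. The function names are hypothetical illustrations, not code from the paper.

```python
from typing import List, Sequence


def nearest_codeword(descriptor: Sequence[float],
                     vocabulary: Sequence[Sequence[float]]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance."""
    best_idx, best_dist = 0, float("inf")
    for idx, word in enumerate(vocabulary):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def bag_of_words(descriptors: Sequence[Sequence[float]],
                 vocabulary: Sequence[Sequence[float]]) -> List[float]:
    """L1-normalized histogram of hard codeword assignments.

    Returns an all-zero histogram if there are no descriptors.
    """
    counts = [0.0] * len(vocabulary)
    for desc in descriptors:
        counts[nearest_codeword(desc, vocabulary)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


if __name__ == "__main__":
    vocab = [[0.0, 0.0], [1.0, 1.0]]  # toy two-word vocabulary
    feats = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [0.95, 1.0]]
    print(bag_of_words(feats, vocab))  # one descriptor near word 0, three near word 1
```

The resulting fixed-length histogram is what a classifier such as an SVM consumes, regardless of how many features were detected in the image.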