Show Reference: "A scene-associated training method for mobile robot speech recognition in multisource reverberated environments"

A scene-associated training method for mobile robot speech recognition in multisource reverberated environments. In Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on (2011), pp. 542-549, doi:10.1109/iros.2011.6094669, by Jindong Liu, Edward Johns, Guang-Zhong Yang.
@inproceedings{liu-et-al-2011,
    abstract = {In this paper, we present a new technique for social mobile robot speech recognition based on scene-associated training models. The key contribution of the paper is a real-time framework that reduces the effect of room reverberation and ambient noise, a challenging problem in speech recognition. In classical approaches, anechoic sound is used to train the model, with the main focus on removing reverberation or noise from the sound. Our technique differs in that we train a number of speech recognizers directly from the reverberated sound, by associating each recognizer with a unique visual scene, to deal with the varying reverberation properties of different rooms. By extracting local features from a captured image and recognizing a scene, the robot can use the appropriate speech recognizer that is trained for the particular structural properties of that scene. We tested our method by using a baseline speech recognition model ({HTK}) across a variety of rooms and different levels of background noise. The results show that the association between a visual scene and a corresponding speech recognizer greatly improves the robot's speech recognition accuracy, together with increasing the computational speed of recognition, compared to competing techniques.},
    author = {Liu, Jindong and Johns, Edward and Yang, Guang-Zhong},
    booktitle = {Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on},
    doi = {10.1109/iros.2011.6094669},
    institution = {The Hamlyn Centre, Imperial College London, UK},
    isbn = {978-1-61284-454-1},
    issn = {2153-0858},
    keywords = {auditory, biomimetic, cue-combination, multi-modality, visual, visual-processing},
    pages = {542--549},
    posted-at = {2012-08-22 16:37:36},
    priority = {2},
    publisher = {IEEE},
    title = {A scene-associated training method for mobile robot speech recognition in multisource reverberated environments},
    url = {http://dx.doi.org/10.1109/iros.2011.6094669},
    year = {2011}
}


Reverberation and noise can degrade speech recognition.

Speech recognition models are usually trained on speech samples that are free of noise and reverberation.

Humans adapt to an auditory scene's reverberation and noise conditions. They use visual scene recognition to recall reverberation and noise conditions of familiar environments.

A system that stores separately trained speech recognition models for different environments, and selects among them by visual scene recognition, improved speech recognition accuracy in reverberated and noisy environments.
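
The store-and-retrieve idea in the last note can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `SceneAssociatedRecognizers`, the toy two-dimensional scene feature vectors, the nearest-neighbour scene matching, and the lambda stand-ins for trained recognizers are all assumptions made for illustration. The paper itself extracts local image features and uses HTK-based recognizers.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# A recognizer is any callable that maps audio samples to a transcript.
# In the paper these would be HTK models trained per room; here they are stand-ins.
Recognizer = Callable[[Sequence[float]], str]


class SceneAssociatedRecognizers:
    """Hypothetical registry that pairs each known visual scene with its own recognizer."""

    def __init__(self) -> None:
        self._scene_features: Dict[str, List[float]] = {}
        self._recognizers: Dict[str, Recognizer] = {}

    def register(self, scene: str, features: Sequence[float], recognizer: Recognizer) -> None:
        """Store a reference feature vector and the recognizer trained in that scene."""
        self._scene_features[scene] = list(features)
        self._recognizers[scene] = recognizer

    def recognize_scene(self, features: Sequence[float]) -> str:
        """Return the stored scene whose reference features are closest (Euclidean distance)."""
        if not self._scene_features:
            raise ValueError("no scenes registered")
        return min(
            self._scene_features,
            key=lambda scene: math.dist(self._scene_features[scene], features),
        )

    def transcribe(self, image_features: Sequence[float], audio: Sequence[float]) -> Tuple[str, str]:
        """Pick the recognizer associated with the recognized scene and apply it to the audio."""
        scene = self.recognize_scene(image_features)
        return scene, self._recognizers[scene](audio)


# Usage with made-up scenes and placeholder recognizers.
registry = SceneAssociatedRecognizers()
registry.register("lab", [0.9, 0.1], lambda audio: "lab-model transcript")
registry.register("corridor", [0.1, 0.8], lambda audio: "corridor-model transcript")

scene, text = registry.transcribe([0.2, 0.7], audio=[0.0, 0.1])
print(scene, text)
```

Keeping scene recognition separate from model lookup means that a new room only requires registering one more feature vector and recognizer pair; the existing models are left untouched.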