Audiovisual Speech Inversion
We focus on recovering aspects of the vocal tract's geometry and dynamics from the speech signal, a problem referred to as speech inversion.
We have developed highly adaptive multimodal fusion rules based on uncertainty compensation, which are compatible with both synchronous and asynchronous multimodal interaction architectures.
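As an illustration of uncertainty-compensated fusion (a minimal sketch, not the group's actual fusion rule): each modality contributes an estimate weighted by the inverse of its uncertainty, so noisier streams are automatically down-weighted. The function name and the example variances are assumptions for illustration only.

```python
import numpy as np

def fuse_streams(means, variances):
    """Fuse per-modality estimates by inverse-variance weighting:
    streams with lower uncertainty contribute more to the result."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = 1.0 / variances
    weights /= weights.sum(axis=0)               # normalize across modalities
    fused_mean = (weights * means).sum(axis=0)
    fused_var = 1.0 / (1.0 / variances).sum(axis=0)
    return fused_mean, fused_var

# Example: a confident audio estimate and a noisy visual estimate
audio_mean, audio_var = 2.0, 0.1
visual_mean, visual_var = 4.0, 0.9
m, v = fuse_streams([audio_mean, visual_mean], [audio_var, visual_var])
# The fused estimate stays close to the low-variance audio stream
```

The same weighting extends naturally to stream exponents in asynchronous (multi-stream HMM) architectures, where per-stream reliabilities modulate the contribution of each modality at each time step.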
Detection of perceptually important video events is formulated on the basis of saliency models for the audio, visual and textual information conveyed in a video stream.
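A toy sketch of saliency-based event detection under assumed weights (the weights, threshold, and function name are illustrative, not the published model): per-frame saliency curves from each modality are linearly fused and thresholded to flag perceptually important frames.

```python
import numpy as np

def fuse_saliency(audio_s, visual_s, text_s, w=(0.4, 0.4, 0.2)):
    """Linearly combine per-frame saliency curves from three modalities."""
    curves = np.stack([audio_s, visual_s, text_s])
    return np.average(curves, axis=0, weights=w)

# Three frames: only the middle frame is salient in all modalities
audio = np.array([0.1, 0.9, 0.2])
visual = np.array([0.2, 0.8, 0.1])
text = np.array([0.0, 1.0, 0.0])
fused = fuse_saliency(audio, visual, text)
events = fused > 0.5          # flag perceptually important frames
```

In practice the per-modality curves would come from dedicated audio, visual, and text saliency models, and the fusion weights could themselves be adapted over time.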
We have developed a visual processing framework for hand and head tracking and for feature extraction from sign language (SL) and gesture videos.
We are working on microphone array processing and distant speech recognition, aiming to create hands-free, voice-enabled interfaces for home automation control.
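A classic building block for such front-ends is delay-and-sum beamforming, sketched below under simplifying assumptions (integer-sample delays, a two-microphone array; the function name is illustrative): each channel is time-aligned toward the talker and the channels are averaged, reinforcing the target speech relative to diffuse noise.

```python
import numpy as np

def delay_and_sum(signals, delays, fs):
    """Delay-and-sum beamformer (integer-sample approximation).
    signals: (n_mics, n_samples) array; delays: per-mic delays in seconds."""
    n_mics = signals.shape[0]
    out = np.zeros(signals.shape[1])
    for sig, d in zip(signals, delays):
        shift = int(round(d * fs))
        out += np.roll(sig, -shift)       # advance each channel into alignment
    return out / n_mics

# Example: the second mic receives the same waveform 3 samples later
fs = 16000
t = np.arange(64) / fs
ref = np.sin(2 * np.pi * 440 * t)
mic1 = ref
mic2 = np.roll(ref, 3)                    # simulated 3-sample propagation delay
out = delay_and_sum(np.stack([mic1, mic2]), [0.0, 3 / fs], fs)
# After alignment, the output matches the reference waveform
```

Real systems estimate the delays from the data (e.g. via cross-correlation) and use fractional-delay filtering rather than integer sample shifts.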