Audiovisual Speech Inversion
We focus on recovering aspects of the vocal tract's geometry and dynamics from the speech signal, a problem referred to as speech inversion.
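The core of speech inversion is a regression from acoustic features to articulatory parameters. Below is a minimal, illustrative sketch of such a frame-wise mapping using ridge regression on synthetic data; the feature dimensions (13 acoustic coefficients, 6 tract parameters) and the linear model are assumptions for illustration, not the project's actual system.

```python
import numpy as np

# Synthetic stand-in for real data: X would hold acoustic features
# (e.g. cepstral coefficients) per frame, Y articulatory parameters
# (e.g. articulator positions) recorded in parallel.
rng = np.random.default_rng(0)
n_frames, n_acoustic, n_artic = 500, 13, 6
W_true = rng.normal(size=(n_acoustic, n_artic))
X = rng.normal(size=(n_frames, n_acoustic))                   # acoustic features
Y = X @ W_true + 0.1 * rng.normal(size=(n_frames, n_artic))   # articulatory targets

# Ridge regression: W = (X'X + lam*I)^-1 X'Y
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(n_acoustic), X.T @ Y)
Y_hat = X @ W                                                 # recovered trajectories
rmse = float(np.sqrt(np.mean((Y_hat - Y) ** 2)))
```

In practice the mapping is nonlinear and context-dependent, so richer models replace the single linear map, but the input/output structure stays the same.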
Audiovisual Speech Recognition
We have developed highly adaptive multimodal fusion rules based on uncertainty compensation that are compatible with both synchronous and asynchronous multimodal interaction architectures.
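One simple instance of uncertainty-driven fusion is to weight each stream's class scores by its estimated reliability, so a noisy stream contributes less to the decision. The sketch below illustrates this idea with inverse-uncertainty weights on two streams; the weighting scheme and the toy likelihoods are assumptions for illustration, not the specific rules developed in the project.

```python
import numpy as np

def fuse_streams(log_likes, uncertainties):
    """Combine per-stream class log-likelihoods with weights inversely
    proportional to each stream's estimated uncertainty."""
    u = np.asarray(uncertainties, dtype=float)
    w = (1.0 / u) / np.sum(1.0 / u)        # normalised reliability weights
    scores = sum(wi * np.asarray(ll) for wi, ll in zip(w, log_likes))
    return int(np.argmax(scores)), w

# Toy example: audio is confident about class 0, video is near-uniform.
audio = np.log([0.7, 0.2, 0.1])
video = np.log([0.3, 0.4, 0.3])
label, w = fuse_streams([audio, video], uncertainties=[0.2, 1.0])
```

With a reliable audio stream (low uncertainty) the audio scores dominate; if the uncertainties are swapped, the same rule lets the video stream drive the decision instead.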
Movie Summarization
Detection of perceptually important video events is formulated on the basis of saliency models for the audio, visual, and textual information conveyed in a video stream.
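A common way to turn per-modality saliency into a summary is to fuse the saliency curves and keep the most salient frames. The sketch below uses a simple linear fusion with fixed weights on synthetic curves; the weights, curves, and selection rule are illustrative assumptions, not the project's actual saliency models.

```python
import numpy as np

def summarize(saliency_streams, weights, keep_frac=0.2):
    """Linearly fuse per-modality saliency curves and return the indices
    of the most salient frames, in temporal order."""
    fused = sum(w * np.asarray(s, dtype=float)
                for w, s in zip(weights, saliency_streams))
    k = max(1, int(keep_frac * len(fused)))
    keep = np.sort(np.argpartition(fused, -k)[-k:])   # top-k frames, time-ordered
    return keep, fused

# Toy curves: an audio event peaks near frame 30, a visual event near frame 70.
t = np.arange(100)
audio = np.exp(-0.5 * ((t - 30) / 5.0) ** 2)
visual = np.exp(-0.5 * ((t - 70) / 5.0) ** 2)
keep, fused = summarize([audio, visual], weights=[0.6, 0.4])
```

Both event peaks survive into the 20% of frames that are kept, which is the behaviour a saliency-based summarizer aims for.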
Sign Language Recognition
We have developed a visual processing framework for hand and head tracking and for feature extraction from sign language (SL) and gesture videos.
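A classic first step in hand/head tracking is segmenting skin-coloured regions and tracking their centroids across frames. The sketch below shows this on a synthetic frame using normalised-RGB thresholds; the thresholds and the centroid tracker are generic illustrative choices, not the framework's actual detectors.

```python
import numpy as np

def skin_mask(frame):
    """Crude skin detection in normalised-RGB space (illustrative
    thresholds; real systems learn these from data)."""
    rgb = frame.astype(float)
    s = rgb.sum(axis=2) + 1e-9            # avoid division by zero
    r = rgb[..., 0] / s
    g = rgb[..., 1] / s
    return (r > 0.35) & (g > 0.28) & (g < 0.36)

def centroid(mask):
    """Centroid of the detected region: a minimal per-frame tracking feature."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return float(xs.mean()), float(ys.mean())

# Synthetic frame: blue background with a skin-coloured 20x20 patch.
frame = np.zeros((100, 100, 3))
frame[..., 2] = 255.0
frame[40:60, 40:60] = [200.0, 140.0, 100.0]
cx, cy = centroid(skin_mask(frame))
```

Per-frame centroids (and region shape descriptors) then feed the recognition stage as trajectory features.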
Microphone Array Speech Processing
We are working on microphone array processing and distant speech recognition, aiming to create hands-free, voice-enabled interfaces for home automation control.
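The simplest microphone-array technique for enhancing distant speech is delay-and-sum beamforming: delay each channel so a chosen direction adds coherently, then average. The sketch below uses an integer-sample approximation on a linear array; the geometry, sample rate, and steering convention (angle measured from the array axis) are illustrative assumptions.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, doa_deg, fs, c=343.0):
    """Steer a linear array toward doa_deg (degrees from the array axis)
    by delaying each channel and averaging.

    signals:       (n_mics, n_samples) array of time-aligned recordings
    mic_positions: 1-D array of mic positions along the array axis (metres)
    """
    theta = np.deg2rad(doa_deg)
    delays = mic_positions * np.cos(theta) / c   # propagation delays (seconds)
    delays -= delays.min()                       # make all delays non-negative
    n = signals.shape[1]
    out = np.zeros(n)
    for sig, d in zip(signals, delays):
        s = int(round(d * fs))                   # integer-sample approximation
        out[s:] += sig[:n - s] if s else sig
    return out / len(signals)

# 4-mic linear array, 5 cm spacing, broadside source (90 degrees).
fs = 16000
t = np.arange(1024) / fs
tone = np.sin(2 * np.pi * 500 * t)
x = np.tile(tone, (4, 1))                        # identical channels: already aligned
y = delay_and_sum(x, np.array([0.0, 0.05, 0.10, 0.15]), 90.0, fs)
```

For a broadside source the channels need no relative delay, so the beamformer output reproduces the signal; steering elsewhere attenuates sources off the look direction, which is what makes hands-free far-field capture usable.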