Assistant Professor Miguel Carreira-Perpiñán, Ph.D., has received one of the National Science Foundation's prestigious Early Career Development (CAREER) Awards for research and education on machine learning applied to the problem of articulatory inversion.
The Early Career Development Award is one of the NSF's most prestigious awards and is accompanied by a five-year grant of $100,000 per year. It is designed to support those teacher-scholars "who most effectively integrate research and education within the context of the mission of their organization." Given relatively early in a career, the awards are meant to lay a foundation for a lifetime of significant contributions to integrating research with education.
Carreira-Perpiñán received the award for basic research and education in machine learning applied to the specific problem of articulatory inversion: recovering the sequence of vocal tract shapes that produces a given acoustic utterance. The "forward" problem--determining what acoustic utterance will result from a given sequence of vocal tract shapes--is relatively simple, Carreira-Perpiñán said, "because the position and movements of the vocal tract cause a unique acoustic signal." But reversing that process, by starting with a human speech sample and working backwards to figure out the configuration of the vocal tract, has remained unsolved since the 1960s.
"The problem is extremely difficult," Carreira-Perpiñán said, "because different vocal tract shapes can produce the same acoustics--think of the way a ventriloquist works, for instance--but the temporal sequence of those shapes has to obey mechanical constraints, avoiding things like jerky movements." In what the NSF has deemed to be a promising new approach to the problem, Carreira-Perpiñán's research uses new models and algorithms as well as advances in dimensionality reduction, density estimation and regularization.
Carreira-Perpiñán believes that part of the NSF's rationale for his award is the extremely broad applicability of any solution to the problem. "The range of uses for this work is just tremendous," he said, "with applications in everything from automatic speech recognition to cartoon animation." Other types of audiovisual integration where this approach would prove useful include human and automated lip reading, facial animation in human-computer interfaces, computer-aided instruction, video games, multimedia telephony, lip synchronization, joint audio-video coding, bimodal speaker verification, and multimodal speech perception modeling. In addition, the approach would be applicable to problems not related to speech, such as inverse kinematics of robotic arms or recovering three-dimensional movement from two-dimensional video.
At OGI, Carreira-Perpiñán's work will have immediate applicability in research conducted at the Center for Spoken Language Understanding (CSLU). CSLU Director Jan van Santen, who is also chair of OGI's Department of Computer Science and Electrical Engineering (CSEE), will advise Carreira-Perpiñán on aspects of his research related to dysarthric speech. In NIH-funded research, CSLU is exploring ways of improving diagnosis of dysarthria, which may ultimately serve as a predictor for mild cognitive impairment in the elderly and allow for improved treatment. "Miguel's research has the potential to significantly improve our ability to diagnose dysarthria," said van Santen.
Carreira-Perpiñán has been an Assistant Professor in OGI's Department of Computer Science and Electrical Engineering since 2004.