The International Centre for Numerical Methods in Engineering (CIMNE), a partner centre of the UPC and the Government of Catalonia, and the Media Technology research group of La Salle R&D are participating in the European Union–funded project Extensive Unified-Domain Simulation of the Human Voice (EUNISON), which aims to develop the first system able to simulate the entire voice organ in a unified way, drawing on supercomputing capabilities and experimental studies. EUNISON will make it possible to “visualise” the voice and will have applications in a wide range of fields, including speech synthesis, medicine, speech therapy, biomechanics, robotics, and even the creation of new forms of cultural expression.

Speech is what sets us apart from other mammals. It is the basic mechanism that allows us to communicate with other members of our species and our primary connection to the world, and it plays a vital role in modern society. However, the physical processes involved in producing the human voice are highly complex. Now, by applying supercomputing technology and drawing on the results of numerous experimental studies, the project researchers are seeking to simulate the voice organ in a unified way that encompasses everything from basic neuromuscular activation to the sound waves emitted at the lips when, for example, a syllable is pronounced.

For physicians who need to understand how voice problems arise and how they should be diagnosed and treated, an accurate, dynamic visualisation of the biomechanics would lead to significant progress. Students of languages and of vocal arts would also be immensely helped by animated renderings of what actually goes on inside the body during speech and song. A detailed simulation of the human voice will therefore be a valuable tool for science and useful in a broad range of applications.

The Secrets of the Voice
After decades of research, speech technologists are still looking for ways to make artificial voices, such as those used in man-machine communication, sound as natural as possible.

The principles of classical physics are the basis for a computational simulation of human voice production. Most current modelling strategies aim only to generate the acoustic signal of the voice, and adopt numerous approximations and simplifications to achieve that goal. For instance, current commercial speech synthesisers are not physics-based, but instead rely on elaborate concatenations of pre-recorded audio segments. EUNISON seeks to simulate the physics by which the body produces the voice signal more directly.

The CIMNE team, led by Dr Ramon Codina, is using large supercomputers to numerically simulate the flow of air involved in generating sound and the voice inside the human body, specifically in the vocal tract.

The project, which was launched in March 2013 and is expected to deliver results in about a year, will make it possible to visualise the voice and will provide clues as to how it works and how things can go wrong. The long-term goal is to develop a detailed computational model of the human voice that can be controlled with input signals at different levels of representation: topological, neuromotoric and phonetic. As a simulation engine, the system will find applications in diverse fields, including medicine, otolaryngology, speech therapy, speech synthesis, man-machine communication, robotics and biomechanics. The technology could also lead to the emergence of new forms of cultural expression.

The project is seeking to achieve a unified computational domain in which the dynamics of solids, fluids/gases and acoustic waves are modelled simultaneously. With this approach, EUNISON has the potential to produce a major breakthrough. The central computational challenge is to extend the current methodology so that it can deal with solid mechanics, elastic collisions, fluid-structure interactions, aerodynamics and acoustics all at once.

The logistical challenge is to combine knowledge from several fields in order to formulate and simulate the detailed set of structures and control signals that the model requires to behave realistically. The experimental challenge is to perform lab experiments with physical replicas to continually refine the researchers’ understanding of the physics of the phonatory and articulatory processes, while cross-checking and validating the numerical simulations.

EUNISON, a project funded under the European Union’s Future and Emerging Technologies (FET) Open Scheme, is headed by the Royal Institute of Technology (KTH) in Stockholm. In addition to the UPC, the participants are La Salle Campus Barcelona–URL, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) in Nuremberg, and the Centre National de la Recherche Scientifique (CNRS) in Grenoble.