Learning neural network-based speech representations for voice characterization and speaker recognition


By: 
Jean-François Bonastre
Laboratoire d'Informatique Avignon (LIA), Université d'Avignon, France
Date: 
Jul 04th
Jean-François Bonastre obtained a PhD in automatic speaker recognition in 1994, as well as a Habilitation to supervise research in 2000. He joined the Institut Universitaire de France in 2006 as a Junior member. From 2008 to 2015, he was Vice President of the Board of the University of Avignon. He was also President of the International Speech Communication Association from 2011 to 2013, and of the Francophone Spoken Communication Association from 2000 to 2004. He was an elected member of the IEEE Speech and Language Technical Committee and, for two years, a member of the IEEE Biometrics Council. He spent a one-year sabbatical at the Panasonic Speech Technology Laboratory (Santa Barbara) in 2002-2003. He is the author or co-author of more than 140 articles and three patents.
In the neural network galaxy, the large majority of approaches and research effort is dedicated to well-defined tasks, such as recognizing an image of a cat or discriminating noise from speech recordings. For these kinds of tasks, it is easy to write a labeling reference guide in order to obtain training and evaluation data with a ground truth. But for a large set of high-level human tasks, and particularly for tasks related to the artistic field, the task itself is not easy to define: only the result is known, and it is difficult or impossible to write such a labeling guide. We refer to this kind of problem as an "underdefined task". In this presentation, a methodology based on representation learning is proposed to tackle this class of problems, and a practical example is shown in the domain of voice casting for voice dubbing.
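
To give a rough flavor of what "learning a neural speech representation" can mean in practice, the sketch below (in PyTorch) maps variable-length acoustic features to a fixed-size speaker embedding trained with a speaker-classification objective. This is only an illustrative sketch under common assumptions, not the methodology presented in the talk; all layer sizes, dimensions, and the training objective are assumptions.

    # Minimal illustrative sketch (not the presenter's method): a small network
    # that turns variable-length acoustic features into a fixed-size speaker
    # embedding. Layer sizes, feature dimension, and the classification loss
    # are assumptions chosen for the example.
    import torch
    import torch.nn as nn

    class SpeakerEmbedder(nn.Module):
        def __init__(self, n_feats=40, emb_dim=128, n_speakers=500):
            super().__init__()
            # Frame-level encoder over inputs shaped (batch, n_feats, time)
            self.encoder = nn.Sequential(
                nn.Conv1d(n_feats, 256, kernel_size=5, padding=2), nn.ReLU(),
                nn.Conv1d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
            )
            # Pooling over time plus a linear layer gives the fixed-size embedding
            self.embed = nn.Linear(256, emb_dim)
            # Speaker-classification head, used only during training
            self.classifier = nn.Linear(emb_dim, n_speakers)

        def forward(self, feats):
            h = self.encoder(feats)      # (batch, 256, time)
            pooled = h.mean(dim=-1)      # temporal average pooling -> (batch, 256)
            emb = self.embed(pooled)     # fixed-size representation
            return emb, self.classifier(emb)

    # Toy forward/backward pass with random "features" and speaker labels
    model = SpeakerEmbedder()
    feats = torch.randn(8, 40, 300)      # 8 utterances, 40-dim features, 300 frames
    labels = torch.randint(0, 500, (8,))
    emb, logits = model(feats)
    loss = nn.CrossEntropyLoss()(logits, labels)
    loss.backward()
    print(emb.shape)                     # torch.Size([8, 128])

In a setting such as the voice-casting example discussed in the presentation, a fixed-size embedding of this kind could serve as the learned representation on which voice similarity or suitability judgments are built, even when the target task itself cannot be spelled out in a labeling guide.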