With more than 6900 languages in the world and the need to support multiple input and output languages, the most important challenge today is to port or adapt speech processing systems to new languages rapidly and at reasonable cost. The major bottlenecks are the sparseness of speech and text data, the lack of language conventions, and the gap between technology and language expertise. Data sparseness arises because today's speech technologies rely heavily on statistical modeling schemes such as Hidden Markov Models and n-gram language models. Although statistical modeling algorithms are mostly language independent and have proved to work well for a variety of languages, parameter estimation requires vast amounts of training data. Large-scale data resources are currently available for fewer than 80 languages, and the costs of these collections are prohibitive for all but the most widely spoken and economically viable languages. The lack of language conventions affects a surprisingly large number of languages and dialects. The lack of a standardized writing system, for example, hinders web harvesting of large text corpora and the construction of dictionaries and lexicons. Last but not least, despite the well-defined process of system building, handling language-specific peculiarities is very costly and time-consuming and requires substantial language expertise. Unfortunately, it is extremely difficult to find system developers who have both the necessary technical background and significant insight into the language in question. Consequently, one of the central issues in developing speech processing systems in many languages is bridging the gap between language and technology expertise.
In this tutorial on "Multilingual Speech Processing - Rapid Language Adaptation Tools and Technologies" we will introduce state-of-the-art techniques for rapid language adaptation and present existing solutions to the persistent problems of data sparseness and the gap between language and technology expertise. We will describe in detail the process of building speech recognition and speech synthesis components for new, unsupported languages and introduce tools to do this rapidly and at low cost. The tutorial will consist of several sections covering topics ranging from database collection to model building and system evaluation. Furthermore, the tutorial will include explicit instructions on the following issues:
The tutorial will feature the SPICE Toolkit (Speech Processing - Interactive Creation and Evaluation), a web-based toolkit for rapid adaptation to new languages, and RLAT (Rapid Language Adaptation Toolkit), an extension to SPICE for web harvesting and language model evaluation. The methods and tools implemented in SPICE and RLAT will enable attendees to develop speech processing components, to collect appropriate data for building these models, and to evaluate the results, allowing for iterative improvements. Building on existing projects such as GlobalPhone and FestVox, knowledge and data are shared between recognition and synthesis; this includes phone sets, pronunciation dictionaries, acoustic models, and text resources. SPICE and RLAT are online services (http://cmuspice.org, http://csl.ira.uka.de/rlat-dev), and attendees will be able to use these toolkits at any time before and after the tutorial to continue developing their speech processing components. By archiving the data gathered on the fly from many cooperative users, we hope to significantly increase the repository of languages and resources and to make the data and components for under-supported languages available to the community at large. By keeping users in the development loop, the SPICE tools can learn from their expertise and so adapt and improve continually. We hope this will revolutionize the system development process for new languages.
Alan W Black is an Associate Professor on the faculty of the Language Technologies Institute at Carnegie Mellon University. He is one of the leaders in the area of speech synthesis, having written and distributed many widely used systems and databases, including the Festival Speech Synthesis System and the Festvox Voice Building Toolkit (http://festvox.org). Dr Black has published over 140 papers. He has given tutorials on speech synthesis and voice building at NAACL 2001, ASA 2002, Interspeech 2005, ICASSP 2008, and NAACL 2008, as well as many short courses at various summer schools.
This page was last updated on 21-June-2010 3:00 UTC.