The INTERSPEECH 2010 Organizing Committee is pleased to announce that the following distinguished speakers have accepted invitations to give keynote talks at the conference.
Monday, September 27:
2010 ISCA Medalist - Steve Young, University of Cambridge
received a BA in Electrical Sciences from Cambridge University in 1973 and a PhD in Speech Processing in 1978. He held lectureships at both Manchester and Cambridge Universities before being elected to the Chair of Information Engineering at Cambridge University in 1994. He was a co-founder and Technical Director of Entropic Ltd from 1995 until 1999 when the company was taken over by Microsoft. After a short period as an Architect at Microsoft, he returned full-time to the University in January 2001 where he is now Senior Pro-Vice-Chancellor.
His research interests include speech recognition, language modelling, spoken dialogue and multi-media applications. He is the inventor and original author of the HTK Toolkit for building hidden Markov model-based recognition systems (see http://htk.eng.cam.ac.uk), and with Phil Woodland, he developed the HTK large vocabulary speech recognition system which has figured strongly in DARPA/NIST evaluations since it was first introduced in the early nineties. More recently he has developed statistical dialogue systems and pioneered the use of Partially Observable Markov Decision Processes for modelling them. He also has active research in voice transformation, emotion generation and HMM synthesis.
He has written and edited books on software engineering and speech processing, and he has published, as author and co-author, more than 200 papers in these areas. He is a Fellow of the Royal Academy of Engineering, the IEEE, the IET and the Royal Society of Arts. He served as the senior editor of Computer Speech and Language from 1993 to 2004 and he is currently Chair of the IEEE Speech and Language Processing Technical Committee. In 2004, he received an IEEE Signal Processing Society Technical Achievement Award.
Still Talking to Machines (Cognitively Speaking)
At Interspeech 2002 in Denver, I suggested that it should be possible to build a complete spoken dialogue system in which every component was based on a statistical model with parameters estimated from data. The potential advantages of such a system would include lower development cost, increased robustness to noise and the ability to learn on-line so that performance would continue to improve over time.
Eight years later, fully statistical systems have now been built in the laboratory and their potential demonstrated. This talk will review the basic principles of statistical dialogue systems and discuss the major lessons learnt so far. The focus will be on dialogue management and in particular the representation of dialogue state, approaches to belief monitoring, parameter estimation and policy optimisation. Probabilistic components for speech understanding, natural language generation and synthesis will also be covered. The talk will end with a discussion of future challenges and the direction ahead.
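The belief monitoring mentioned above can be illustrated with a toy Bayesian update over hidden user goals. The sketch below is purely illustrative and is not drawn from any system described in the talk; the goal names, transition probabilities and observation likelihoods are all hypothetical assumptions.

```python
# Minimal sketch of POMDP-style belief monitoring for dialogue state
# tracking. All names and numbers are illustrative assumptions, not
# taken from any actual dialogue system.

def update_belief(belief, obs_likelihood, transition):
    """One Bayesian belief update over hidden user goals.

    belief:         dict goal -> prior probability
    obs_likelihood: dict goal -> P(observation | goal)
    transition:     dict goal -> dict goal' -> P(goal' | goal)
    """
    # Predict step: propagate the belief through the goal-transition model.
    predicted = {g2: sum(belief[g1] * transition[g1].get(g2, 0.0)
                         for g1 in belief)
                 for g2 in obs_likelihood}
    # Update step: weight each goal by the noisy understanding evidence.
    unnorm = {g: predicted[g] * obs_likelihood[g] for g in predicted}
    total = sum(unnorm.values())
    return {g: p / total for g, p in unnorm.items()}

# Hypothetical example: two possible user goals; goals mostly persist
# between turns, and this turn's evidence favours "restaurant".
belief = {"restaurant": 0.5, "hotel": 0.5}
transition = {"restaurant": {"restaurant": 0.9, "hotel": 0.1},
              "hotel": {"restaurant": 0.1, "hotel": 0.9}}
obs = {"restaurant": 0.8, "hotel": 0.2}
belief = update_belief(belief, obs, transition)
print(belief)  # probability mass shifts toward "restaurant"
```

Because the belief is a distribution rather than a single hypothesis, a dialogue policy acting on it can remain robust to recognition errors, which is one of the advantages argued for above.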
Tuesday, September 28:
Tohru Ifukube, Research Center for Advanced Science and Technology, The University of Tokyo
is a project professor and a professor emeritus at the University of Tokyo as well as at Hokkaido University. He received the MS and Dr. Eng degrees in electronics from Hokkaido University. He was an Assistant and then Associate Professor of Medical Electronics at Hokkaido University from 1971 to 1988, and a visiting Associate Professor on the Cochlear Implant Project at Stanford University in 1984. He was a professor of Medical Electronics and Sensory Information Engineering at Hokkaido University from 1989 to 2002, and a professor with the Barrier-Free Project at the University of Tokyo from 2002 to 2009. His research interests include the analysis of hearing, speaking and visual functions, and the design of assistive tools for the disabled. Some of these tools have been used by the blind, the deaf and the speech disordered, and have also been applied to virtual reality and robotic systems. He published "Design of Voice Typewriter" (1983), "Sound-based Assistive Technology" (1997), "Evaluation of Virtual Reality" (Editor, 2000), and "Challenges of Assistive Technology" (2004) in Japanese. He is a fellow of the Institute of Electronics, Information and Communication Engineers of Japan (IEICE) and has received several grand prizes, such as "Designing Products in Japan" and "Japan Good Design".
Sound-based Assistive Technology Supporting "Seeing", "Hearing" and "Speaking" for the Disabled and the Elderly
With the rapid aging of Japan's population, the number of disabled people in Japan has also been increasing. Over a period of 40 years, I have pursued basic research in assistive technology, especially for people with seeing, hearing and speaking disorders. Although some of the resulting tools have come into practical use for the disabled in Japan, I have seen how insufficient these tools remain for supporting sensory and communication disorders. Moreover, I have been impressed by the amazing latent ability of the human brain to compensate for these disorders.
In my keynote speech, I will show some compensation abilities formed by "brain plasticity", and also show the extraordinary abilities of some animals, such as the voice imitation of the mynah bird and the echolocation of bats. Furthermore, I will introduce six assistive tools born of efforts to solve the mysteries of these compensation functions and animal abilities. Finally, I will emphasize that these assistive tools will contribute to the design of new human interfaces for robots that may support the elderly as well as the disabled.
Wednesday, September 29:
Chiu-yu Tseng, Institute of Linguistics, Academia Sinica, Taiwan
is a Research Fellow at the Institute of Linguistics, Academia Sinica, Taiwan. Trained as a phonetician (Ph.D. in Linguistics, Brown University, 1981), she has collaborated with speech scientists and engineers since 1982, a collaboration that has led her away from studying limited samples and numbers of speakers toward multiple speakers, larger chunks of more realistic speech, and larger quantities of data (though modest by speech technology community standards). Her research has integrated techniques from engineering and speech technology into acoustic phonetic experimental studies. Her twelve-year investigation of Mandarin Chinese fluent speech prosody from a macro/top-down perspective, taking intonation units larger than the phrase or sentence into consideration, has resulted in the emergence of what she believes to be the defining feature of fluent speech prosody: systematic cross-phrase prosodic association, which constitutes prosodic context. This approach contrasts with analyses of discourse intonation based on patterns of individual phrase intonation. Using quantitative evidence, she has developed a hierarchical prosodic framework, which models the formation of spoken discourse prosody as the accumulation of multi-layered prosodic contributions. She has also been able to tease apart the contributions to cross-phrase prosodic association made by each layer of the prosodic hierarchy for a range of acoustic parameters for which, interestingly, the contributions made by supra-segmental acoustic correlates have been found to vary. Since 2008, she has also pursued phonetic comparisons of L1 and L2 English (with a focus on prosody) as a member of AESOP (Asian English Speech cOrpus Project).
Beyond Sentence Prosody
The prosody of a sentence (utterance) when it appears in a discourse context differs substantially from when it is uttered in isolation. This talk focuses on why global prosody is an intrinsic part of naturally occurring speech. That is to say, prosodic chunking and phrasing occur not only at the sentence level, but also at the discourse level. Read and spontaneous L1 Mandarin speech data, as well as L1 and L2 English data, will be presented to illustrate our proposal that higher-level discourse information takes syntax, phonology and lexicon as sub-level units, and hierarchical contributions add higher units to lower ones to derive multi-phrase global prosody. Traces of global prosody found in lower-level speech units are abundant in the speech signal; their seemingly random occurrences can, in fact, be systematically derived. In the pitch domain, we will show evidence of down-stepping both within and across phrase boundaries, explain why phrasal F0 resets are not uniform, and why some overall F0 trajectories flatten out. In the temporal domain, we will show how speaking rate is adjusted mostly by and across phrases, rather than words, why pauses do not always occur at boundaries and pause duration is thus not the most reliable boundary cue, and why both pre-boundary (phrase-final) lengthening and shortening are found consistently. Furthermore, examination of units larger than the sentence has revealed why prosodic context exhibits both neighborhood linear adjacency and cross-over associative concurrence, and why phrasal prominence must yield to discourse focus. It is argued here that the sentence is not the ultimate unit of speech planning, and that global prosody must take precedence because it more accurately reflects the size and scale of speech planning. The planning itself is highly flexible, however, as our L1 and L2 speech data reveal.
In summary, to better understand and model realistic speech, looking from the sentence level up or looking top-down from higher levels of prosodic organization may produce the most interesting results.