Meeting recognition has been referred to as an "ASR-complete" problem, with challenges arising from microphone-array-based audio capture, highly overlapped multiparty conversational speech, important non-lexical information relating to social interactions, and a wide range of speech understanding challenges. The tutorial will reflect this breadth, covering meeting capture and annotation, speech processing, representation and transfer of information, and presentation and user interfacing.

Obtaining high-quality recordings is a non-trivial task which requires careful planning of the desired outcomes and of the quality needed for both recognition and classification. Annotation of key events is crucial for conducting research: many types of annotation are desirable, but good annotation quality is hard to achieve. Processing of the speech signals, the main source of content, is central. Far-field microphone-array-based recognition, diarisation, automatic speech recognition, and disfluency filtering are the main aspects here, alongside online processing in each of these areas.

Compact representation of content for visualisation is vital for applications such as off-line browsing and search for specific content. Summarisation and content linking (e.g. to slides presented) allow information to be transferred to remote meeting participants. Finally, how to present this wealth of information to a meeting participant is of crucial importance, even more so for remote participants.
Within the EU Integrated Projects AMI and AMIDA (www.amiproject.org) we have worked on the recording, annotation, recognition and classification, presentation and interpretation of meeting data, as well as on application demonstrators. The outcomes of these projects will serve as a strong foundation and as a source of demonstrations and examples for this tutorial.
The objective of this tutorial is to present an overview of the research topics associated with meeting processing, the state of the art in the recording and processing technologies involved, and successful application scenarios. We will especially focus on issues arising from bringing a wide range of subjects together in single targeted applications. In particular we want to highlight the value of observing complex communication scenarios and the wealth of information obtainable from work in real-world settings.
Steve Renals is director of the Centre for Speech Technology Research (CSTR) and professor of Speech Technology in the School of Informatics at the University of Edinburgh. He received a BSc in Chemistry from the University of Sheffield in 1986, an MSc in Artificial Intelligence from the University of Edinburgh in 1987, and a PhD in Speech Recognition and Neural Networks, also from Edinburgh, in 1990. From 1991-92 he was a postdoctoral fellow at the International Computer Science Institute (ICSI), Berkeley, and was then an EPSRC postdoctoral fellow in Information Engineering at the University of Cambridge (1992-94). From 1994-2003 he was a lecturer, then reader, in Computer Science at the University of Sheffield, moving to Edinburgh in 2003. He is an associate editor of ACM Transactions on Speech and Language Processing and IEEE Signal Processing Letters, a former member of the IEEE Technical Committee on Machine Learning and Signal Processing, and a member of the ICMI-MLMI
Renals has been working on meeting recognition since 2002, and jointly coordinated the European M4, AMI and AMIDA projects, which focused on this area. His research interests are in speech recognition, statistical language processing and multimodal interaction, and he has over 150 refereed publications in these areas.
This page was last updated on 21-June-2010 3:00 UTC.