Thu-Ses2-P3:
Discourse and Dialogue

This is the final program for this session. For oral sessions, the timing on the left is the current presentation order, but this may still change, so please check at the conference itself.
Time: Thursday 13:30 Place: International Conference Room C Type: Poster
Chair: Mikio Nakano
#1 Combining text categorization and dialog modeling for speaker role identification on call center conversations
Remi Lavalley (LIA, EDF)
Chloe Clavel (EDF)
Patrice Bellot (LIA)
Marc El-Beze (LIA)
In this paper, we address the problem of speaker role identification on a corpus of manually transcribed call center conversations. We first tackle it as a text categorization task. Then, we combine these categorization results with a dialog modeling approach. We achieve 93% correct role assignment with the latter method. Our method also offers the possibility to extract text spans specific to each role. These strings slightly improve the role identification results and are an interesting element for conversation analysis.
#2 Topic-dependent N-gram models based on Optimization of Context Lengths in LDA
Akira Nakamura (SANYO Electric Co., Ltd.)
Satoru Hayamizu (Gifu University)
This paper describes a method that improves the accuracy of N-gram language models which can be applied to on-line applications. The precision of a long-distance language model, including one based on LDA, is influenced by the context length, that is, the length of the history used for prediction. In the proposed method, each of multiple LDA units estimates an optimum context length separately, then those predictions are integrated and N-gram probabilities are calculated. The method directly estimates the optimum context length suitable for prediction. Results show the method improves topic-dependent N-gram probabilities, particularly for words related to specific topics, yielding higher and more stable performance compared to an existing method.
#3 Expectations for Discourse Genre Identification: a Prosodic Study
Nicolas Obin (IRCAM)
Volker Dellwo (University College London)
Anne Lacheret (University of Paris Ouest - La Défense)
Xavier Rodet (IRCAM)
Speech can be divided into discourse genres based on the contextual environment it occurs in (e.g. political speech, sport commentary speech, etc.). The present study investigated whether listeners can distinguish between speech from different discourse genres on the basis of acoustic prosodic cues only. In a perception experiment with delexicalized speech, 70 listeners with varying experience in French (native speakers, non-native speakers, and non-speakers) were asked to identify four different types of discourse genres (church service, political, journal, and sport commentary). Results revealed a fair identification ability, with a significant increase in performance with increasing experience in French. Identification confusion was used to cluster discourse genres according to their perceptual similarity. The possible application of the results for the evaluation of speaking style speech synthesis will be discussed.
#4 Dialogue Act Tagging and Segmentation with a Single Perceptron
Ramon Granell (University of Oxford)
Stephen Pulman (University of Oxford)
Carlos-D. Martínez-Hinarejos (Universidad Politécnica de Valencia)
José Miguel Benedí (Universidad Politécnica de Valencia)
In this paper we present a simultaneous automatic Dialogue Act (DA) tagger and segmenter. The model employed is based on the well-known single layer perceptron algorithm used successfully in other Computational Linguistics tasks. A decoding process was developed for searching the sequence of segments and DA tags among the exponentially many possibilities. A set of features based on combinations of words and DA tags was empirically selected. Models were tested over transcriptions of two corpora of dialogues (Switchboard and Dihana) and transcriptions and ASR output of a third corpus composed of meetings (AMI corpus). The results obtained for such a simple but powerful model are, for some of the evaluation metrics, equal to or better than those of much more complex models presented in recent studies for the same experiments.
#5 Improving the Readability of Class Lecture ASR Results using a Confusion Network
Yasuhisa Fujii (Toyohashi University of Technology, Japan)
Kazumasa Yamamoto (Toyohashi University of Technology, Japan)
Seiichi Nakagawa (Toyohashi University of Technology, Japan)
This paper presents a method for improving the readability of Automatic Speech Recognition (ASR) results for classroom lectures. Most of the previous research on improving the readability of recognition results focused mainly on manually transcribed texts, and not ASR results. Due to the presence of a large number of domain-dependent words and the casual presentation style, even state-of-the-art recognizers yield a 30-50% word error rate for speech in classroom lectures. Thus, a method for improving the readability of ASR results needs to be robust against recognition errors. In this paper, we propose a novel method for improving the readability based on a machine translation model that uses a confusion network representing multiple hypotheses of the ASR results to achieve robustness against recognition errors. Experimental results show that the proposed method outperforms the baselines in both automatic and manual evaluations.
