| Time: | Thursday 13:30 | Place: | International Conference Room C | Type: | Poster |
| Chair: | Mikio Nakano | ||||
| #1 | Combining text categorization and dialog modeling for speaker role identification on call center conversations |
| (LIA, EDF) (EDF) (LIA) (LIA) | |
| In this paper, we address the problem of speaker role identification on a corpus of manually transcribed call center conversations. We first tackle it as a text categorization task. Then, we combine these categorization results with a dialog modeling approach. We achieve 93% correct role assignment with the latter method. Our method also makes it possible to extract text spans specific to each role. These spans slightly improve the role identification results and are an interesting element for conversation analysis. | |
| #2 | Topic-dependent N-gram models based on Optimization of Context Lengths in LDA |
| (SANYO Electric Co., Ltd.) (Gifu University) | |
| This paper describes a method that improves the accuracy of N-gram language models which can be applied to on-line applications. The precision of a long-distance language model including LDA is influenced by the context length, that is, the length of the history used for prediction. In the proposed method, each of multiple LDA units estimates an optimum context length separately; those predictions are then integrated and N-gram probabilities are calculated. The method directly estimates the optimum context length suitable for prediction. Results show that the method improves topic-dependent N-gram probabilities, particularly for words related to specific topics, yielding higher and more stable performance compared to an existing method. | |
| #3 | Expectations for Discourse Genre Identification: a Prosodic Study |
| (IRCAM) (University College London) (University of Paris Ouest - La Défense) (IRCAM) | |
| Speech can be divided into discourse genres based on the contextual environment it occurs in (e.g. political speech, sport commentary speech, etc.). The present study investigated whether listeners can distinguish between speech from different discourse genres on the basis of acoustic prosodic cues only. In a perception experiment with delexicalized speech, 70 listeners with varying experience in French (native speakers, non-native speakers, and non-speakers) were asked to identify four different discourse genres (church service, political, journal, and sport commentary). Results revealed a fair identification ability, with a significant increase in performance with increasing experience in French. Identification confusion was used to cluster discourse genres according to their perceptual similarity. The possible application of the results to the evaluation of speaking-style speech synthesis will be discussed. | |
| #4 | Dialogue Act Tagging and Segmentation with a Single Perceptron |
| (University of Oxford) (University of Oxford) (Universidad Politécnica de Valencia) (Universidad Politécnica de Valencia) | |
| In this paper we present a simultaneous automatic Dialogue Act (DA) tagger and segmenter. The model employed is based on the well-known single-layer perceptron algorithm, which has been used successfully in other Computational Linguistics tasks. A decoding process was developed to search for the sequence of segments and DA tags among the exponentially many possibilities. A set of features based on combinations of words and DA tags was empirically selected. Models were tested on transcriptions of two corpora of dialogues (Switchboard and Dihana) and on transcriptions and ASR output of a third corpus composed of meetings (AMI corpus). For some of the evaluation metrics, the results obtained with such a simple but powerful model are equal to or better than those of much more complex models presented in recent studies for the same experiments. | |
| #5 | Improving the Readability of Class Lecture ASR Results using a Confusion Network |
| (Toyohashi University of Technology, Japan) (Toyohashi University of Technology, Japan) (Toyohashi University of Technology, Japan) | |
| This paper presents a method for improving the readability of Automatic Speech Recognition (ASR) results for classroom lectures. Most of the previous research on improving the readability of recognition results focused mainly on manually transcribed texts, and not ASR results. Due to the presence of a large number of domain-dependent words and the casual presentation style, even state-of-the-art recognizers yield a 30-50% word error rate for speech in classroom lectures. Thus, a method for improving the readability of ASR results needs to be robust against recognition errors. In this paper, we propose a novel method for improving the readability based on a machine translation model that uses a confusion network representing multiple hypotheses of the ASR results to achieve robustness against recognition errors. Experimental results show that the proposed method outperforms the baselines in both automatic and manual evaluations. |
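
As background for poster #5, the following is a minimal sketch of what a confusion network looks like as a data structure and how a 1-best word sequence can be read from it. It is not the authors' method: the abstract does not describe their machine translation model, so none of it is shown here. The names `EPS`, `ConfusionNetwork`, and `best_path`, and the toy posteriors, are assumptions made only for illustration.

```python
from typing import Dict, List

# Hypothetical marker for "no word in this slot" (an epsilon arc).
EPS = "<eps>"

# A confusion network modelled as an ordered list of slots; each slot maps
# competing word hypotheses to their posterior probabilities.
ConfusionNetwork = List[Dict[str, float]]


def best_path(cn: ConfusionNetwork) -> List[str]:
    """Pick the highest-posterior word in every slot, dropping epsilon."""
    words = []
    for slot in cn:
        word = max(slot, key=slot.get)
        if word != EPS:
            words.append(word)
    return words


# Toy example with made-up posteriors (not taken from any corpus).
toy_cn: ConfusionNetwork = [
    {"the": 0.9, "a": 0.1},
    {"confusion": 0.6, "conclusion": 0.4},
    {EPS: 0.7, "uh": 0.3},
    {"network": 0.8, "networks": 0.2},
]

hypothesis = best_path(toy_cn)
print(" ".join(hypothesis))
```

The point of keeping the whole network rather than only this 1-best string is that the lower-ranked alternatives in each slot (here "conclusion" or "networks") remain available to a downstream model when the top hypothesis is a recognition error.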