| Time: | Thursday 10:00 | Place: | 301 | Type: | Special |
| Chair: | Khiet Truong & Dirk Heylen | ||||
| 10:00 | Detecting politeness and efficiency in a cooperative social interaction |
| (Queen's University Belfast) (Deutsche Forschungszentrum für Künstliche Intelligenz) (Queen's University Belfast) (Deutsche Forschungszentrum für Künstliche Intelligenz) (Queen's University Belfast) (Queen's University Belfast) | |
| We developed a cooperative time-sensitive task to study vocal expression of politeness and efficiency. Sixteen dyads completed 20 trials of the ‘Maze Task’, where one participant (the ‘navigator’) gave oral instructions (mainly ‘up’, ‘down’, ‘left’, ‘right’) for the other (the ‘pilot’) to follow. For half of the trials, navigators were instructed to be polite, and for the other half to be efficient. The simplicity of the task left few ways to express politeness. Nevertheless it significantly affected task accuracy, and pilots’ subjective ratings indicate that it was perceived. Efficiency was not as clearly perceived. Preliminary acoustic analysis suggests relevant dimensions. | |
| 10:20 | Comparing Measures of Synchrony and Alignment in Dialogue Speech Timing with respect to Turn-taking Activity |
| (Trinity College Dublin) (Ulm University) | |
| This paper describes a system for predicting discourse-role features based on voice-activity detection. It takes as input a vector of values extracted from conversational speech and predicts turn-taking activity and active-listening patterns using an echo-state network. We observed evidence of frame-attunement using a measure of speech density which takes the ratio of speech to non-speech behaviour per utterance. We noted a synchrony of utterance timing and modelled this using the ESN. The system was trained on a subset of data from 100 telephone conversations from the 1,500-hour JST Expressive Speech Processing corpus, and predicts the interlocutor's timing behaviour with an error-rate of less than 15% based on one partner's speech-activity alone. An integrated system with access to content information would of course perform at higher rates. | |
| 10:40 | Resources for turn competition in overlap in multi-party conversations: Speech rate, pausing and duration |
| (University of Sheffield, Departments of Computer Science and Human Communication Sciences) (University of Sheffield, Department of Computer Science) (University of Sheffield, Department of Human Communication Sciences) | |
| This paper investigates the prosodic features that speakers use to compete for the turn when they talk simultaneously. Most previous research has focused on F0 and energy variation as resources for turn competition; here, we investigate the relevance of speech rate, pausing and the duration of in-overlap talk. These features are extracted from a set of overlaps drawn from the ICSI Meetings Corpus, and used to derive decision trees that classify overlapping talk as competitive or non-competitive. The decision trees show that both pausing and the duration of the in-overlap speech are significantly related to turn competition for both overlappers and overlappees. Additionally, speech rate is used by overlappees to return competition upon a turn-competitive incoming. These findings partially support and extend the observations made in previous studies within the framework of conversation analysis and interactional phonetics. | |
| 11:00 | Disambiguating the functions of conversational sounds with prosody: the case of ‘yeah’ |
| (University of Twente) (University of Twente) | |
| In this paper, we look at how prosody can be used to automatically distinguish between different dialogue act functions and how it determines degree of speaker incipiency. We focus on the different uses of ‘yeah’. Firstly, we investigate ambiguous dialogue act functions of ‘yeah’: ‘yeah’ is most frequently used as a backchannel or an assessment. Secondly, we look at the degree of speakership incipiency of ‘yeah’: some ‘yeah’ items display a greater intent of the speaker to take the floor. Classification experiments with decision trees were performed to assess the role of prosody: we found that prosody indeed plays a role in disambiguating dialogue act functions and in determining degree of speaker incipiency of ‘yeah’. | |
| 11:20 | Prosody and voice quality of vocal social signals: the case of dominance in scenario meetings |
| (DFKI) (DFKI) (DFKI) | |
| In this paper we investigate the prosody and voice quality of dominance in scenario meetings. We have found that in these scenarios the most dominant person tends to speak with a louder-than-average voice quality and the least dominant person with a softer-than-average voice quality. We also found that the most dominant role in the meetings is the project manager and the least dominant the marketing expert. A set of raw and composite measures of prosody and voice quality is extracted from the meeting data, followed by a Principal Components Analysis (PCA) to identify the core factors predicting the associated social signal or related annotation. | |
| 11:40 | The Prosody of Swedish Conversational Grunts |
| (CTT, TMH, CSC, KTH) (CTT, TMH, CSC, KTH) | |
| This paper explores conversational grunts in a face-to-face setting. The study investigates the prosody and turn-taking effect of fillers and feedback tokens that have been annotated for attitudes. The grunts were selected from the DEAL corpus and automatically annotated for their turn-taking effect. A novel supra-segmental prosodic signal representation and contextual timing features are used for classification and visualization. Classification results using linear discriminant analysis show that turn-initial feedback tokens lose some of their attitude-signaling prosodic cues compared to non-overlapping continuer feedback tokens. Turn-taking effects can be predicted well over chance level, except for Simultaneous Starts. However, feedback tokens before places where both speakers take the turn were more similar to feedback continuers than to turn-initial feedback tokens. |