| Time: | Wednesday 13:30 | Place: | 302 | Type: | Oral |
| Chair: | Julia Hirschberg | ||||
| 13:30 | Voice Attributes Affecting Likability Perception |
| (Quality & Usability Lab, DT. Laboratories, TU Berlin) (Deutsche Telekom Laboratories) | |
| Ratings of voices' likability were collected in two consecutive studies. A single scale seems to be sufficient for assessing such ratings. Based on limited but controlled data, spectral parameters as well as f0 and articulation rate correlate with the ratings obtained. An automatic classification confirms the relevance of spectral features for the perception of likability. As a simple method of collecting more data for further studies, the single scale was validated within the bounds of the small data set. Both the spectral parameters and items from a comprehensive questionnaire indicate the relevance of timbre for likability perception. | |
| 13:50 | Turn alignment using eye-gaze and speech in conversational interaction |
| (University of Helsinki) (Doshisha University) (Doshisha University) (Doshisha University) | |
| Spoken interactions are known for accurate timing and alignment between interlocutors: turn-taking and topic flow are managed in a manner that provides conversational fluency and smooth progress of the task. This paper studies the relation between the interlocutors’ eye-gaze and spoken utterances, and describes our experiments on turn alignment. We conducted classification experiments on turn-taking with a Support Vector Machine, using dialogue act, eye-gaze, and speech prosody features from conversation data. The results show that eye-gaze features are important signals in turn management, and seem even more important than speech features when the intention of utterances is clear. | |
| 14:10 | An Investigation of Formant Frequencies for Cognitive Load Classification |
| (School of Electrical Engineering and Telecommunications, The University of New South Wales, Sydney, Australia) (School of Electrical Engineering and Telecommunications, The University of New South Wales, Sydney, Australia) (School of Electrical Engineering and Telecommunications, The University of New South Wales, Sydney, Australia) (ATP Research Laboratory, National ICT Australia, Australia) | |
| The cognitive load experienced by a person can be used as an index to monitor task performance. Hence, the ability to measure the cognitive load of a person using speech can potentially be very useful, especially in areas such as air traffic control systems. Current research on cognitive load does not provide enough insight into how cognitive load affects the speech spectrum, or the speech production system. Since formants are closely related to the underlying vocal tract configuration, this work aims to study the effect of cognitive load on vowel formant frequencies, and hence proposes the effective application of formant features to cognitive load classification. Results from classification performed on the Stroop test database show that formant features have lower dimensionality and that dynamic formant features can outperform conventionally used MFCC-based features, with a relative improvement of 12%. | |
| 14:30 | Language specific effects of emotion on phoneme duration |
| (Tilburg University) (Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands) | |
| This paper presents an analysis of phoneme durations of emotional speech in two languages: Dutch and Korean. The analyzed corpus of emotional speech has been specifically developed for the purpose of cross-linguistic comparison, and is more balanced than any similar corpus available so far: a) it contains expressions by both Dutch and Korean actors and is based on judgments by both Dutch and Korean listeners; b) the same elicitation technique and recording procedure were used for recordings of both languages; and c) the phonetics of the carrier phrase were constructed to be permissible in both languages. The carefully controlled phonetic content of the carrier phrase allows for analysis of the role of specific phonetic features, such as phoneme duration, in emotional expression in Dutch and Korean. In this study the mutual effect of language and emotion on phoneme duration is presented. | |
| 14:50 | Automatic Classification of Married Couples' Behavior using Audio Features |
| (Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, CA, USA) (Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, CA, USA) (Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, CA, USA) (Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, CA, USA) (Department of Psychology, University of Southern California, Los Angeles, CA, USA) (Department of Psychology, University of California, Los Angeles, Los Angeles, CA, USA) (Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, CA, USA) (Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, CA, USA) | |
| In this work, we analyzed a 96-hour corpus of married couples spontaneously interacting about a problem in their relationship. Each spouse was manually coded with relevant session-level perceptual observations (e.g., level of blame toward other spouse, global positive affect), and our goal was to classify the spouses' behavior using features derived from the audio signal. Based on automatic segmentation, we extracted prosodic/spectral features to capture global acoustic properties for each spouse. We then trained gender-specific classifiers to predict the behavior of each spouse for six codes. We compare performance for the various factors (across codes, gender, classifier type, and feature type) and discuss future work for this novel and challenging corpus. | |
| 15:10 | Influence of Gestural Salience on the Interpretation of Spoken Requests |
| (Monash University) (Monash University) (Monash University) | |
| We present a probabilistic, salience-based mechanism for the interpretation of pointing gestures together with spoken utterances. Our formulation models dependencies between spatial and temporal aspects of gestures and features of objects. The results from our corpus-based evaluation show that the incorporation of pointing information improves interpretation accuracy. |