| Time: | Wednesday 13:30 | Place: | 201A | Type: | Oral |
| Chair: | Norihide Kitaoka | ||||
| 13:30 | Say What? Why users choose to speak their web queries |
| (Google) (Google) | |
| The context in which a speech-driven application is used (or, conversely, not used) can be an important signal for recognition engines and for spoken interface design. Using large-scale logs from a widely deployed spoken system, we analyze, at an aggregate level, the factors that correlate with the decision to speak a query rather than type it. We find that the factors most predictive of spoken queries are whether the query is made from an unconventional keyboard, whether its search topic relates to the user's location, and whether its search topic can be answered in a "hands-free" fashion. We also find, contrary to our intuition, that longer queries are more likely to be typed than shorter ones. | |
| 13:50 | The Effect of Audience Familiarity on the Perception of Modified Accent |
| (Teesside University) (University of Auckland) | |
| Evaluating the efficacy of accent transformation is important when localising speech-enabled software. However, perceived accent is an attribute assigned by a listener, so the apparent success of accent transformation will vary with the audience. Here we show the extent to which evaluations can be affected by the audience's familiarity with an accent. A perceptual study comparing two approaches to accent transformation is presented to two audiences with differing familiarity with the target accents. For mean opinion score (MOS) style evaluations, we quantify the approximate change in perception and show that it can be large enough to alter the relative success of such systems. | |
| 14:10 | On Generating Combilex Pronunciations via Morphological Analysis |
| (Centre for Speech Technology Research, Edinburgh University) (Centre for Speech Technology Research) (Centre for Speech Technology Research) | |
| Combilex is a high-quality lexicon that has been developed specifically for speech technology purposes and recently released by CSTR. Combilex offers many advanced features. This paper explores one of them: the ability to automatically generate fully specified transcriptions for morphologically derived words. This functionality was originally implemented to encode the pronunciations of derived words in terms of their constituent morphemes, thus accelerating lexicon development and ensuring a high level of consistency. In this paper, we propose that this method of modelling pronunciations can be exploited further by combining it with a morphological parser, yielding a method to generate full transcriptions for unknown derived words. Not only could this accelerate the addition of new derived words to Combilex, but it could also serve as an alternative to conventional letter-to-sound rules. This paper presents preliminary work indicating that this is a promising direction. | |
| 14:30 | Say It As You Mean It – Analyzing Free User Comments in the VOICE Awards Corpus |
| (Quality and Usability Lab, Deutsche Telekom Labs, Technische Universität Berlin) (Quality and Usability Lab, Deutsche Telekom Labs, Technische Universität Berlin) | |
| Usability questionnaires usually contain scales related to effectiveness, efficiency and overall satisfaction, which provide a quantitative value for the user's opinion. However, analyzing quantitative data often does not reveal the reasons underlying a good or bad opinion. Simple questions such as "What did you like about the system?" and "What did you not like about the system?" can shed light on these reasons, but analyzing such data requires considerable effort. Nevertheless, the answers to these questions express the users' opinions in their own words and hence often correlate highly with the overall rating of the system. Within the SpeechEval project, we analyzed the German VOICE Awards corpus over three consecutive years, categorizing the answers to these two free-text questions and analyzing correlations between the categories and the overall ratings of the systems. | |
| 14:50 | A new multichannel multimodal dyadic interaction database |
| (USC) (USC) (USC) (USC) (USC) (USC) | |
| In this work, we present a new multimodal database for the analysis of participant behaviors in dyadic interactions. The database contains multiple channels of close- and far-field audio, a high-definition camera array, and motion capture data. The motion capture data allow precise analysis of low-level body-language descriptors and their comparison with similar descriptors derived from video. The data are manually labeled by multiple human annotators using psychology-informed guides. This work also presents an initial analysis of approach-avoidance (A-A) behavior. Two sets of annotations are provided: one based on video only, and the other obtained using both the audio and video channels. Additionally, we describe the statistics of interaction descriptors and A-A labels with respect to participants' roles. Finally, we analyze the relations between various non-verbal features and the approach-avoidance labels. | |
| 15:10 | SEAME: a Mandarin-English Code-switching Speech Corpus in South-East Asia |
| (School of Computer Engineering, Nanyang Technological University, Singapore) (School of Computer Sciences, Universiti Sains Malaysia, 11800 USM, Penang, Malaysia) (School of Computer Engineering, Nanyang Technological University, Singapore 639798) (Institute for Infocomm Research, 1 Fusionopolis Way, Singapore 138632) | |
| In Singapore and Malaysia, people often mix Mandarin and English within a single sentence, which we call intra-sentential code-switching. In this paper, we report the development of a Mandarin-English code-switching spontaneous speech corpus: SEAME. Designed as part of a multilingual speech recognition project, the corpus allows the study of how Mandarin-English code-switching occurs in the spoken language of South-East Asia and provides insights into the development of large vocabulary continuous speech recognition (LVCSR) systems that cover code-switched speech. We develop a speech corpus of intra-sentential code-switching utterances recorded in both interview and conversational settings. The paper describes the corpus design and an analysis of the collected corpus. | |