| #1 | Influence of lexical tones on intonation in Kammu |
Anastasia Karlsson (Dept of Linguistics and Phonetics, Centre for Languages and Literature, Lund University, Sweden) David House (Dept of Speech, Music and Hearing, School of Computer Science and Communication, KTH, Stockholm, Sweden) Jan-Olof Svantesson (Dept of Linguistics and Phonetics, Centre for Languages and Literature, Lund University, Sweden) Damrong Tayanin (Dept of Linguistics and Phonetics, Centre for Languages and Literature, Lund University, Sweden)
|
| The aim of this study is to investigate how the presence of lexical tones influences the realization of focal accent and sentence intonation. The language studied is Kammu, a language particularly well suited for the study as it has both tonal and non-tonal dialects. The main finding is that lexical tone exerts an influence on both sentence and focal accent in the tonal dialect to such a strong degree that we can postulate a hierarchy where lexical tone is strongest followed by sentence accent, with focal accent exerting the weakest influence on the F0 contour.
|
| #2 | Phonetic Realization of Second Occurrence Focus in Japanese |
Satoshi Nambu (University of Pennsylvania) Yong-cheol Lee (University of Pennsylvania)
|
| Recent studies agree that second occurrence focus is phonetically realized as prosodic prominence. What has been missing, however, is a comparison with neutral focus, in addition to main focus and pre/post-focus, which is necessary to establish the precise phonetic status of second occurrence focus. Using evidence from Japanese, this study shows that second occurrence focus in the pre/post-focus position is realized with high pitch that is less salient than that of main focus but more salient than that of pre/post-focus. Compared with neutral focus, the pitch of second occurrence focus is higher in the pre-focus position but lower in the post-focus position due to post-focus compression. Furthermore, this study provides cross-linguistic insight into focus realization. The results suggest that Japanese focus involves pre-focus compression in addition to post-focus compression, which differs from Korean, English, and Mandarin.
|
| #3 | Prosodic Grouping and Relative Clause Disambiguation in Mandarin |
Jianjing Kuang (UCLA Linguistics)
|
| This study discusses the role of prosodic grouping in the disambiguation of relative clause (RC) attachment in Mandarin. The grouping effect is explored under the Implicit Prosody Hypothesis (IPH) in four kinds of sentence processing experiments: default production, contrastive production, online processing, and offline processing. It is found that (1) the length of the RC greatly affects offline ambiguity resolution; (2) prosodic grouping reflects the different attachment readings well and is consciously used to produce contrastive meanings; and (3) online processing can be affected by manipulating the grouping cues, namely prominence and pause. The findings support the IPH and contribute to our understanding of prosodic grouping in Mandarin, which can be applied in spoken language processing.
Index Terms: prosodic grouping, disambiguation, Mandarin
|
| #4 | Text-based Unstressed Syllable Prediction in Mandarin |
Ya Li (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China) Jianhua Tao (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China) Meng Zhang (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China) Shifeng Pan (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China) Xiaoying Xu (Beijing Normal University, Beijing, China)
|
| Increasing attention has recently been paid to Mandarin word stress, which is important for improving the naturalness of speech synthesis. Most research on Mandarin speech synthesis focuses on three stress levels: stressed, regular and unstressed. This paper concentrates on predicting unstressed syllables, because the unstressed syllable is also important to the intelligibility of synthetic speech. As with prosodic structure, stress is not easy to detect from text analysis because of the complicated context information. A method based on a Classification and Regression Tree (CART) model is proposed that predicts unstressed syllables with an accuracy of 85%. The method was then applied in a TTS system. The experiment shows that the MOS of the synthetic speech improved by 0.35; the pitch contour of the newly synthesized speech is also closer to that of natural speech.
|
| #5 | “Flat pitch accents” in Czech |
Tomáš Duběda (Institute of Translation Studies, Charles University in Prague)
|
| In this paper we investigate a particular type of stress marking in Czech, in which the syllable perceived as prominent is not accompanied by any clearly audible change in the overall pitch course. The paper gives a perceptual, phonotactic and acoustic account of these “flat pitch accents”. No positional effects or semantic correlates of words bearing this type of accent were found. Flat accents have significantly reduced intonational variability, as expected, and their durational and dynamic correlates are partly different from other accent types. However, none of these findings speaks in favour of compensation between prosodic parameters.
|
| #7 | Positional variability of pitch accents in Czech |
Tomáš Duběda (Institute of Translation Studies, Charles University in Prague)
|
| An analysis of prenuclear accents in read speech is carried out with the aim of finding regularities in their distribution. Significant differences are identified with respect to position within the phrase and phrase length, some of which are correlated with declination and pitch span narrowing. Only a weak interaction is found between nuclear and prenuclear pitch accents. No tendency to use only one type of pitch accent within a phrase was found. The autosegmental approach seems to be a viable means of analyzing prenuclear intonation in Czech.
|
| #8 | Modeling of Sentence-medial Pauses in Bangla Readout Speech: Occurrence and Duration |
Shyamal Kr Dasmandal (Centre for Development of Advanced Computing, Kolkata) Arup Saha (Centre for Development of Advanced Computing, Kolkata) Tulika Basu (Centre for Development of Advanced Computing, Kolkata) Keikichi Hirose (Department of Information and Communication Engineering, University of Tokyo) Hiroya Fujisaki (Professor Emeritus, University of Tokyo)
|
| Control of pause occurrence and duration is an important issue for text-to-speech synthesis systems. In text-readout speech, pauses occur unconditionally at sentence boundaries and with high probability at major syntactic boundaries such as clause boundaries, but more or less arbitrarily at minor syntactic boundaries. Pause duration tends to be longer at the end of a longer syntactic unit. A detailed analysis is conducted of sentence-medial pauses in Bangla readout speech. Based on the results, linear models (with variables of syntactic unit length and distance to directly modifying word) are constructed for pause occurrence and duration. The models are evaluated using test data not included in the analyzed data (open-test condition). The results show that the proposed models can predict occurrence probability correctly for 87% of phrase boundaries, and pause duration within ±100 ms for 80% of the cases.
|
| #9 | Declarative sentence intonation patterns in 8 Swiss German dialects |
Adrian Leemann (Department of Linguistics, University of Bern) Lucy Zuberbuehler (Department of Linguistics, University of Bern)
|
| This study examines declarative sentence intonation contours in 8 vastly different Swiss German dialects by applying the Command-Response model. Fundamental frequency patterns of a controlled declarative sentence are analyzed on the global and local levels of intonation. The results provide evidence that the dialects differ in how global and local level F0 is modulated. Findings of previous studies on natural Swiss German speech are essentially confirmed; at the same time, however, new trends emerge.
|
| #10 | Syllable-Level Prominence Detection with Acoustic Evidence |
Je Hun Jeon (The University of Texas at Dallas) Yang Liu (The University of Texas at Dallas)
|
| In this work, we conduct a thorough study using acoustic prosodic cues for prominence detection in speech. This study is different from previous work in several aspects. In addition to the widely used prosodic features, such as pitch, energy, and duration, we introduce the use of cepstral features. Furthermore, we evaluate the effect of different features, speaker dependency and variation, different classifiers, and contextual information. Our experiments on the Boston University Radio News Corpus show that although the cepstral features alone do not perform well, when combined with prosodic features they yield some performance gain and, more importantly, can reduce much of the speaker variation in this task. We find that the previous context is more informative than the following context, and their combination achieves the best performance. The final result using selected features with context information is significantly better than that in previous work.
|
| #11 | Prosody Cues For Classification of the Discourse Particle "hã" in Hindi |
Sankalan Prasad (Indian Institute of Technology Kharagpur, Kharagpur, West Bengal 721302, India) Kalika Bali (Microsoft Research Labs India Pvt. Ltd. Sadashivnagar, Bangalore 560080, India)
|
| In Hindi, the affirmative particle "hã" serves a variety of discourse functions. Preliminary investigation has shown that, although it is difficult to disambiguate these functions from prosody alone, there seems to be a distinct prosodic pattern associated with each of them. In this paper, we present a corpus study of spoken utterances of the Hindi word "hã". We identify these prosodic patterns and capture the specific pitch variations associated with each of the functions. We also examine the use of prosodic cues in classifying the utterances into different functions using k-means clustering. While a certain amount of speaker dependency, as well as a lack of contextual and lexical information, resulted in high classification entropy, the results were consistent with comparable studies in other languages.
|
| #12 | Interaction of Syntax-marked Focus and Wh-question Induced focus in Standard Chinese |
Yuan Jia (Phonetics Lab, Institute of Linguistics, Chinese Academy of Social Sciences, China) Aijun Li (Phonetics Lab, Institute of Linguistics, Chinese Academy of Social Sciences, China)
|
| The present study investigates the interaction of syntax-marked focus and wh-question induced focus in the formation of F0 patterns in Standard Chinese (hereinafter SC). An acoustic experiment demonstrates that syntax-marked (lian or shi) focus can co-exist with wh-question induced focus. The results are twofold: (i) the two kinds of focus can combine to trigger more obvious F0 prominence on the focused constituents and F0 compression on the post-focus constituents; (ii) they can realize prominence simultaneously on different constituents in one sentence. Therefore, the F0 pattern of SC can be described in terms of the nuclear and pre-nuclear prominence classification used for English. Specifically, single focus induces nuclear prominence, and dual focus triggers both nuclear and pre-nuclear prominence.
|
| #13 | Prominence Detection in Swedish Using Syllable Correlates |
Samer Al Moubayed (KTH, Center for Speech Technology, Stockholm, Sweden) Jonas Beskow (KTH, Center for Speech Technology, Stockholm, Sweden)
|
| This paper presents an approach to estimating word-level prominence in Swedish using syllable-level features. The paper discusses the problems of mismatch between word-level perceptual prominence annotations and their acoustic correlates, of context, and of data scarcity. 200 sentences are annotated by 4 speech experts with prominence on 3 levels. A linear model for feature extraction is proposed over syllable-level features, and weights for these features are optimized to match the word-level annotations. We show that using syllable-level features and estimating weights for the acoustic correlates to minimize the word-level estimation error gives better detection accuracy than word-level features, and that both feature sets exceed the baseline accuracy.
|
| #14 | Automatic analysis of the intonation of a tone language. Applying the Momel algorithm to spontaneous Standard Chinese (Beijing). |
Na Zhi (Laboratorio di Linguistica, Scuola Normale Superiore, Pisa, Italy) Daniel Hirst (Laboratoire Parole et Langage, CNRS & Université de Provence, France) Pier Marco Bertinetto (Laboratorio di Linguistica, Scuola Normale Superiore, Pisa, Italy)
|
| This paper describes the application of the Momel algorithm to a corpus of spontaneous speech in Standard (Beijing) Chinese. A selection of utterances by four speakers was analysed automatically and the resynthesised utterances were evaluated subjectively with two categories of errors: lexical tone errors and intonation errors. The target points determining the pitch contours of the synthetic utterances were then corrected manually in order to obtain a set of acceptable utterances for the entire corpus. An application attempting to optimise window-size for the Momel algorithm showed no overall improvement with respect to the manually corrected data. This annotated data will nevertheless constitute a useful yardstick for evaluating improvements to the automatic algorithm which is expected to be far more robust than data annotated for languages with no lexical tone.
|
| #15 | Towards long-range prosodic attribute modeling for language recognition |
Raymond W. M. Ng (Department of Electronic Engineering, The Chinese University of Hong Kong) Cheung-Chi Leung (Human Language Technology Department, Institute for Infocomm Research, A*STAR, Singapore 138632) Ville Hautamäki (Human Language Technology Department, Institute for Infocomm Research, A*STAR, Singapore 138632) Tan Lee (The Chinese University of Hong Kong, Hong Kong) Bin Ma (Human Language Technology Department, Institute for Infocomm Research, A*STAR, Singapore 138632) Haizhou Li (Human Language Technology Department, Institute for Infocomm Research, A*STAR, Singapore 138632)
|
| As a high-level feature, prosody may be an effective feature when it is modeled over longer ranges than the typical range of a syllable. This paper is about language recognition with the high-level prosodic attributes. It studies two important issues of long-range modeling, namely the data scarcity handling method, and the model which properly describes prosodic boundary events. Illustrated by NIST language recognition evaluation (LRE) 2009, long-range modeling is shown to bring a 7.2% relative improvement to a prosodic language detector. Score fusion between the long-range prosodic system and a phonotactic system gives an EER of 3.07%. Exploiting boundary N-grams is the main contributing factor to global EER reduction, while different long-range prosodic modeling factors benefit the detection of different languages. Analysis reveals the evidence of language-specific long-range prosodic attributes, which sheds light on robust long-range modeling methods for language recognition.
|
| #16 | A Modified Parameterization of the Fujisaki Model |
Robert Schubert (Dresden University of Technology, Institute of Acoustics and Speech Communication) Oliver Jokisch (Dresden University of Technology, Institute of Acoustics and Speech Communication) Diane Hirschfeld (voice INTER connect GmbH)
|
| Fujisaki’s command-response model has proven suitable for analysis and synthesis of intonation contours in several languages. Although widely used in synthesis, it is subject to certain limitations, including mathematical over-determinacy and insufficiency for some naturally occurring forms. We propose an alternative parameterization which separates declination and phrasal height, thereby making the mathematical properties of phrase control symmetric to those of accent control. The modification improves the model’s utility for analysis, predictive synthesis, and rule-based synthesis, especially when command-dependent attenuation factors are used. An evaluation of the modified F0 generation on a speech corpus, based on experiments with the DRESS synthesizer, shows lower RMSE values and similar correlations between natural contours and their synthesized counterparts.
|