| Time: | Tuesday 10:00 | Place: | International Conference Room C | Type: | Poster |
| Chair: | G Ananthakrishnan | ||||
| #1 | Speaking style dependency of formant targets |
| (Oregon Health & Science University) (Oregon Health & Science University) (Oregon Health & Science University) | |
| Previous work on formant targets has assumed that these targets are independent of the speaking style. In this paper, we estimate consonant and vowel targets in a database of “clear” and “conversational” speech, using both style-independent and style-dependent models. The test-set errors and clustering of the estimated target values indicate that for this corpus, formant targets depend on the speaking style. As an application, vowel classification accuracy was tested both style-independently and style-dependently, based on observed formant values and on estimated target values. Token-based style-independent classification shows greater accuracy for conversational speech (82.19%) than observed-value classification (73.97%). | |
| #2 | Similarity of effects of emotions on the speech organ configuration with and without speaking |
| (Konan University) | |
| In this work we propose and verify a hypothesis on emotional speech production: emotions induce physical and physiological changes in the whole body including the speech organs, regardless of whether or not the person is speaking, and as a side effect, this changes the voice quality. To verify this hypothesis, we measured the speech organ configuration of actors simulating four emotions (neutral, hot anger, joy, and sadness) with and without speaking by MRI. The results showed that emotions affect the speech organ configuration, and the same tendency of changes was found regardless of whether or not the person was speaking. | |
| #3 | A Study of Intra-Speaker and Inter-Speaker Affective Variability using Electroglottograph and Inverse Filtered Glottal Waveforms |
| (Viterbi School of Engineering, University of Southern California, CA, USA) (Viterbi School of Engineering, University of Southern California, CA, USA) (Department of Linguistics, University of Southern California, CA, USA) (Viterbi School of Engineering, University of Southern California, CA, USA) | |
| It is well known that different speakers utilize their vocal instruments in diverse ways to express linguistic intention with some paralinguistic coloring such as emotional quality. The study of voice source features, which describe the action of the vocal folds, is important for a deeper understanding of emotion encoding in speech. In this study we investigate inter- and intra-speaker differences in voicing activities as a function of emotion using electroglottography (EGG) and inverse filtering techniques. Results demonstrate that while voice quality features are good indicators of affective state, voice source descriptors vary in affective information across speakers. Glottal ratio measurements taken directly from the EGG signal are more reliable than measurements from the inverse-filtered glottal airflow signal, but the spectral harmonic amplitude differences of EGG are less useful than those from inverse filtering. | |
| #4 | Modal analysis of vocal fold vibrations using laryngotopography |
| (Department of Communication Disorders, Health Sciences University of Hokkaido) (Department of Otolaryngology, University of Tokyo) (Department of Otolaryngology, University of Tokyo) (Department of Otolaryngology, University of Tokyo) (Department of Otolaryngology, Head and Neck Surgery, National Center for Global Health and Medicine) | |
| In this paper, we propose a method for analyzing spatial characteristics of the larynx during phonation by high-speed digital imaging. Laryngotopography was applied to high-speed digital images of normal subjects and of patients with paralysis and cysts. The results show various modes of vocal fold vibration particular to the patients with paralysis and cysts, and demonstrate the usefulness of laryngotopography for clinical purposes. | |
| #5 | Laryngeal Voice Quality in the Expression of Focus |
| (University of Helsinki, Institute of Behavioural Sciences) (Nokia Corp.) (Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands) (Department of Signal Processing and Acoustics, Aalto University, Finland) | |
| Prominence relations in speech are signaled in various ways, including such phonetic means as voice fundamental frequency, intensity, and duration. A less studied acoustic feature affecting prominence is the so-called voice quality, which is determined by changes in the airflow caused by different laryngeal settings. We investigated the changes in voice quality with respect to linguistic prosodic signaling of focus in simple three-word utterances. We used inverse-filtering-based methods for calculating and parametrizing the glottal flow in several different vowels and focus conditions. The results supported our hypothesis -- formed by an earlier study of voice quality changes in running speech -- that more prominent syllables are produced with a less tense voice quality and less prominent ones with a more tense phonation. We provide both physiological and linguistic explanations for the phenomena. | |
| #6 | Laryngeal Characteristics during the Production of Geminate Consonants |
| (Center for Corpus Development, National Institute for Japanese Language and Linguistics, Japan) (Center for Corpus Development, National Institute for Japanese Language and Linguistics, Japan) (Science Information Center, Prefectural University of Hiroshima, Japan) | |
| Analysis of high-speed digital video images showed that no apparent constriction or tension appeared in the larynx and glottis during the production of geminate consonants. Glottal width for geminate consonants is only slightly wider than for their singleton counterparts. Rather, the width depends largely on the consonant type. However, analysis of photo-electric glottograms suggested that an interruption of the glottal opening movement and/or an abrupt cessation of the preceding vowel is involved in the production of geminate consonants. | |
| #7 | Numerical study of turbulent flow-induced sound production in presence of a tooth-shaped obstacle: towards sibilant [s] physical modeling |
| (The Center for Advanced Medical Engineering and Informatics, Osaka University, Japan) (The Center for Advanced Medical Engineering and Informatics, Osaka University, Japan) (GIPSA-lab, UMR CNRS 5216, Grenoble Universities, France) (The Center for Advanced Medical Engineering and Informatics, Osaka University, Japan) | |
| The sound generated during the production of the sibilant [s] results from the impact of a turbulent jet on the incisors. Physical modeling of this phenomenon depends on the characterization of the properties of the turbulent flow within the vocal tract and of the acoustic sources resulting from the presence of an obstacle in the path of the flow. The properties of the flow-induced noise strongly depend on several geometric parameters whose influence has to be determined. In this paper, a simplified vocal tract/tooth geometric model is used to carry out a numerical study of the flow-induced noise generated by a tooth-shaped obstacle placed in a channel. The simulations reveal a link between the level of the generated noise and the aperture of the constriction formed by the obstacle. | |
| #8 | Morphological and predictability effects on schwa reduction: The case of Dutch word-initial syllables |
| (Radboud University Nijmegen, The Netherlands; Max Planck Institute for Psycholinguistics, The Netherlands) (Radboud University Nijmegen, The Netherlands) (Radboud University Nijmegen, The Netherlands; Max Planck Institute for Psycholinguistics, The Netherlands) | |
| This corpus-based study shows that the presence and duration of schwa in Dutch word-initial syllables are affected by a word’s predictability and its morphological structure. Schwa is less reduced in words that are more predictable given the following word. In addition, schwa may be longer if the syllable forms a prefix, and in prefixes the duration of schwa is positively correlated with the frequency of the word relative to its stem. Our results suggest that the conditions which favor reduced realizations are more complex than one would expect on the basis of the current literature. | |
| #9 | Acoustic-to-Articulatory Inversion based on Local Regression |
| (Centre for Speech Technology, Royal Institute of Technology (KTH), Stockholm, Sweden) (Centre for Speech Technology, Royal Institute of Technology (KTH), Stockholm, Sweden) | |
| This paper presents an acoustic-to-articulatory inversion method based on local regression. Two types of local regression, non-parametric and local linear, are applied to a corpus containing simultaneous recordings of articulator positions and the corresponding acoustics. Maximum likelihood trajectory smoothing using the estimated dynamics of the articulators is also applied to the regression estimates. The average root mean square error in estimating articulatory positions, given the acoustics, is 1.56 mm for the non-parametric regression and 1.52 mm for the local linear regression. The local linear regression is found to perform significantly better than regression using Gaussian Mixture Models with the same acoustic and articulatory features. | |
| #10 | Korean lenis, fortis, and aspirated stops: Effect of place of articulation on acoustic realization |
| (Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands) | |
| Unlike most of the world's languages, Korean distinguishes three types of voiceless stops, namely lenis, fortis, and aspirated stops. All occur at three places of articulation. In previous work, acoustic measurements are mostly collapsed over the three places of articulation. This study therefore provides acoustic measurements of Korean lenis, fortis, and aspirated stops at all three places of articulation separately. Clear differences are found among the acoustic characteristics of the stops at the different places of articulation. | |
| #11 | Speech Synthesis by Modeling Harmonics Structure with Multiple Function |
| (Kobe University) (IBM Research - Tokyo) (IBM Research - Tokyo) (Kobe University) (Kobe University) | |
| In this paper, we present a new approach to speech synthesis, in which speech utterances are synthesized using the parameters of a spectro-modeling function (Multiple function). With this approach, only the harmonic parts are extracted from the phoneme spectrum, and the time-varying spectrum corresponding to the harmonics or sinusoidal components is modeled using the Multiple function. We introduce two types of function and present a method to estimate the parameters of each function from the observed phoneme spectrum. In the synthesis stage, speech signals are generated from the parameters of the Multiple function. The advantage of this method is that it requires only a few speech synthesis parameters. We discuss the effectiveness of our proposed method through experimental results. | |
| #12 | Physics of Body-Conducted Silent Speech – Production, Propagation and Representation of Non-Audible Murmur |
| (Faculty of Engineering, Shinshu University) (Faculty of Engineering, Toyama Prefectural University) | |
| The physical nature of weak body-conducted vocal-tract resonance signals called non-audible murmur (NAM) was investigated using numerical simulation and acoustic analysis of the NAM signals. Computational fluid dynamics simulation reveals that a weak vortex flow occurs in the supraglottal region when uttering NAM; a source of NAM is a turbulent noise source produced by this vortex flow. Furthermore, computational acoustics simulation reveals that NAM signals are attenuated by 50 dB at 1 kHz, consisting of a 30-dB full-range attenuation due to air-to-body transmission loss and a –10-dB/octave spectral decay due to sound propagation loss within the body, which roughly matches the measurement results.