| Time: | Monday 16:00 | Place: | 201B | Type: | Oral |
| Chair: | Masashi Unoki | ||||
| 16:00 | A Factorial Sparse Coder Model for Single Channel Source Separation |
| (Graz University of Technology) (Graz University of Technology) (Graz University of Technology) (University of Crete) | |
| We propose a probabilistic factorial sparse coder model for single channel source separation in the magnitude spectrogram domain. The mixture spectrogram is assumed to be the sum of the sources, which are assumed to be generated frame-wise as the output of sparse coders plus noise. For dictionary training we use an algorithm which can be described as non-negative matrix factorization with ℓ0 sparseness constraints. In order to infer likely source spectrogram candidates, we approximate the intractable exact inference by maximizing the posterior over a plausible subset of solutions. We compare our system to the factorial-max vector quantization model, against which the proposed method shows superior performance in terms of signal-to-interference ratio. Finally, the low computational requirements of the algorithm allow close to real-time applications. | |
| 16:20 | Oriented PCA Method for Blind Speech Separation of Convolutive Mixtures |
| (INRS-EMT Telecommunications Canada) (Université de Moncton Canada) (INRS-EMT Telecommunications Canada) | |
| This paper deals with blind speech separation of convolutive mixtures of sources. The separation criterion is based on Oriented Principal Components Analysis (OPCA) in the frequency domain. OPCA is a second-order extension of standard Principal Component Analysis (PCA) that aims to maximize the power ratio of a pair of signals. The convolutive mixing is obtained by modeling the Head Related Transfer Function (HRTF). Experimental results show the efficiency of the proposed approach in terms of subjective and objective evaluation when compared to the Degenerate Unmixing Evaluation Technique (DUET) and the widely used C-FICA (Convolutive Fast-ICA) algorithm. | |
| 16:40 | Online Gaussian Process for Nonstationary Speech Separation |
| (National Cheng Kung University) (National Cheng Kung University) | |
| A practical speech enhancement system must enhance speech signals from mixed signals that are corrupted by nonstationary source signals and mixing conditions. The source voices may come from different moving speakers. The speakers may abruptly appear or disappear and may be permuted continuously. To deal with these scenarios with a varying number of sources, we present a new method for nonstationary speech separation. An online Gaussian process independent component analysis (OLGP-ICA) is developed to characterize the real-time temporal structure in a time-varying mixing system and to capture the evolved statistics of independent sources from online observed signals. A variational Bayes algorithm is established to estimate the evolved parameters for dynamic source separation. In the experiments, the proposed OLGP-ICA is compared with other ICA methods and is shown to be effective in recovering speech and music signals in a nonstationary speaking environment. | |
| 17:00 | Convexity and Fast Speech Extraction by Split Bregman Method |
| (Department of Mathematics, University of California, Irvine, USA) (Department of Mathematics, University of California, Los Angeles, USA) (Department of Mathematics, University of California, Irvine, USA) (Department of Mathematics, University of California, Los Angeles, USA) | |
| A fast speech extraction (FSE) method is presented using convex optimization made possible by pause detection of the speech sources. Sparse unmixing filters are sought by L1 regularization and the split Bregman method. A subdivided split Bregman method is developed for efficiently estimating long reverberations in real room recordings. The speech pause detection is based on a binary mask source separation method. The FSE method is evaluated and found to outperform existing blind speech separation approaches on both synthetic and room-recorded data in terms of overall computational speed and separation quality. | |
| 17:20 | Reducing Musical Noise in Blind Source Separation by Time-Domain Sparse Filters and Split Bregman Method |
| (Department of Mathematics, University of California, Los Angeles, USA) (Department of Mathematics, University of California, Irvine, USA) (Department of Mathematics, University of California, Irvine, USA) (Department of Mathematics, University of California, Los Angeles, USA) | |
| Musical noise often arises in the outputs of time-frequency binary mask based blind source separation approaches. Post-processing is desired to enhance the separation quality. An efficient musical noise reduction method by time-domain sparse filters is presented using convex optimization. The sparse filters are sought by L1 regularization and the split Bregman method. The proposed musical noise reduction method is evaluated on both synthetic and room-recorded speech and music data and is found to outperform existing musical noise reduction methods in terms of objective and subjective measures. | |
| 17:40 | Combining Monaural and Binaural Evidence for Reverberant Speech Segregation |
| (Department of Computer Science and Engineering, The Ohio State University, United States) (Department of Computer Science and Engineering, The Ohio State University, United States) (Department of Computer Science and Engineering, The Ohio State University, United States) (Department of Computer Science and Engineering, The Ohio State University, United States and Center for Cognitive Science, The Ohio State University, United States) | |
| Most existing binaural approaches to speech segregation rely on spatial filtering. In environments with minimal reverberation and when sources are well separated in space, spatial filtering can achieve excellent results. However, in everyday environments performance degrades substantially. To address these limitations, we incorporate monaural analysis within a binaural segregation system. We use monaural cues to perform both local and across-frequency grouping of mixture components, allowing for a more robust application of spatial filtering. We propose a novel framework in which we combine monaural grouping evidence and binaural localization evidence in a linear model for the estimation of the ideal binary mask. Results indicate that with appropriately designed features that capture both monaural and binaural evidence, an extremely simple model achieves a signal-to-noise ratio improvement of up to 4 dB relative to using spatial filtering alone. |
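
The ideal binary mask named in the last abstract is commonly defined from the premixed signals: a time-frequency unit is labeled 1 when the target energy exceeds the interference energy by more than a local criterion, and 0 otherwise. The sketch below illustrates only that definition, not the authors' linear model for estimating the mask. The function name `ideal_binary_mask`, the 0 dB default criterion, and the toy magnitudes are assumptions made for illustration.

```python
import numpy as np


def ideal_binary_mask(target_tf, interference_tf, lc_db=0.0):
    """Label each time-frequency unit 1 if the local SNR exceeds lc_db, else 0.

    target_tf and interference_tf are same-shaped arrays of time-frequency
    coefficients (real or complex) of the premixed target and interference.
    The name and the 0 dB default are illustrative assumptions.
    """
    eps = 1e-12  # guards against division by zero and log of zero
    target_energy = np.abs(target_tf) ** 2
    interference_energy = np.abs(interference_tf) ** 2
    local_snr_db = 10.0 * np.log10(
        (target_energy + eps) / (interference_energy + eps)
    )
    return (local_snr_db > lc_db).astype(np.uint8)


# Toy 2x2 magnitude "spectrograms" (hypothetical values).
target = np.array([[2.0, 0.1], [0.5, 3.0]])
interference = np.array([[1.0, 1.0], [1.0, 0.2]])
mask = ideal_binary_mask(target, interference)
```

In a segregation system the premixed signals are unavailable at test time, so the mask has to be estimated from features of the mixture; the definition above only serves as the training or evaluation target.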