| Time: | Wednesday 16:00 | Place: | 301 | Type: | Special |
| Chair: | Tara Sainath & Bhuvana Ramabhadran | ||||
| 16:00 | Towards a robust face recognition system using compressive sensing |
| (University of California, Berkeley) (University of Illinois) (University of Illinois) (University of California, Berkeley) | |
| An application of compressive sensing (CS) theory to image-based robust face recognition is considered. Motivated by CS, the problem has recently been cast in a sparse representation framework: the sparsest linear combination of a query image is sought using all prior training images as an overcomplete dictionary, and the dominant sparse coefficients reveal the identity of the query image. The ability to perform dense error correction directly in the image space also provides an intriguing way to compensate for pixel corruption, yielding recognition accuracy that exceeds most existing solutions. Furthermore, a local iterative process can be applied to estimate the transformation of the face region when the query image is misaligned. Finally, we discuss state-of-the-art fast algorithms for improving the speed of the system. The paper also offers useful guidelines for practitioners working in related fields, such as acoustic and speech recognition. | |
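The classification rule described in this abstract can be illustrated in a few lines. This is a minimal sketch, not the authors' implementation: it assumes l2-normalized training images stacked as the columns of a matrix `A`, models sparse pixel corruption with an identity block appended to the dictionary, and uses plain ISTA (iterative soft thresholding for l1-regularized least squares) as a stand-in for the fast solvers the paper surveys. The names `ista` and `src_classify` and the penalty `lam` are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista(A, y, lam, n_iter=500):
    """Approximately solve min_x 0.5*||y - A x||^2 + lam*||x||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x


def src_classify(A, labels, y, lam=0.01):
    """Classify query y by class-wise residuals of a sparse code over [A, I].

    A: (d, n) training images as l2-normalized columns; labels: (n,) class ids.
    The identity block absorbs sparse pixel corruption (dense error correction).
    """
    d, n = A.shape
    B = np.hstack([A, np.eye(d)])  # extended dictionary [A, I]
    w = ista(B, y, lam)
    x, e = w[:n], w[n:]  # training coefficients and estimated error image
    residuals = {}
    for c in np.unique(labels):
        x_c = np.where(labels == c, x, 0.0)  # keep only class-c coefficients
        residuals[c] = np.linalg.norm(y - e - A @ x_c)
    return min(residuals, key=residuals.get)
```

The query is assigned to the class whose coefficients alone best reconstruct it once the estimated error `e` is subtracted; the misalignment correction mentioned above is not modeled here.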
| 16:20 | Exemplar-Based Sparse Representation Features for Speech Recognition |
| (IBM T.J. Watson Research Center) (IBM T.J. Watson Research Center) (IBM T.J. Watson Research Center) (IBM T.J. Watson Research Center) (IBM T.J. Watson Research Center) | |
| In this paper, we explore the use of exemplar-based sparse representations (SRs) to map test features into the linear span of training examples. We show that frame classification accuracy with these new features is 1.3% higher than with a Gaussian Mixture Model (GMM), showing that SRs not only move test features closer to the training data but also move them closer to the correct class. We then train a Hidden Markov Model (HMM) on these SR features and perform recognition. On the TIMIT corpus, applying the SR features on top of our best discriminatively trained system yields a 0.7% absolute reduction in phonetic error rate (PER), from 19.9% to 19.2%. After model adaptation, the PER drops further to 19.0%, the best result on TIMIT to date. Furthermore, on a 50-hour large-vocabulary broadcast news task, we achieve a 0.3% absolute reduction in word error rate (WER), demonstrating the benefit of these SR features for large-vocabulary recognition. | |
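One way to read "mapping test features into the linear span of training examples" is sketched below. It assumes training exemplars stored as the columns of `H`, an l1-penalized least-squares fit solved with ISTA, and the mapped feature `H @ beta`. The solver, the penalty `lam`, and the names `ista` and `sr_features` are illustrative assumptions; the paper's own sparse solver and feature construction may differ.

```python
import numpy as np


def ista(H, y, lam, n_iter=500):
    """Approximately solve min_b 0.5*||y - H b||^2 + lam*||b||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(H, 2) ** 2
    b = np.zeros(H.shape[1])
    for _ in range(n_iter):
        z = b - step * (H.T @ (H @ b - y))
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return b


def sr_features(H, Y, lam=0.02):
    """Map each test frame (row of Y) to H @ beta, a point in the span of the exemplars.

    H: (d, n) training exemplars as columns; Y: (T, d) test frames.
    """
    return np.stack([H @ ista(H, y, lam) for y in Y])
```

Each output row can then replace the original frame as input to an HMM system; only a few exemplars receive nonzero weight, so the new feature is built from a small set of nearby training frames.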
| 16:40 | Data Selection for Language Modeling Using Sparse Representations |
| (IBM T.J. Watson Research Center) (IBM T.J. Watson Research Center) (IBM T.J. Watson Research Center) (IBM T.J. Watson Research Center) | |
| The ability to adapt language models to specific domains from large generic text corpora is of considerable interest to the language modeling community. One of the key challenges is to identify the text material in the generic collection that is relevant to a domain. The text selection problem can be cast in a semi-supervised learning framework in which the initial hypothesis from a speech recognition system is used to identify relevant training material. We present a novel sparse representation formulation that selects a sparse set of relevant sentences from the training data that match the test set distribution. In this formulation, the training sentences are the columns of the sparse representation matrix and the n-gram counts are its rows. The target vector is the n-gram probability distribution of the test data. A sparse solution to this problem identifies the few columns that best represent the target test vector, thus identifying the relevant set of sentences in the training data. Rescoring with a language model built from the data selected by the proposed method yields modest gains on the English broadcast news RT-04 task, reducing the word error rate from 14.6% to 14.4%. | |
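The formulation above can be sketched as a nonnegative l1-regularized least-squares problem. In this hypothetical sketch, each column of `counts` holds one training sentence's n-gram counts (normalized here to a distribution, which is an assumption), `p` is the test n-gram distribution, and sentences whose coefficients exceed `tol` are selected. The nonnegativity constraint, the projected ISTA solver, and the parameters `lam` and `tol` are illustrative choices, not the paper's stated method.

```python
import numpy as np


def nonneg_sparse_solve(M, p, lam, n_iter=2000):
    """Approximately solve min_{w>=0} 0.5*||p - M w||^2 + lam*sum(w) by projected ISTA."""
    step = 1.0 / np.linalg.norm(M, 2) ** 2
    w = np.zeros(M.shape[1])
    for _ in range(n_iter):
        grad = M.T @ (M @ w - p)
        w = np.maximum(w - step * (grad + lam), 0.0)  # prox of lam*||w||_1 on w >= 0
    return w


def select_sentences(counts, p, lam=1e-3, tol=1e-3):
    """Return indices of training sentences chosen to represent the test n-gram distribution.

    counts: (n_grams, n_sentences) n-gram counts, one column per training sentence
            (every column is assumed to contain at least one n-gram).
    p: (n_grams,) n-gram probability distribution estimated from the test hypotheses.
    """
    M = counts / counts.sum(axis=0, keepdims=True)  # each column becomes a distribution
    w = nonneg_sparse_solve(M, p, lam)
    return np.flatnonzero(w > tol)
```

The selected indices would then define the subset of training text used to build the adapted language model.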
| 17:00 | Observation uncertainty measures for sparse imputation |
| (Centre for Language and Speech Technology, Radboud University Nijmegen, The Netherlands) (Adaptive Informatics Research Centre, Aalto University, Finland) (Adaptive Informatics Research Centre, Aalto University, Finland) | |
| Missing data imputation estimates clean speech features for automatic speech recognition in noisy environments. The estimates are usually treated as equally reliable, whereas in reality the estimation accuracy varies from feature to feature. In this work, we propose uncertainty measures that characterise the expected accuracy of a sparse imputation (SI) based missing data method. In experiments on noisy large-vocabulary speech data, observation uncertainties derived from the proposed measures improved speech recognition performance on features estimated with SI. With the best measures, relative error reductions of up to 15% were achieved compared with a baseline system using SI without uncertainties. | |
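The sparse imputation step that these uncertainty measures build on can be sketched as follows. This illustrates the general idea only, assuming a dictionary `D` of clean exemplar frames, a given reliability mask, and an ISTA solver; the proposed uncertainty measures themselves are not reproduced, and the names `sparse_impute` and `lam` are hypothetical.

```python
import numpy as np


def ista(A, y, lam, n_iter=1000):
    """Approximately solve min_x 0.5*||y - A x||^2 + lam*||x||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return x


def sparse_impute(D, y, reliable, lam=0.01):
    """Replace unreliable entries of a noisy frame with a sparse exemplar reconstruction.

    D: (d, n) clean-speech exemplars as columns; y: (d,) noisy feature vector;
    reliable: (d,) boolean mask, True where y is judged dominated by speech.
    """
    x = ista(D[reliable], y[reliable], lam)  # code fitted on reliable rows only
    estimate = D @ x                          # reconstruction of the full frame
    return np.where(reliable, y, estimate)    # keep reliable values, impute the rest
```

Every imputed entry here is returned with the same implicit confidence; attaching a per-feature uncertainty to `estimate[~reliable]` is exactly the gap the abstract addresses.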
| 17:20 | Sparse Representations for Text Categorization |
| (IBM T.J. Watson Research Center) (IBM T.J. Watson Research Center) (IBM T.J. Watson Research Center) (IBM T.J. Watson Research Center) (IBM T.J. Watson Research Center) (Department of Computer Science, Columbia University) | |
| Sparse representations (SRs) are often used to characterize a test signal using a few support training examples, and they allow the number of supports to be adapted to the specific signal being categorized. Given the good performance of SRs relative to other classifiers for both image and phonetic classification, in this paper we extend SRs to text classification, a domain in which they have not yet been explored. Specifically, we demonstrate how SRs can be used for text classification and how their performance varies with the vocabulary size of the document features. We also show that this method offers promising improvements over the Naive Bayes (NB) classifier, a standard classifier for text classification, thus introducing an alternative class of methods for text categorization. | |
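A bare-bones version of SR text classification might look like the sketch below. It assumes l2-normalized term-frequency vectors over the `vocab_size` most frequent training words (the parameter whose effect the abstract studies), an ISTA solver, and a decision rule that picks the class with the largest total coefficient magnitude. The decision rule, solver, and all function names are assumptions, not the paper's specification.

```python
import numpy as np
from collections import Counter


def build_vocab(docs, size):
    """Keep the `size` most frequent training words (the vocabulary-size knob)."""
    counts = Counter(w for doc in docs for w in doc.lower().split())
    return [w for w, _ in counts.most_common(size)]


def vectorize(doc, vocab):
    """l2-normalized term-frequency vector over `vocab`; out-of-vocabulary words are ignored."""
    index = {w: i for i, w in enumerate(vocab)}
    v = np.zeros(len(vocab))
    for w in doc.lower().split():
        if w in index:
            v[index[w]] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def ista(H, y, lam, n_iter=500):
    """Approximately solve min_b 0.5*||y - H b||^2 + lam*||b||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(H, 2) ** 2
    b = np.zeros(H.shape[1])
    for _ in range(n_iter):
        z = b - step * (H.T @ (H @ b - y))
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return b


def sr_text_classify(train_docs, train_labels, doc, vocab_size=50, lam=0.05):
    """Assign `doc` to the class whose training documents carry the most coefficient mass."""
    vocab = build_vocab(train_docs, vocab_size)
    H = np.stack([vectorize(d, vocab) for d in train_docs], axis=1)  # one column per document
    beta = ista(H, vectorize(doc, vocab), lam)
    labels = np.array(train_labels)
    scores = {c: np.abs(beta[labels == c]).sum() for c in set(train_labels)}
    return max(scores, key=scores.get)
```

Lowering `vocab_size` shrinks the feature space; a training document left with no in-vocabulary words becomes an all-zero column and simply never receives weight.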
| 17:40 | Sparse Auto-associative Neural Networks: Theory and Application to Speech Recognition |
| (Johns Hopkins University) (Johns Hopkins University) (Johns Hopkins University) | |
| This paper introduces the sparse auto-associative neural network (SAANN), in which the output of the internal hidden layer is forced to be sparse. This is achieved by adding a sparse regularization term to the original reconstruction-error cost function and updating the network parameters to minimize the overall cost. We show the applicability of this network to phoneme recognition by extracting sparse hidden-layer outputs (used as features) from a network trained in an unsupervised manner on perceptual linear prediction (PLP) cepstral coefficients. Experiments with the SAANN features on a state-of-the-art TIMIT phoneme recognition system show a relative improvement in phoneme error rate of 5.1% over the baseline PLP features. | |
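The SAANN cost described above (reconstruction error plus a sparsity term on the hidden layer) can be sketched with a single sigmoid hidden layer trained by full-batch gradient descent. The L1 penalty on hidden activations, the linear output layer, the learning rate, and the names `train_saann` and `saann_features` are assumptions for illustration; the paper's exact regularizer, architecture, and training schedule may differ.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_saann(X, n_hidden, lam=0.1, lr=0.1, epochs=200, seed=0):
    """Train an auto-associative net with an L1 penalty on hidden outputs.

    X: (N, d) feature frames. Returns (W1, b1, W2, b2, loss_history).
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    history = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)  # hidden activations in (0, 1)
        X_hat = H @ W2 + b2       # linear reconstruction of the input
        err = X_hat - X
        # Reconstruction error plus sparsity term; H > 0, so sum(H) is its L1 norm.
        loss = 0.5 * np.sum(err ** 2) / N + lam * np.sum(H) / N
        history.append(loss)
        # Backpropagate the total cost.
        dX_hat = err / N
        dW2 = H.T @ dX_hat
        db2 = dX_hat.sum(axis=0)
        dH = dX_hat @ W2.T + lam / N
        dZ = dH * H * (1.0 - H)
        dW1 = X.T @ dZ
        db1 = dZ.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2, history


def saann_features(X, W1, b1):
    """Sparse hidden-layer outputs used as features for a downstream recognizer."""
    return sigmoid(X @ W1 + b1)
```

After training on PLP frames, `saann_features` would supply the hidden-layer outputs used as recognition features; a larger `lam` pushes the activations toward zero at some cost in reconstruction accuracy.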