(T-S1-R1)
Discriminative Training - Fundamentals and Applications
- Chin-Hui Lee (Georgia Institute of Technology)
Summary:
Recently, discriminative training (DT) has attracted new attention in
the speech and language processing communities because it can learn
parametric representations that achieve better performance and greater
robustness than models whose parameters are obtained by conventional
training methods, without changing the structure or complexity of the
models being used. When probabilistic distributions are used to
characterize these representations, discriminative training often
implies learning decision boundaries instead of approximating density
functions. Instead of estimating parameters separately to approximate
individual densities, DT attempts to jointly estimate all the
parameters of the competing distributions to meet the performance
requirements of a specific problem setting.
In general, there are two major families of DT methods. The first is
function-based DT. Rather than estimating parameters with the
conventional minimum mean squared error (MMSE), maximum likelihood
(ML), maximum a posteriori (MAP), or maximum entropy (ME) criteria,
one chooses a different objective function to optimize. Well-known
methods include maximum mutual information (MMI), minimum
discriminative information (MDI), minimum description length (MDL),
etc. The choice of objective function often depends on the specific
problem to be solved. For example, if two-class categorization is
involved, as in text-independent speaker verification, we can
approximate each class with a Gaussian mixture model (GMM) and use
the ML, MAP, or MMI estimation criteria to learn the parameters of
the competing distributions.
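
As a sketch of how the choice of objective changes the estimation
problem, the ML and MMI criteria can be written side by side. The
notation is assumed here for illustration and is not taken from the
tutorial: x_i is the i-th of N training samples, c_i its class label,
\lambda_c the parameters of the distribution for class c, \Lambda the
set of all \lambda_c, and P(c) a class prior.

```latex
\hat{\Lambda}_{\mathrm{ML}}
  = \arg\max_{\Lambda} \sum_{i=1}^{N} \log p(x_i \mid c_i; \lambda_{c_i})

\hat{\Lambda}_{\mathrm{MMI}}
  = \arg\max_{\Lambda} \sum_{i=1}^{N}
    \log \frac{p(x_i \mid c_i; \lambda_{c_i})\, P(c_i)}
              {\sum_{c} p(x_i \mid c; \lambda_{c})\, P(c)}
```

Under ML the sum separates by class, so each \lambda_c can be
estimated from the data of its own class alone. Under MMI the
denominator couples all classes, so the parameters of the competing
distributions have to be estimated jointly.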
The second category is decision-feedback-based DT, in which a decision
function that determines the performance of the training and testing
procedure on the training set is embedded in the optimization
formulation. The parameters can then be learned by adjusting their
current values to optimize the desired evaluation metric, in the
direction guided by the feedback obtained from the current set of
decision parameters. Some popular techniques are minimum
classification error (MCE), minimum verification error (MVE), minimum
phone error (MPE), maximal figure-of-merit (MFoM), maximum or minimum
area under the receiver operating characteristic curve (AUC), maximum
margin of separation, etc. Again, the choice of technique depends
heavily on the decision function to be used and the evaluation metric
to be applied. For example, in multi-class text categorization the
decision function is usually the argmax operation among the scores of
all competing categories. Furthermore, the evaluation metric can be
micro- or macro-averaged F1, or the area under the precision-recall or
ROC curve. We can then use the MFoM learning algorithm to obtain all
the parameters of all topic categories using any combination of
feature vectors and score functions.
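
The decision-feedback idea above can be sketched with MCE, one of the
techniques listed. This is a minimal illustration, not the tutorial's
implementation. It assumes linear discriminant functions, a smoothed
log-sum-exp competitor score, a sigmoid loss, and plain per-sample
gradient descent. All function names, the hyperparameters (eta, gamma,
lr), and the toy data are hypothetical.

```python
import math


def scores(W, b, x):
    """Linear discriminant scores g_j(x) = w_j . x + b_j for every class j."""
    return [sum(wk * xk for wk, xk in zip(w, x)) + bj for w, bj in zip(W, b)]


def predict(W, b, x):
    """Decision function: argmax over the class scores."""
    g = scores(W, b, x)
    return max(range(len(g)), key=g.__getitem__)


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def mce_step(W, b, x, c, eta=1.0, gamma=1.0, lr=0.5):
    """One gradient step on the smoothed error for sample x with label c.

    Updates W and b in place and returns the loss before the update.
    """
    g = scores(W, b, x)
    comp = [j for j in range(len(g)) if j != c]
    m = max(g[j] for j in comp)
    ex = {j: math.exp(eta * (g[j] - m)) for j in comp}
    z = sum(ex.values())
    # Misclassification measure: d > 0 means the competitors outscore class c.
    d = -g[c] + m + math.log(z / len(comp)) / eta
    loss = sigmoid(gamma * d)  # smooth stand-in for the 0/1 error count
    dl_dd = gamma * loss * (1.0 - loss)
    for j in range(len(g)):
        # d decreases with the true-class score and increases with competitors.
        dd_dg = -1.0 if j == c else ex[j] / z
        step = lr * dl_dd * dd_dg
        W[j] = [wk - step * xk for wk, xk in zip(W[j], x)]
        b[j] -= step
    return loss


def train(data, n_classes, dim, epochs=100):
    """Jointly update all class parameters from zero initial values."""
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, c in data:
            mce_step(W, b, x, c)
    return W, b


# Toy, hand-made data (hypothetical): three well-separated classes in 2-D.
DATA = [((2.0, 0.0), 0), ((2.5, 0.5), 0),
        ((-2.0, 0.0), 1), ((-2.5, -0.5), 1),
        ((0.0, 3.0), 2), ((0.5, 2.5), 2)]
```

Calling train(DATA, n_classes=3, dim=2) returns the weights and biases,
which predict() then uses as the argmax decision rule. Each step moves
the parameters of the true class and of its competitors together, which
is the joint estimation that distinguishes DT from fitting each class
separately.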
In this tutorial, we will review the theory of popular discriminative
training methods commonly used in the speech and language processing
communities. We will then describe the utility of DT and show why DT
offers attractive alternatives to conventional estimation procedures,
especially when the underlying distributions of the data or competing
classes are not completely known. We will then formulate DT
algorithms for widely used parametric representations, such as GMM,
hidden Markov model (HMM), linear discriminant function (LDF),
artificial neural network (ANN), linear discriminant analysis (LDA),
and vector quantization. Finally, we will describe properties of DT
algorithms and illustrate how DT can be used in many speech and
language processing applications, including feature extraction,
acoustic modeling and language modeling for automatic speech
recognition, speaker recognition, utterance verification, spoken
language recognition, and text categorization. We will compare the
performance of models obtained before and after DT to show its
effectiveness in enhancing the performance and robustness of pattern
recognition and verification algorithms.
Biography:
Chin-Hui Lee is a professor in the School of Electrical and Computer
Engineering, Georgia Institute of Technology. Dr. Lee received the
B.S. degree in Electrical Engineering from National Taiwan University,
Taipei, in 1973, the M.S. degree in Engineering and Applied Science
from Yale University, New Haven, in 1977, and the Ph.D. degree in
Electrical Engineering with a minor in Statistics from the University
of Washington, Seattle, in 1981.
Dr. Lee started his professional career at Verbex Corporation,
Bedford, MA, and was involved in research on connected word
recognition. In 1984, he became affiliated with Digital Sound
Corporation, Santa Barbara, where he engaged in research and product
development in speech coding, speech synthesis, speech recognition and
signal processing for the development of the DSC-2000 Voice
Server. Between 1986 and 2001, he was with Bell Laboratories, Murray
Hill, New Jersey, where he became a Distinguished Member of Technical
Staff and Director of the Dialogue Systems Research Department. His
research interests include multimedia communication, multimedia signal
and information processing, speech and speaker recognition, speech and
language modeling, spoken dialogue processing, adaptive and
discriminative learning, biometric authentication, and information
retrieval. From August 2001 to August 2002, he was a visiting
professor at the School of Computing, the National University of
Singapore. In September 2002, he joined the ECE faculty at Georgia
Institute of Technology.
Prof. Lee has participated actively in professional societies. He is a
member of the IEEE Signal Processing Society (SPS), the IEEE
Communications Society, and the International Speech Communication
Association (ISCA). From 1991 to 1995, he was an associate editor for
the IEEE Transactions on Signal Processing and the IEEE Transactions
on Speech and Audio Processing. During the same period, he served as a
member of the ARPA Spoken Language Coordination Committee. From 1995
to 1998, he was a member of the Speech Processing Technical Committee,
serving as its chairman from 1997 to 1998. In 1996, he helped promote
the SPS Multimedia Signal Processing Technical Committee, of which he
is a founding member.
Dr. Lee is a Fellow of the IEEE, and has published more than 300
papers and 25 patents. He received the SPS Senior Award in 1994 and
the SPS Best Paper Award in 1997 and 1999. In 1997, he was awarded
the prestigious Bell Labs President's Gold Award for his
contributions to the Lucent Speech Processing Solutions
product. Dr. Lee often gives seminal lectures to a wide international
audience. In 2000, he was named one of the six Distinguished Lecturers
by the IEEE Signal Processing Society. He was also named one of
ISCA's two inaugural Distinguished Lecturers for 2007-2008. Recently, he
won the SPS's 2006 Technical Achievement Award for "Exceptional
Contributions to the Field of Automatic Speech Recognition".
This page was last updated on 21-June-2010 3:00 UTC.