Table 1. System Performance Comparison on NIST02 corpus
at the 4th iteration is the best performer in terms of EER and DCF. Compared with the baseline system, the ICM at the 4th iteration achieves
a relative improvement of 26.5% in terms of EER. Furthermore,
the T-Norm procedure is least effective in the ICM4 system, indicating
that the ICM-based method is itself a very effective score normalization
scheme that can be used without T-Norm. The computation of T-Norm can therefore be saved in the ICM-based method.
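For reference, T-Norm in its usual form standardizes each trial score by the mean and standard deviation of the scores that the same test utterance obtains against a set of cohort (impostor) models. The sketch below illustrates that standard definition only; it is not the implementation used in these experiments, and the function name `t_norm`, the use of the population standard deviation, and the toy scores are assumptions made for illustration. It also makes the cost visible: each trial needs one extra scoring pass per cohort model, which is the computation that can be saved when T-Norm is dropped.

```python
from statistics import mean, pstdev


def t_norm(raw_score, cohort_scores):
    """Standardize a trial score with cohort statistics (test normalization).

    raw_score     -- score of the test utterance against the claimed speaker model
    cohort_scores -- scores of the SAME test utterance against each cohort model
    """
    if len(cohort_scores) < 2:
        raise ValueError("need at least two cohort scores")
    mu = mean(cohort_scores)
    sigma = pstdev(cohort_scores)  # population std. dev.; an assumption of this sketch
    if sigma == 0.0:
        raise ValueError("cohort scores have zero variance")
    return (raw_score - mu) / sigma


# Toy example with hypothetical scores: the cohort has mean 2.0 and std. dev. 1.0.
normalized = t_norm(4.0, [1.0, 3.0])
```

With N cohort models, each verification trial requires N additional model evaluations before the decision threshold can be applied.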
6. CONCLUSION AND FUTURE DIRECTION
In this paper, we compared two discriminative learning frameworks
for text-independent speaker verification. The framework based on
Fisher mapping and SVM learning achieves better performance in
terms of EER than the GMM-UBM baseline, while the framework
based on utterance transform and Iterative Cohort Modeling (ICM)
outperforms both the GMM-UBM system and the Fisher-mapping system
on the NIST02 task. The ICM-based method achieves a 26.5% relative
improvement in EER (10.98% → 8.07%). In both the Fisher-mapping
and ICM-based methods, the universal background model defines a
mapping function from a variable-length speech utterance to a
fixed-dimensional vector space. A Gaussian Mixture Model trained
with the conventional EM algorithm may not be the optimal background
model for this purpose. In the near future, we will investigate different training methods and different structures for the background model in search of a better mapping function.
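To make the idea of a background-model-defined mapping concrete, the sketch below maps an utterance of any length to a vector whose dimension equals the number of mixture components, by averaging the per-frame component posteriors under a diagonal-covariance GMM. This is only one simple mapping of this kind, chosen for illustration; it is neither the Fisher mapping nor the utterance transform evaluated in this paper, and the function names and toy parameters are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def utterance_to_vector(frames, weights, means, variances):
    """Map a variable-length list of frames to a K-dimensional vector of
    average component posteriors under a K-component background GMM."""
    if not frames:
        raise ValueError("utterance has no frames")
    k = len(weights)
    acc = [0.0] * k
    for x in frames:
        log_joint = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        top = max(log_joint)  # subtract the max for numerical stability
        unnorm = [math.exp(lj - top) for lj in log_joint]
        total = sum(unnorm)
        for i in range(k):
            acc[i] += unnorm[i] / total
    return [a / len(frames) for a in acc]


# Toy 2-component, 1-dimensional background model (hypothetical parameters).
weights = [0.5, 0.5]
means = [[-1.0], [1.0]]
variances = [[1.0], [1.0]]

short_utt = [[-1.0]]                       # 1 frame
long_utt = [[1.0], [0.9], [1.2], [-0.5]]   # 4 frames

v_short = utterance_to_vector(short_utt, weights, means, variances)
v_long = utterance_to_vector(long_utt, weights, means, variances)
```

Both utterances map to vectors of length 2 regardless of their frame counts. Because the mapping is entirely determined by the background model's parameters, a different training method or model structure yields a different mapping function.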