Multi-speaker frequency warping vocal tract length normalization for speaker independent speech recognition

Bibliographic Details
Main Author:	Wong, Jensen Jing Lung
Format:	Thesis
Language:	English
Published:	2014
Subjects:	BF Psychology
Online Access:	http://eprints.utm.my/id/eprint/48537/1/JensenWongJingLungMFC2014.pdf

Description
Summary:	One of the important issues in speaker-independent speech recognition systems is compensating for speaker variability. Speaker variability is usually related to physical differences in vocal tract length. Variation in vocal tract length can be compensated for using Vocal Tract Length Normalization (VTLN), which normalizes speech utterances via speaker-specific frequency warping. However, this approach requires a repeated search for the optimal warping value for each speaker, which increases computational cost. This work proposes an alternative approach to finding the optimal warping factor in VTLN: multi-speaker frequency warping, in which a single optimum warping factor is used for all speakers. The proposed multi-speaker frequency warping VTLN is evaluated by trial and error under different experimental setups that vary the language model, phoneme categorization and warping values. The data used in this work is the large-vocabulary TIMIT dataset, and the Hidden Markov Model Toolkit (HTK) is used for classification. The results show that the proposed approach achieves a phoneme accuracy rate up to 1.0% higher than the baseline. Its performance is on par with the speaker-specific warping approach, with the added advantage of lower computational cost.

Similar Items