Investigation of robust speech feature extraction techniques for accents classification of Malaysian English speakers

Bibliographic Details
Main Author: Yusnita, Mohd Ali
Format: Thesis
Language: English
Online Access: http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/44131/1/p.%201-24.pdf
http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/44131/2/full%20text.pdf
Description
Summary: Automatic speech recognition (ASR) is not a new topic in speech processing and human-machine interaction; it has been established for more than five decades. However, accent, which is closely related to multilingualism, remains a major challenge for today's ASR systems: it manifests as differences in the pronunciation and intonation of speakers from different sociolinguistic backgrounds. A large and growing body of literature has shown that various accents degrade ASR performance. Although English accents are the most studied accent varieties, because English is regarded as the most important and prestigious international language, Malaysian English (MalE), a new variety within the New Englishes of non-native speakers, remains unexplored. Commercial ASR products conventionally treat MalE as a uniform variety, although many scholars and researchers dispute this notion and regard MalE as reflecting localized ethnic speech diversity. Past perceptual studies using listening tests have reported a high likelihood of detecting speakers' ethnic identities from Singapore English (SgE) and Brunei English (BruE) speech, which are appropriate comparator varieties for MalE accents. At present, no research has identified ethnic origin from MalE accented speech samples using multiple speech analysis techniques and machine learning algorithms for automatic classification, an approach that offers more reliable, standard and accurate experimental methods. This study attempts to fill that gap, and for this purpose a new database of MalE accents has been developed. The study elicits isolated-word and continuous speech from male and female university students of the three main ethnic groups, representing educated Malay, Chinese and Indian speakers, using accent-sensitive words selected from previous studies.
The proposed system consists of pre-processing, feature extraction and classification stages. Beyond basic pre-processing, this study proposes integrating a fuzzy inference system for frame-based voiced-unvoiced (FIS V-UV) segmentation, which by itself improves overall performance over the conventional automatic accent classification (AAC) system. A new method, named global statistical thresholds (GSTs), is proposed for establishing the membership functions of the short-time energy and zero-crossing rate inputs to the FIS V-UV segmentation. This segmentation reduces the portion of speech activity passed on to the feature extraction stage. The experimental results demonstrate the efficacy of the proposed FIS V-UV-assisted AAC using GSTs, with a highest accuracy increase of 7.70% and a frame reduction rate of 24.26% over the conventional AAC. In the second stage, acoustic features correlated with the accents of the three ethnic groups are developed through several techniques: filter bank analysis, vocal tract modelling, hybrid analysis and fusion analysis. Of the eight feature vectors formulated and tested on the MalE database, the following are new approaches in this field: statistical descriptors of Mel-band spectral energy (MBSE), principal component analysis-transformed MBSE (PCA-MBSE), two hybrid techniques of discrete wavelet transform-derived linear prediction coefficients (DWT-LPC), and two spectral feature fusions (SFFs) that combine the popular Mel-frequency cepstral coefficients and linear prediction coefficients with five formants (MFCC-formants and LPC-formants). The experimental results from the final stage suggest that the SFF techniques are the best approach for classifying the three MalE accents in this database, with a best accuracy rate of 97.4%, outperforming the standard MFCC features by as much as 7.8%. In the robustness analysis, the SFFs, followed by PCA-MBSE, showed greater noise robustness than the other features.
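The V-UV segmentation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not give the rule base or the exact GST formula, so the hypothetical `global_thresholds` helper places the membership breakpoints at the utterance-level mean plus or minus one standard deviation of each feature, and a single rule ("voiced if energy is high AND zero-crossing rate is low", using min as the fuzzy AND) stands in for the full fuzzy inference system. All function names, the 0.5 cut-off and the synthetic test signal are illustrative assumptions.

```python
import math
import statistics


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs (frame length >= 2)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rising(v, lo, hi):
    """Ramp membership function: 0 at or below lo, 1 at or above hi, linear between."""
    if hi <= lo:  # degenerate case, e.g. zero spread across frames
        return 1.0 if v >= hi else 0.0
    return min(1.0, max(0.0, (v - lo) / (hi - lo)))


def global_thresholds(values, k=1.0):
    """HYPOTHETICAL GST rule: mean +/- k * std over all frames of the utterance."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return mu - k * sd, mu + k * sd


def fuzzy_voiced_frames(frames, cut=0.5):
    """Return indices of frames judged voiced by a single-rule fuzzy decision.

    Assumed rule: voiced = min(mu_high(energy), mu_low(ZCR)); frames whose
    firing strength reaches `cut` are kept for feature extraction.
    """
    ste = [short_time_energy(f) for f in frames]
    zcr = [zero_crossing_rate(f) for f in frames]
    e_lo, e_hi = global_thresholds(ste)
    z_lo, z_hi = global_thresholds(zcr)
    kept = []
    for i, (e, z) in enumerate(zip(ste, zcr)):
        high_energy = rising(e, e_lo, e_hi)
        low_zcr = 1.0 - rising(z, z_lo, z_hi)
        if min(high_energy, low_zcr) >= cut:  # min acts as the fuzzy AND
            kept.append(i)
    return kept


def synthetic_signal(fs=8000):
    """Low-level alternating noise, a 100 Hz tone, then noise again (800 samples each)."""
    noise = [0.01 * (1 if n % 2 == 0 else -1) for n in range(800)]
    tone = [math.sin(2 * math.pi * 100 * n / fs) for n in range(800)]
    return noise + tone + noise


if __name__ == "__main__":
    frames = frame_signal(synthetic_signal(), frame_len=200, hop=200)
    kept = fuzzy_voiced_frames(frames)
    print("voiced frames:", kept)
    print(f"frame reduction: {1 - len(kept) / len(frames):.0%}")
```

On the synthetic signal, only the four tone frames survive and the low-energy, high-ZCR frames on either side are discarded, which illustrates how such a segmentation shrinks the input to feature extraction. Deriving the breakpoints from per-utterance statistics, rather than fixed constants, keeps the membership functions adapted to each recording's level.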
This thesis also contributes a new feature selection technique, the statistical band selection (SBS) algorithm, which uses a simple decision rule to select bands based on the smallest within-class variance scores. The experimental results reveal that SBS improves AAC performance on the three-class accent problem, with accuracy gains of 3.9% to 5.6%, memory requirements reduced by 22% to 55%, and a speed-up of 70% on average. Comparing accent severity between genders, this study suggests that male speakers exhibit a higher degree of accentedness, as they consistently achieved better classification rates regardless of the acoustic features used. It can also be concluded that continuous speech carries more intense accent markers than isolated-word speech.
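A minimal sketch of the SBS decision described above, assuming each band's score is its within-class variance averaged over the three accent classes and that a fixed number of lowest-scoring bands is kept. The abstract does not give the exact scoring, normalization or stopping rule, so `within_class_variance`, `statistical_band_selection`, `n_keep` and the toy data are illustrative assumptions, not the thesis's algorithm.

```python
import statistics
from collections import defaultdict


def within_class_variance(features, labels, band):
    """Average, over classes, of the population variance of one band's values."""
    groups = defaultdict(list)
    for vec, label in zip(features, labels):
        groups[label].append(vec[band])
    return statistics.fmean([statistics.pvariance(vals) for vals in groups.values()])


def statistical_band_selection(features, labels, n_keep):
    """Keep the n_keep bands with the smallest within-class variance (assumed scoring)."""
    n_bands = len(features[0])
    scores = sorted((within_class_variance(features, labels, b), b) for b in range(n_bands))
    return sorted(b for _, b in scores[:n_keep])


def project(features, bands):
    """Reduce each feature vector to the selected bands."""
    return [[vec[b] for b in bands] for vec in features]


if __name__ == "__main__":
    # Toy 4-band features for three hypothetical accent classes (not thesis data).
    features = [
        [1.0, 5.0, 0.0, 9.0], [1.2, 1.0, 0.1, 2.0],  # Malay
        [3.0, 4.0, 2.0, 7.0], [3.1, 0.0, 2.1, 1.0],  # Chinese
        [5.0, 3.0, 4.0, 5.0], [5.2, 6.0, 4.1, 8.0],  # Indian
    ]
    labels = ["Malay", "Malay", "Chinese", "Chinese", "Indian", "Indian"]
    bands = statistical_band_selection(features, labels, n_keep=2)
    print("selected bands:", bands)
    print(project(features, bands))
```

Keeping fewer bands shrinks every stored feature vector, which is consistent with the memory and speed savings the abstract reports. Note that within-class variance is scale-dependent and ignores how far apart the class means lie; a between-to-within ratio such as a Fisher score is a common alternative when separability also matters.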