Investigation of robust speech feature extraction techniques for accents classification of Malaysian English speakers

Automatic speech recognition (ASR) is not a new topic in speech processing and human-machine interaction; it has been established for more than five decades. However, accent, which is closely related to multilingualism, remains a great challenge in today’s ASR, manifesting as speech difference...

Bibliographic Details
Main Author:	Yusnita, Mohd Ali
Format:	Thesis
Language:	English
Subjects:	Automatic speech recognition (ASR); Automatic accent classification; English language; Algorithms; Robustness analysis
Online Access:	http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/44131/1/p.%201-24.pdf http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/44131/2/full%20text.pdf

id	my-unimap-44131
record_format	uketd_dc
institution	Universiti Malaysia Perlis
collection	UniMAP Institutional Repository
language	English
topic	Automatic speech recognition (ASR); Automatic accent classification; English language; Algorithms; Robustness analysis
description	Automatic speech recognition (ASR) is not a new topic in speech processing and human-machine interaction; it has been established for more than five decades. However, accent, which is closely related to multilingualism, remains a great challenge in today’s ASR: it manifests as differences in pronunciation and intonation among people from different sociolinguistic backgrounds. A large and growing body of literature has shown that accents impair ASR performance. Although English accents have been the most studied accent varieties, since English is regarded as the most important and prestigious international language, Malaysian English (MalE), a new variety within the New Englishes of non-native speakers, is still unexplored. In current commercial ASR products, the conventional approach is to treat MalE as a uniform variety, although this notion is disputed by many scholars and researchers, who regard MalE as reflecting localized ethnic speech diversity. Past perceptual studies using listening tests have reported a high possibility of detecting ethnic identity from Singapore English (SgE) and Brunei English (BruE) speech, which are appropriate comparator varieties for MalE accents. At present, no research has identified ethnic origin from MalE-accented speech samples using multiple speech analysis techniques and machine learning algorithms for automatic classification, which would offer more reliable, standardized and accurate experimental methods. This study attempts to fill that gap, and for this purpose a new database of MalE accents has been developed. The study elicits isolated-word and continuous speech from university students of both genders from the three main ethnic groups, representing educated Malay, Chinese and Indian speakers, using accent-sensitive words selected from previous studies.
The proposed system consists of pre-processing, feature extraction and classification stages. Apart from basic pre-processing, this study proposes integrating a fuzzy inference system for voiced-unvoiced (FIS V-UV) frame-based segmentation, which by itself improves the overall implementation over a conventional automatic accent classification (AAC) system. A new method, named global statistical thresholds (GSTs), is proposed for establishing the membership functions of the short-time energy and zero-crossing rate inputs in the FIS V-UV segmentation. The proposed segmentation reduces the portion of speech activity passed on to the feature extraction stage. The experimental results demonstrate the efficacy of the proposed FIS V-UV-assisted AAC using GSTs, with a highest increase in accuracy rate of 7.70% and a frame reduction rate of 24.26% over the conventional AAC. In the second stage, acoustic features correlated with the accents of the three ethnic groups are developed through filter bank analysis, vocal tract modelling, hybrid analysis and fusion analysis. Of the eight feature vectors formulated and tested on the MalE database, the following are new approaches in this field: statistical descriptors of Mel-band spectral energy (MBSE); principal component analysis-transformed MBSE (PCA-MBSE); two hybrid techniques of discrete wavelet transform-derived linear prediction coefficients (DWT-LPC); and two spectral feature fusions (SFFs), which combine the popular Mel-frequency cepstral coefficients and linear prediction coefficients with five formants (MFCC-formants and LPC-formants). The experimental results from the final stage suggest that the SFF techniques are the best approach for classifying the three MalE accents in this database, with a best accuracy rate of 97.4%. This technique outperformed the standard MFCC features by as much as 7.8%. Under robustness analysis, the SFFs, followed by PCA-MBSE, showed greater noise resistance than the other features.
This thesis also contributes a new feature selection technique, the statistical band selection (SBS) algorithm, which uses a simple decision rule to select bands based on the smallest within-class score variances. The experimental results reveal that SBS improved AAC performance on the three-class accent problem, with accuracy rates better by 3.9% to 5.6%, memory requirements lower by 22% to 55% and speed faster by 70% on average. Comparing accent severity between genders, this study suggests that male speakers possess a higher degree of accentedness, given their consistently better classification rates regardless of the acoustic feature technique used. It can also be concluded that continuous speech carries a higher intensity of accent markers than isolated-word speech.
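The abstract names short-time energy and zero-crossing rate as the two inputs to the voiced-unvoiced segmentation stage. The sketch below shows how these two per-frame measures are commonly computed, in plain Python. It is an illustration only: the function names, the frame sizes, and the fixed thresholds in `crude_voiced_mask` are assumptions made here, and the hard thresholding is a stand-in for, not a reproduction of, the fuzzy inference system and global statistical thresholds that the thesis proposes.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sequence of samples into frames of frame_len, stepping by hop (no padding)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (frame needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def crude_voiced_mask(samples, frame_len=4, hop=4, energy_thr=0.1, zcr_thr=0.5):
    """Label a frame as voiced when its energy is high and its zero-crossing rate is low.

    The thresholds are hypothetical placeholders chosen for the toy signal below;
    the thesis derives membership functions from global statistics instead.
    """
    return [short_time_energy(f) > energy_thr and zero_crossing_rate(f) < zcr_thr
            for f in frame_signal(samples, frame_len, hop)]


# Toy signal: one loud, slowly varying frame followed by one quiet, rapidly alternating frame.
signal = [0.5, 0.6, 0.5, 0.6, 0.01, -0.01, 0.01, -0.01]
mask = crude_voiced_mask(signal)
```

Frames flagged as unvoiced could then be dropped before feature extraction, which is the general idea behind the frame reduction reported in the abstract.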
format	Thesis
author	Yusnita, Mohd Ali
title	Investigation of robust speech feature extraction techniques for accents classification of Malaysian English speakers
granting_institution	Universiti Malaysia Perlis (UniMAP)
granting_department	School of Mechatronic Engineering
url	http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/44131/1/p.%201-24.pdf http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/44131/2/full%20text.pdf
_version_	1747836826698121216
spelling	my-unimap-44131 2016-11-22T08:49:51Z Universiti Malaysia Perlis (UniMAP) 2014 Thesis en http://dspace.unimap.edu.my:80/xmlui/handle/123456789/44131 http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/44131/3/license.txt 8a4605be74aa9ea9d79846c1fba20a33 http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/44131/1/p.%201-24.pdf 30f0a21053b8b420a1e1920ebae71238 http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/44131/2/full%20text.pdf 9f5d54eca3ddfacbf90510d0f2fdc8a8 School of Mechatronic Engineering
