Investigation of robust speech feature extraction techniques for accents classification of Malaysian English speakers
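Main Author: | Yusnita, Mohd Ali |
---|---|
Format: | Thesis |
Language: | English |
Subjects: | Automatic speech recognition (ASR); Automatic accent classification; English language; Algorithms; Robustness analysis |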
Online Access: | http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/44131/1/p.%201-24.pdf http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/44131/2/full%20text.pdf |
id |
my-unimap-44131 |
---|---|
record_format |
uketd_dc |
institution |
Universiti Malaysia Perlis |
collection |
UniMAP Institutional Repository |
language |
English |
topic |
Automatic speech recognition (ASR); Automatic accent classification; English language; Algorithms; Robustness analysis |
description |
Automatic speech recognition (ASR) is not a new topic in speech processing and
human-machine interaction; it has been established for more than five decades.
However, accent remains a great challenge for today’s ASR. Closely related to
multilingualism, accent manifests as differences in the pronunciation and intonation of
people from different sociolinguistic backgrounds. A large and growing body of
literature has shown that accents impair ASR performance. Although English accents
have been the most studied accent varieties, English being regarded as the most
important and prestigious international language, Malaysian English (MalE), a new
variety within the New Englishes of non-native speakers, is still unexplored. Current
commercial ASR products conventionally treat MalE as a uniform variety, although this
notion is disputed by many scholars and researchers who regard MalE as reflecting
localized ethnic speech diversity. Past perceptual studies using listening tests have
reported a high possibility of detecting ethnic identities from Singapore English (SgE)
and Brunei English (BruE) speech, which are appropriate comparator varieties for MalE
accents. At present, no research has identified ethnic origin from MalE-accented speech
samples using multiple speech analysis techniques and machine learning algorithms for
automatic classification, which would offer more reliable, standardized and accurate
experimental methods. This study attempts to fill that gap, and for this purpose a new
database of MalE accents has been developed. The study elicits speech in isolated
words and continuous speech from university students of both genders of three main
ethnic groups, representing educated Malay, Chinese and Indian speakers, using
selected accent-sensitive words from previous studies. The proposed system consists
of pre-processing, feature extraction and classification stages. Beyond basic
pre-processing, this study proposes integrating a fuzzy inference system for
frame-based voiced-unvoiced segmentation (FIS V-UV), which by itself improves the
overall implementation over a conventional automatic accent classification (AAC)
system. A new method, named global statistical thresholds (GSTs), is proposed for
establishing the membership functions of the short-time energy and zero-crossing
rate inputs to the FIS V-UV segmentation. This segmentation reduces the portion of
speech activity passed on to the feature extraction stage. The experimental results
demonstrate the efficacy of the proposed FIS V-UV-assisted AAC using GSTs, with a
highest increase in accuracy rate of 7.70% and a frame reduction rate of 24.26% over
the conventional AAC. In the second stage, acoustic features correlated
to the accents of these three ethnic groups are developed through several techniques:
filter bank analysis, vocal tract modelling, hybrid analysis and fusion analysis. Of the
eight feature vectors formulated and tested on the MalE database, the following are new
approaches in this field: statistical descriptors of Mel-band spectral energy (MBSE),
principal component analysis-transformed MBSE (PCA-MBSE), two hybrid techniques of
discrete wavelet transform-derived linear prediction coefficients (DWT-LPC), and two
spectral feature fusions (SFFs) that combine the popular Mel-frequency cepstral
coefficients and linear prediction coefficients with five formants (MFCC-formants and
LPC-formants). The experimental results from the final stage suggest that the SFF
techniques are the best approach for classifying the three MalE accents in this
database, with a best accuracy rate of 97.4%. This technique has
outperformed the standard MFCC features by as much as 7.8%. Under robustness
analysis, the SFFs, followed by PCA-MBSE, have shown greater noise resistance than
the others. This thesis also contributes a new feature selection technique, the
statistical band selection (SBS) algorithm, which uses a simple decision rule to select
bands based on the smallest variances within class scores. The experimental results
reveal that SBS improves AAC performance on the three-class accent problem, raising
accuracy rates by 3.9% to 5.6%, reducing memory requirements by 22% to 55% and
increasing speed by 70% on average. Comparing accent severity between genders, this
study suggests that male speakers possess a higher degree of accentedness, given their
consistently better classification rates regardless of the acoustic feature technique
used. It can also be concluded that continuous speech carries a higher intensity of
accent markers than isolated-word speech. |
format |
Thesis |
author |
Yusnita, Mohd Ali |
title |
Investigation of robust speech feature extraction techniques for accents classification of Malaysian English speakers |
granting_institution |
Universiti Malaysia Perlis (UniMAP) |
granting_department |
School of Mechatronic Engineering |
url |
http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/44131/1/p.%201-24.pdf http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/44131/2/full%20text.pdf |
_version_ |
1747836826698121216 |
spelling |
my-unimap-44131 2016-11-22T08:49:51Z Universiti Malaysia Perlis (UniMAP) 2014 Thesis en http://dspace.unimap.edu.my:80/xmlui/handle/123456789/44131 http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/44131/3/license.txt 8a4605be74aa9ea9d79846c1fba20a33 http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/44131/1/p.%201-24.pdf 30f0a21053b8b420a1e1920ebae71238 http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/44131/2/full%20text.pdf 9f5d54eca3ddfacbf90510d0f2fdc8a8 |
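Illustrative note: the description names short-time energy and zero-crossing rate as the inputs to voiced-unvoiced segmentation and reports a frame reduction rate. The following is a minimal Python sketch of those two frame-level measures and of the reduction-rate arithmetic. It is not the thesis's method: the function names, the toy signal and the crisp global-mean thresholds are all assumptions made for illustration, standing in for the GST-based fuzzy membership functions, whose definitions the record does not give.

```python
import math
from statistics import mean


def frame_signal(samples, frame_len, hop):
    """Split a sequence of samples into frames; an incomplete tail is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def select_voiced(frames):
    """Keep frames with above-average energy and below-average ZCR.

    ASSUMPTION: the global means act as crisp thresholds here. They are a
    hypothetical stand-in for the fuzzy membership functions described in
    the record, not a reconstruction of them.
    """
    energies = [short_time_energy(f) for f in frames]
    zcrs = [zero_crossing_rate(f) for f in frames]
    e_thr, z_thr = mean(energies), mean(zcrs)
    return [f for f, e, z in zip(frames, energies, zcrs)
            if e > e_thr and z < z_thr]


def frame_reduction_rate(n_before, n_after):
    """Percentage of frames removed by segmentation."""
    return 100.0 * (n_before - n_after) / n_before


# Toy signal (hypothetical): a loud low-frequency "voiced" burst followed
# by a quiet, rapidly alternating "unvoiced" stretch.
voiced = [math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]
unvoiced = [0.05 * (-1) ** n for n in range(200)]
frames = frame_signal(voiced + unvoiced, frame_len=40, hop=40)
kept = select_voiced(frames)
print(len(frames), len(kept), frame_reduction_rate(len(frames), len(kept)))
```

On this toy signal only the high-energy, low-crossing frames survive, so fewer frames would reach a later feature extraction stage. Real speech would call for overlapping, windowed frames and thresholds tuned on data.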