Phoneme based speech to text translation system for Malaysian English pronunciation
Speech is the most common vocalized form of human communication. Communication through speech conveys linguistic information and also expresses information about the speaker’s social and regional origin, health and emotional state. Recent improvements in phoneme based speech...
Saved in:
Main Author: | Sathees Kumar, Nataraj |
---|---|
Format: | Thesis |
Language: | English |
Subjects: | Phoneme; Speech signal processing; English language; Speech to text translation; Speech recognition systems |
Online Access: | http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/31909/1/Page%201-24.pdf http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/31909/2/Full%20text.pdf |
id |
my-unimap-31909 |
---|---|
record_format |
uketd_dc |
institution |
Universiti Malaysia Perlis |
collection |
UniMAP Institutional Repository |
language |
English |
topic |
Phoneme; Speech signal processing; English language; Speech to text translation; Speech recognition systems |
spellingShingle |
Phoneme Speech signal processing English language Speech to text translation Speech recognition systems Sathees Kumar, Nataraj Phoneme based speech to text translation system for Malaysian English pronunciation |
description |
Speech is the most common vocalized form of human communication.
Communication through speech conveys linguistic information and also
expresses information about the speaker’s social and regional origin, health and
emotional state. Phoneme-based speech-to-text translation has recently become
one of the most exciting areas of speech signal processing; because of major
advances in the statistical modeling of speech, automatic speech recognition
systems have found widespread application in tasks that require a human-machine
interface. Speech-to-text translation systems can be used in many applications
such as medical transcription (digital speech to text), automated transcription,
telematics and air traffic control. In this research work, two isolated-word
speech signal databases have been built, namely the Vowels Class Word Database
(VCWD) and the Phonemes Class Word Database (PCWD). The VCWD was initially built
to classify isolated words based on eleven vowel classes. The database has been
analyzed using four different spectral analysis techniques, namely Mel-Frequency
Cepstral Coefficients (MFCC), Linear Predictive Coefficients (LPC), Perceptual
Linear Prediction (PLP) and Relative Spectral Perceptual Linear Prediction
(RASTA-PLP), to determine the most discriminative features and to identify the
network parameters. The PCWD has been built to develop the phoneme-based
speech-to-text translation system using Linear Predictive Coefficients (LPC) and
Multilayer Neural Network (MLNN) models with a fusion concept for the
classification of isolated words and phonemes. The isolated-word speech signals
are recorded using a speech acquisition algorithm developed with a MATLAB
graphical user interface (GUI). The speech signals are recorded for 15
seconds at a 16 kHz sampling frequency. The recorded speech signals are
pre-processed and segmented into the voiced and unvoiced parts of the speech
signal. A simple fuzzy voice classifier has been proposed to extract the voiced
portions using frame energy and change-in-energy features. The extracted voiced
portions are pre-processed and divided into a number of frames. For each frame,
the spectral features are extracted and used as the feature set for
classification. The classification tasks for the isolated words and phonemes
are associated with the extracted features to establish an input-output
mapping. The data are then normalized and randomized to rearrange the values
into a definite range. The Multilayer Neural Network (MLNN) model has been
developed with four combinations of input and hidden activation functions. To
improve the performance rate and reduce the training time, a simple systole
activation function has been proposed. The neural network models are trained
with 60%, 70% and 80% of the total data samples. The trained neural networks
are validated with the remaining 40%, 30% and 20% of the data samples by
simulating the network. The performance of the network is evaluated by
measuring the true positives, false negatives and classification accuracy, and
the results are compared. It is observed that the fuzzy voice classifier is
less complex and yields better accuracy than the other voiced/unvoiced
classification methods available in the literature. The LPC features show
better discrimination, and the MLNN models trained using the LPC spectral band
features give better classification accuracy than the other feature extraction
algorithms. Also, the proposed systole activation function reduces the training
time and epoch count compared with the other network models. |
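The frame-energy voiced/unvoiced segmentation described in the abstract can be sketched roughly as follows. This is an illustrative approximation only: the frame length, hop size and threshold are assumptions, and a crisp threshold stands in for the thesis's fuzzy decision rule.

```python
import numpy as np

def frame_energy_features(x, frame_len=256, hop=128):
    """Split a signal into overlapping frames; return per-frame energy
    and the change in energy between consecutive frames."""
    starts = range(0, len(x) - frame_len + 1, hop)
    energy = np.array([np.sum(x[s:s + frame_len] ** 2) for s in starts])
    delta = np.diff(energy, prepend=energy[0])
    return energy, delta

def voiced_mask(energy, e_thresh=0.1):
    """Crisp stand-in for the fuzzy voiced/unvoiced decision:
    mark a frame voiced when its normalized energy exceeds a threshold."""
    e_norm = energy / (energy.max() + 1e-12)
    return e_norm > e_thresh

# Toy signal: low-level noise (unvoiced-like) followed by a tone (voiced-like).
rng = np.random.default_rng(0)
t = np.arange(8000) / 16000.0
x = np.concatenate([0.01 * rng.standard_normal(8000),
                    np.sin(2 * np.pi * 220 * t)])
energy, delta = frame_energy_features(x)
mask = voiced_mask(energy)
```

A real fuzzy classifier would pass both the energy and change-in-energy features through membership functions and combine them; here only the energy is thresholded.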
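The LPC features named in the abstract are conventionally computed with the autocorrelation method and the Levinson-Durbin recursion. The sketch below is a generic implementation (the prediction order, windowing and test signal are assumptions, not the thesis's settings), sanity-checked by recovering the coefficients of a known AR(2) process.

```python
import numpy as np

def lpc(frame, order):
    """Linear Predictive Coefficients by the autocorrelation method
    (Levinson-Durbin recursion). Returns [1, a1, ..., a_order] for the
    prediction-error filter A(z) = 1 + a1*z^-1 + ... + a_order*z^-order."""
    n = len(frame)
    w = frame * np.hamming(n)                        # taper frame edges
    r = np.correlate(w, w, mode="full")[n - 1:n + order]  # lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                               # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]          # update previous coefficients
        a[i] = k
        err *= 1.0 - k * k                           # prediction error shrinks
    return a

# Sanity check on a synthetic AR(2) signal: x[t] = 0.75*x[t-1] - 0.5*x[t-2] + e[t]
rng = np.random.default_rng(1)
e = rng.standard_normal(16000)
x = np.zeros(16000)
for t in range(2, 16000):
    x[t] = 0.75 * x[t - 1] - 0.5 * x[t - 2] + e[t]
a = lpc(x, order=2)   # expect roughly [1, -0.75, 0.5]
```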
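The normalization into a definite range and the randomized 60/40, 70/30 and 80/20 train/validation splits described in the abstract amount to min-max scaling plus a shuffled split, sketched below; the target range [0, 1] and the seed are assumptions for illustration.

```python
import numpy as np

def normalize_minmax(X, lo=0.0, hi=1.0):
    """Rescale each feature column into the definite range [lo, hi]."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    span = np.where(mx > mn, mx - mn, 1.0)   # avoid divide-by-zero on flat columns
    return lo + (X - mn) * (hi - lo) / span

def shuffle_split(X, y, train_frac=0.6, seed=0):
    """Randomize the sample order, then split into training and validation sets."""
    idx = np.random.default_rng(seed).permutation(len(X))
    cut = int(train_frac * len(X))
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]

X = np.arange(20.0).reshape(10, 2)           # 10 samples, 2 features
y = np.arange(10) % 2
Xn = normalize_minmax(X)
X_tr, y_tr, X_va, y_va = shuffle_split(Xn, y, train_frac=0.6)
```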
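The evaluation reported in the abstract (true positives, false negatives and classification accuracy) can be tallied per class as below; this is a generic tally for a multi-class classifier, not the thesis's code.

```python
import numpy as np

def classification_report(y_true, y_pred, n_classes):
    """Per-class true positives and false negatives, plus overall accuracy."""
    tp = np.array([np.sum((y_true == c) & (y_pred == c)) for c in range(n_classes)])
    fn = np.array([np.sum((y_true == c) & (y_pred != c)) for c in range(n_classes)])
    accuracy = float(np.mean(y_true == y_pred))
    return tp, fn, accuracy

# Small worked example with three classes.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
tp, fn, acc = classification_report(y_true, y_pred, n_classes=3)
```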
format |
Thesis |
author |
Sathees Kumar, Nataraj |
author_facet |
Sathees Kumar, Nataraj |
author_sort |
Sathees Kumar, Nataraj |
title |
Phoneme based speech to text translation system for Malaysian English pronunciation |
title_short |
Phoneme based speech to text translation system for Malaysian English pronunciation |
title_full |
Phoneme based speech to text translation system for Malaysian English pronunciation |
title_fullStr |
Phoneme based speech to text translation system for Malaysian English pronunciation |
title_full_unstemmed |
Phoneme based speech to text translation system for Malaysian English pronunciation |
title_sort |
phoneme based speech to text translation system for malaysian english pronunciation |
granting_institution |
Universiti Malaysia Perlis (UniMAP) |
granting_department |
School of Mechatronic Engineering |
url |
http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/31909/1/Page%201-24.pdf http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/31909/2/Full%20text.pdf |
_version_ |
1747836792918245376 |
spelling |
my-unimap-31909 2014-02-13T10:50:14Z Phoneme based speech to text translation system for Malaysian English pronunciation Sathees Kumar, Nataraj
Universiti Malaysia Perlis (UniMAP) 2012 Thesis en http://dspace.unimap.edu.my:80/dspace/handle/123456789/31909 http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/31909/1/Page%201-24.pdf c0b07f39f02d909c1ccc2ecd991f696f http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/31909/2/Full%20text.pdf 1a7cb3afa51ce94b41c1246b0ef95d39 http://dspace.unimap.edu.my:80/xmlui/bitstream/123456789/31909/3/license.txt 8a4605be74aa9ea9d79846c1fba20a33 Phoneme Speech signal processing English language Speech to text translation Speech recognition systems School of Mechatronic Engineering |