Development on SNR estimator for audio-visual speech recognition based on waveform amplitude distribution analysis

Bibliographic Details
Main Author: Thum, Wei Seong
Format: Thesis
Language: English
Published: 2018
Subjects: TK Electrical engineering. Electronics Nuclear engineering
Online Access: http://umpir.ump.edu.my/id/eprint/27969/1/Development%20on%20SNR%20estimator%20for%20audio-visual%20speech%20recognition%20based%20on%20waveform%20amplitude.pdf
id my-ump-ir.27969
record_format uketd_dc
spelling my-ump-ir.27969 2020-02-25T04:17:26Z 2018-12 Thesis http://umpir.ump.edu.my/id/eprint/27969/ pdf en public masters
institution Universiti Malaysia Pahang Al-Sultan Abdullah
collection UMPSA Institutional Repository
language English
topic TK Electrical engineering. Electronics Nuclear engineering
description Audio-visual speech recognition (AVSR), which combines the audio modality with the visual modality, can improve the performance of a speech recognition system, particularly in noisy environments. The audio modality is easily corrupted by ambient noise, which makes it difficult to distinguish the actual speech signal from the noise signal. The signal-to-noise ratio (SNR) is the ratio of signal power to noise power, expressed in decibels (dB). One of the best-known SNR estimation techniques is waveform amplitude distribution analysis (WADA), which assumes that the amplitudes of speech and noise follow gamma and Gaussian distributions, respectively. It has been used in several research works as a benchmark for comparing results. However, there is no clear instruction on how to build its look-up table. This work proposes rebuilding the look-up table from its own database corrupted with general white noise as the noise reference. The resulting technique, known as waveform amplitude distribution analysis-white (WADA-W), enhances SNR estimation by referring to the reconstructed WADA-W look-up table instead of the general precomputed WADA look-up table. The proposed WADA-W SNR estimation technique was evaluated by developing an AVSR system that used mel-frequency cepstral coefficient (MFCC) features and shape-based visual features from two speech databases: LUNA-V and CUAVE. The experimental results showed that, by referring to the WADA-W look-up table, the technique performs consistent SNR estimation with more accurate and less biased results than the original WADA technique under four types of noise from the NOISEX-92 dataset: white, babble, factory1, and factory2. The overall deviation of the SNR estimation for the LUNA-V database using the proposed WADA-W technique was approximately 9.6 dB, whereas the deviation of the NIST and WADA techniques was approximately 42.3 dB and 67.3 dB, respectively. For the CUAVE database, the overall deviation of the SNR estimation using the same proposed technique was 13.3 dB, whereas the deviation of the NIST and WADA techniques was 50.6 dB and 62.3 dB, respectively. Classification was performed using a multi-stream hidden Markov model (MSHMM) with leave-one-out cross-validation (LOOCV). The experiments showed that the proposed AVSR system achieved its highest accuracy of 96.6% on the LUNA-V database and 95.2% on the CUAVE database under clean conditions. In conclusion, the proposed WADA-W SNR estimator improved on the original WADA technique by 4.5% and 12.7% for the LUNA-V and CUAVE databases, respectively.
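The description above defines SNR as signal power over noise power in dB and describes rebuilding a look-up table from signals corrupted with white noise. The following Python sketch illustrates that general idea only; it is not the thesis's implementation. It assumes the amplitude statistic G = ln(E[|z|]) - E[ln|z|], which is commonly associated with WADA, and it uses a synthetic gamma-distributed signal (shape 0.4) as a stand-in for speech. The function names, the SNR grid, the sample count, and the `eps` guard are all hypothetical choices made for illustration.

```python
import numpy as np


def snr_db(signal, noise):
    """SNR in decibels: 10 * log10(signal power / noise power)."""
    return 10.0 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))


def add_white_noise(signal, target_snr_db, rng):
    """Mix in Gaussian white noise scaled so the mixture has the target SNR."""
    noise = rng.standard_normal(signal.shape)
    wanted_noise_power = np.mean(signal ** 2) / 10.0 ** (target_snr_db / 10.0)
    noise *= np.sqrt(wanted_noise_power / np.mean(noise ** 2))
    return signal + noise, noise


def amplitude_statistic(z, eps=1e-12):
    """G = ln(E[|z|]) - E[ln|z|]; eps guards against log(0) (assumed detail)."""
    a = np.abs(z) + eps
    return np.log(np.mean(a)) - np.mean(np.log(a))


def build_table(clean, snr_grid_db, rng):
    """Statistic of the white-noise mixture at every SNR on the grid."""
    return np.array([amplitude_statistic(add_white_noise(clean, s, rng)[0])
                     for s in snr_grid_db])


def estimate_snr(noisy, snr_grid_db, table):
    """Invert the table by linear interpolation (np.interp needs sorted x)."""
    order = np.argsort(table)
    g = amplitude_statistic(noisy)
    return float(np.interp(g, table[order], snr_grid_db[order]))


rng = np.random.default_rng(0)
n = 200_000  # hypothetical sample count
# Synthetic stand-in for speech: gamma-distributed amplitude with random sign.
clean = rng.gamma(shape=0.4, scale=1.0, size=n) * rng.choice([-1.0, 1.0], size=n)
snr_grid_db = np.arange(-10.0, 31.0, 2.0)  # hypothetical grid, -10 to 30 dB
table = build_table(clean, snr_grid_db, rng)

# Estimate the SNR of a fresh mixture created at a known 10 dB.
noisy, noise = add_white_noise(clean, 10.0, rng)
true_snr = snr_db(clean, noise)
estimated_snr = estimate_snr(noisy, snr_grid_db, table)
print(f"true SNR {true_snr:.2f} dB, estimated {estimated_snr:.2f} dB")
```

In this sketch the table is tied to the noise type used to build it, which mirrors the motivation given above for a white-noise-specific table. Estimates outside the grid are clamped to its end points by `np.interp`.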
format Thesis
qualification_level Master's degree
author Thum, Wei Seong
title Development on SNR estimator for audio-visual speech recognition based on waveform amplitude distribution analysis
granting_institution Universiti Malaysia Pahang
granting_department Faculty of Electrical and Electronics Engineering
publishDate 2018
url http://umpir.ump.edu.my/id/eprint/27969/1/Development%20on%20SNR%20estimator%20for%20audio-visual%20speech%20recognition%20based%20on%20waveform%20amplitude.pdf