Speech enhancement using deep neural network based on mask estimation and harmonic regeneration noise reduction for single channel microphone

The development of speech-enabled mobile applications has greatly improved human-computer interaction in recent years. These applications are flexible and convenient for users. Since the speech signal is captured in mobile conditions, it may easily be contaminated by background noises, which may...

Bibliographic Details
Main Author: Md Jamal, Norezmi
Format: Thesis
Language: English
Published: 2022
Subjects: TK5101-6720 Telecommunication. Including telegraphy, telephone, radio, radar, television
Online Access: http://eprints.uthm.edu.my/8464/1/24p%20NOREZMI%20MD%20JAMAL.pdf
http://eprints.uthm.edu.my/8464/2/NOREZMI%20MD%20JAMAL%20COPYRIGHT%20DECLARATION.pdf
http://eprints.uthm.edu.my/8464/3/NOREZMI%20MD%20JAMAL%20WATERMARK.pdf
id my-uthm-ep.8464
record_format uketd_dc
spelling my-uthm-ep.8464 2023-02-27T01:01:10Z Speech enhancement using deep neural network based on mask estimation and harmonic regeneration noise reduction for single channel microphone 2022-01 Md Jamal, Norezmi TK5101-6720 Telecommunication. Including telegraphy, telephone, radio, radar, television 2022-01 Thesis http://eprints.uthm.edu.my/8464/ http://eprints.uthm.edu.my/8464/1/24p%20NOREZMI%20MD%20JAMAL.pdf text en public http://eprints.uthm.edu.my/8464/2/NOREZMI%20MD%20JAMAL%20COPYRIGHT%20DECLARATION.pdf text en staffonly http://eprints.uthm.edu.my/8464/3/NOREZMI%20MD%20JAMAL%20WATERMARK.pdf text en validuser phd doctoral Universiti Tun Hussein Onn Malaysia Fakulti Kejuruteraan Elektrik dan Elektronik
institution Universiti Tun Hussein Onn Malaysia
collection UTHM Institutional Repository
language English
topic TK5101-6720 Telecommunication
Including telegraphy, telephone, radio, radar, television
description The development of speech-enabled mobile applications has greatly improved human-computer interaction in recent years. These applications are flexible and convenient for users. Because the speech signal is captured in mobile conditions, it is easily contaminated by background noise, which complicates computation and calls for a speech enhancement algorithm. The performance of speech applications degrades when the signal-to-noise ratio (SNR) is low and nonstationary noise is present. Moreover, removing noise without distorting the speech is challenging, since both the quality and the intelligibility of the speech are affected. To overcome these issues, a supervised Deep Neural Network (DNN) algorithm is developed that predicts a constrained Wiener Filter (cWF) target mask from extracted Gammatone filter bank power spectrum (GF-TF) features using a trained model. The model trained with GF-TF features and a cross-speech dataset produced promising results, and the proposed target mask scored higher on the perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) tests. In addition, a modified Harmonic Regeneration Noise Reduction (HRNR) algorithm is proposed as a post-filtering strategy to enhance the speech signal, because residual noise is introduced after DNN prediction. Results on the TIMIT dataset show that the average STOI scores of the joint algorithm are higher than those of the DNN, conventional HRNR and Log Minimum Mean Square Error (Log-MMSE) algorithms. At an SNR of -5 dB, improvements of 4% over the DNN algorithm, 36% over the conventional HRNR algorithm, and 12% over the Log-MMSE algorithm are obtained, while the average PESQ score is less affected by the post-filtering strategy. This work thus contributes to improving speech intelligibility in noisy backgrounds at low SNR and can be deployed in speech-enabled mobile applications.
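The record does not give the definition of the cWF target mask, so the following is only a minimal sketch of a generic Wiener-type ratio mask of the kind a DNN could be trained to predict. The function name `wiener_type_mask`, the clipping bounds `lower` and `upper`, and the random power spectra are assumptions made for illustration; they are not taken from the thesis.

```python
import numpy as np

def wiener_type_mask(clean_power, noise_power, lower=0.0, upper=1.0, eps=1e-12):
    """Generic Wiener-type gain per time-frequency unit.

    ASSUMPTION: the 'constraint' is modelled here as simple clipping to
    [lower, upper]; the thesis may define its cWF mask differently.
    """
    snr = clean_power / (noise_power + eps)   # local SNR estimate per unit
    gain = snr / (1.0 + snr)                  # classic Wiener gain, in [0, 1)
    return np.clip(gain, lower, upper)

# Hypothetical power spectra (frequency channels x frames), for illustration only.
rng = np.random.default_rng(0)
clean_power = rng.random((4, 6))
noise_power = rng.random((4, 6))

# Training target: the mask computed from the known clean and noise powers.
mask = wiener_type_mask(clean_power, noise_power, lower=0.1)

# At enhancement time the (predicted) mask scales the noisy magnitude.
noisy_mag = np.sqrt(clean_power + noise_power)
enhanced_mag = mask * noisy_mag
```

In a supervised setup such a mask would be computed from parallel clean and noisy training data and used as the regression target, with the network's prediction taking its place at test time.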
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Md Jamal, Norezmi
title Speech enhancement using deep neural network based on mask estimation and harmonic regeneration noise reduction for single channel microphone
granting_institution Universiti Tun Hussein Onn Malaysia
granting_department Fakulti Kejuruteraan Elektrik dan Elektronik
publishDate 2022
url http://eprints.uthm.edu.my/8464/1/24p%20NOREZMI%20MD%20JAMAL.pdf
http://eprints.uthm.edu.my/8464/2/NOREZMI%20MD%20JAMAL%20COPYRIGHT%20DECLARATION.pdf
http://eprints.uthm.edu.my/8464/3/NOREZMI%20MD%20JAMAL%20WATERMARK.pdf
_version_ 1776103352040423424