Acoustic event detection with binarized neural network

Implementing deep learning for Acoustic Event Detection (AED) on embedded systems is challenging because of constraints on memory, computational resources, and power dissipation. Various solutions to these constraints have been proposed. One of the latest methods to overcome this limitation...

Bibliographic Details
Main Author: Wong, Kah Liang
Format: Thesis
Language: English
Published: 2020
Subjects: TK Electrical engineering. Electronics. Nuclear engineering
Online Access: http://eprints.utm.my/id/eprint/93005/1/WongKahLiangMSKE2020.pdf
id my-utm-ep.93005
record_format uketd_dc
spelling my-utm-ep.93005 2021-11-07T06:00:22Z Acoustic event detection with binarized neural network 2020 Wong, Kah Liang TK Electrical engineering. Electronics. Nuclear engineering
2020 Thesis http://eprints.utm.my/id/eprint/93005/ http://eprints.utm.my/id/eprint/93005/1/WongKahLiangMSKE2020.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:135894 masters Universiti Teknologi Malaysia, Faculty of Engineering - School of Electrical Engineering Faculty of Engineering - School of Electrical Engineering
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic TK Electrical engineering. Electronics. Nuclear engineering
spellingShingle TK Electrical engineering. Electronics. Nuclear engineering
Wong, Kah Liang
Acoustic event detection with binarized neural network
description Implementing deep learning for Acoustic Event Detection (AED) on embedded systems is challenging because of constraints on memory, computational resources, and power dissipation. Various solutions to these constraints have been proposed. One of the latest is the Binarized Neural Network (BNN), which has been reported to achieve approximately 32x memory savings and a 58x reduction in computational resources. XNOR-Net is a type of BNN that uses the XNOR gate to perform a logical function on the input data and produces all outputs in binary form. In this project, an XNOR-Net model is constructed and trained for the AED task using urban sound (UrbanSound8K) and bird sound (Xeno-Canto) datasets. Before training, the datasets were pre-processed through audio segmentation to produce 1-second sound files. Each audio file was converted from the time domain to a Mel spectrogram in the frequency domain, and thresholding was applied to convert each spectrogram into a binary image. The images were then reshaped to 32x32 pixels before being used for training. BinaryNet and XNOR-Net were compared in terms of the number of hidden layers used, and an XNOR-Net structure with one binary convolutional layer was selected and constructed. The block structure and hyperparameters of the XNOR-Net were analyzed and optimized to achieve a training accuracy of 96.06% and a validation accuracy of 94.08%.
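The pre-processing and binary arithmetic summarized in the description can be illustrated with a minimal NumPy sketch. It is not the thesis code. The function names (segment, binarize, xnor_dot), the mean-value threshold, the 8000 Hz sample rate, and the bit encoding (1 for +1, 0 for -1) are all assumptions made for illustration, and a random array stands in for a real Mel spectrogram.

```python
import numpy as np

def segment(signal, sample_rate, seconds=1.0):
    """Split a 1-D signal into non-overlapping clips, dropping any remainder."""
    n = int(sample_rate * seconds)
    count = len(signal) // n
    return signal[: count * n].reshape(count, n)

def binarize(spectrogram, threshold=None):
    """Threshold a spectrogram into a {0, 1} image.

    Using the mean as the default threshold is an assumption; the record
    does not state which threshold was used.
    """
    if threshold is None:
        threshold = spectrogram.mean()
    return (spectrogram > threshold).astype(np.uint8)

def xnor_dot(a_bits, b_bits):
    """Dot product of two +1/-1 vectors stored as bits (1 -> +1, 0 -> -1).

    XNOR is 1 wherever the two bits agree, so the signed dot product is
    (number of agreements) - (number of disagreements) = 2 * agreements - n.
    """
    matches = int(np.count_nonzero(a_bits == b_bits))
    return 2 * matches - a_bits.size

# Hypothetical demo data: noise in place of audio and of a Mel spectrogram.
rng = np.random.default_rng(0)
clips = segment(rng.standard_normal(3 * 8000 + 123), sample_rate=8000)
image = binarize(rng.random((32, 32)))
packed = np.packbits(image)  # 8 binary pixels per byte

print(clips.shape)                                        # (3, 8000)
print(image.astype(np.float32).nbytes // packed.nbytes)   # 32
```

In this sketch the factor of 32 is simply the ratio between a 32-bit float and a single bit per value. The 58x figure quoted in the description concerns computation and is not reproduced here.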
format Thesis
qualification_level Master's degree
author Wong, Kah Liang
author_facet Wong, Kah Liang
author_sort Wong, Kah Liang
title Acoustic event detection with binarized neural network
title_short Acoustic event detection with binarized neural network
title_full Acoustic event detection with binarized neural network
title_fullStr Acoustic event detection with binarized neural network
title_full_unstemmed Acoustic event detection with binarized neural network
title_sort acoustic event detection with binarized neural network
granting_institution Universiti Teknologi Malaysia, Faculty of Engineering - School of Electrical Engineering
granting_department Faculty of Engineering - School of Electrical Engineering
publishDate 2020
url http://eprints.utm.my/id/eprint/93005/1/WongKahLiangMSKE2020.pdf
_version_ 1747818624786104320