1-D and 2-D convolution neural network for bird sound detection
This research aimed to determine the most suitable audio input format to the Convolution Neural Network (CNN) model, to train a bird activity detector that is low in memory usage with decent accuracy. To enable this investigation, three types of CNN were developed, including one 1-D CNN and two arch...
Main Author: | Tee, Yun Hong |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2020 |
Subjects: | TK Electrical engineering. Electronics Nuclear engineering |
Online Access: | http://eprints.utm.my/id/eprint/92992/1/TeeYunHongMSKE2020.pdf |
id |
my-utm-ep.92992 |
---|---|
record_format |
uketd_dc |
spelling |
my-utm-ep.92992 2021-11-07T06:00:15Z 1-D and 2-D convolution neural network for bird sound detection 2020 Tee, Yun Hong TK Electrical engineering. Electronics Nuclear engineering This research aimed to determine the most suitable audio input format for a Convolution Neural Network (CNN) model in order to train a bird activity detector that is low in memory usage with decent accuracy. To enable this investigation, three types of CNN were developed: one 1-D CNN and two architecturally identical 2-D CNNs that used two different inputs. The 1-D CNN used wav as input, while the two 2-D CNNs used a wav image and a spectrogram image as input, respectively. Accuracy, model size, and training time were used to determine the best model among these three types of CNN. Bird audio and Urban8k audio were used as the positive and negative datasets, respectively. For each type of CNN model, the most suitable convolution filter size was determined first, and the best model was then selected from three models with different numbers of convolution layers. This produced one winner each for the 1-D CNN, the 2-D CNN using a wav image, and the 2-D CNN using a spectrogram image. These three winners were then compared to determine the overall best model for the bird activity detector. For this research, the overall best model was the five-layer 2-D CNN using a spectrogram image with a filter size of 5×5. The accuracy achieved was 97.12%, the model size was 6 MB, and the training time was fourteen minutes. The additional arithmetic operations required to convert wav to spectrogram were deemed acceptable given the much better accuracy achieved. The spectrogram image was therefore the most suitable audio input format for a CNN to train a bird activity detector that is low in memory usage with decent accuracy. 
2020 Thesis http://eprints.utm.my/id/eprint/92992/ http://eprints.utm.my/id/eprint/92992/1/TeeYunHongMSKE2020.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:135875 masters Universiti Teknologi Malaysia, Faculty of Engineering - School of Electrical Engineering Faculty of Engineering - School of Electrical Engineering |
institution |
Universiti Teknologi Malaysia |
collection |
UTM Institutional Repository |
language |
English |
topic |
TK Electrical engineering Electronics Nuclear engineering |
spellingShingle |
TK Electrical engineering Electronics Nuclear engineering Tee, Yun Hong 1-D and 2-D convolution neural network for bird sound detection |
description |
This research aimed to determine the most suitable audio input format for a Convolution Neural Network (CNN) model in order to train a bird activity detector that is low in memory usage with decent accuracy. To enable this investigation, three types of CNN were developed: one 1-D CNN and two architecturally identical 2-D CNNs that used two different inputs. The 1-D CNN used wav as input, while the two 2-D CNNs used a wav image and a spectrogram image as input, respectively. Accuracy, model size, and training time were used to determine the best model among these three types of CNN. Bird audio and Urban8k audio were used as the positive and negative datasets, respectively. For each type of CNN model, the most suitable convolution filter size was determined first, and the best model was then selected from three models with different numbers of convolution layers. This produced one winner each for the 1-D CNN, the 2-D CNN using a wav image, and the 2-D CNN using a spectrogram image. These three winners were then compared to determine the overall best model for the bird activity detector. For this research, the overall best model was the five-layer 2-D CNN using a spectrogram image with a filter size of 5×5. The accuracy achieved was 97.12%, the model size was 6 MB, and the training time was fourteen minutes. The additional arithmetic operations required to convert wav to spectrogram were deemed acceptable given the much better accuracy achieved. The spectrogram image was therefore the most suitable audio input format for a CNN to train a bird activity detector that is low in memory usage with decent accuracy. |
format |
Thesis |
qualification_level |
Master's degree |
author |
Tee, Yun Hong |
title |
1-D and 2-D convolution neural network for bird sound detection |
granting_institution |
Universiti Teknologi Malaysia, Faculty of Engineering - School of Electrical Engineering |
granting_department |
Faculty of Engineering - School of Electrical Engineering |
publishDate |
2020 |
url |
http://eprints.utm.my/id/eprint/92992/1/TeeYunHongMSKE2020.pdf |
_version_ |
1747818621603676160 |
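
The description above mentions the additional arithmetic needed to convert wav to a spectrogram before it is fed to the 2-D CNN. The record does not say how the thesis performs that conversion, so the following is only a minimal sketch of one common approach, a short-time Fourier transform with a Hann window, written with the Python standard library. The function names (`hann`, `spectrogram`, `make_tone`) and the frame length, hop size, and sample rate are hypothetical choices for illustration, not values taken from the thesis.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(samples, frame_len=64, hop=32):
    """Magnitude spectrogram via a naive DFT on overlapping windowed frames.

    Returns a list of frames; each frame holds the magnitudes of
    frequency bins 0 .. frame_len // 2. frame_len and hop are
    illustrative defaults, not parameters from the thesis.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def make_tone(freq_hz, sample_rate, n_samples):
    """Synthetic sine wave standing in for decoded wav samples."""
    return [
        math.sin(2 * math.pi * freq_hz * t / sample_rate)
        for t in range(n_samples)
    ]


sample_rate = 8000  # assumed sample rate for the synthetic signal
tone = make_tone(1000, sample_rate, 256)
spec = spectrogram(tone)

# Bin spacing is sample_rate / frame_len = 125 Hz, so the 1000 Hz tone
# is expected to concentrate its energy around bin 8 of each frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Each frame costs on the order of `frame_len` squared multiply-adds in this naive form, which gives a sense of the extra arithmetic a spectrogram input adds over raw wav. A practical pipeline would more likely use an FFT routine from a numerical library, and the resulting 2-D array of magnitudes would then be rendered or scaled into the image the CNN consumes.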