Improved field programmable gate array-based accelerator of deep neural network using OpenCL

Because Deep Neural Network (DNN) based models are compute-intensive and memory-intensive, they are hard to deploy on embedded devices. Although recent studies have explored the Field Programmable Gate Array (FPGA) as an alternative platform for deploying DNN-based models such as AlexNet and...

Bibliographic Details
Main Author:	Yap, June Wai
Format:	Thesis
Language:	English
Published:	2022
Online Access:	http://eprints.utem.edu.my/id/eprint/26977/1/Improved%20field%20programmable%20gatearraybased%20accelerator%20of%20deep%20neural%20networkusing%20opencl.pdf http://eprints.utem.edu.my/id/eprint/26977/2/Improved%20field%20programmable%20gatearraybased%20accelerator%20of%20deep%20neural%20networkusing%20opencl.pdf

id	my-utem-ep.26977
record_format	uketd_dc
spelling	my-utem-ep.26977 2024-01-16T14:28:16Z Improved field programmable gate array-based accelerator of deep neural network using OpenCL 2022 Yap, June Wai Because Deep Neural Network (DNN) based models are compute-intensive and memory-intensive, they are hard to deploy on embedded devices. Although recent studies have explored the Field Programmable Gate Array (FPGA) as an alternative platform for deploying DNN-based models such as AlexNet and VGG, many challenges remain in implementing a DNN-based object detection model on an FPGA. Hence, this research explores the design of a scalable, parameterised DNN-based object detection model, Tiny YOLOv2, targeting the Cyclone V PCIe Development Kit FPGA using a High-Level Synthesis (HLS) tool. Considering the hardware limitations in terms of computational resources and memory bandwidth, data quantization is proposed to convert the 32-bit floating-point Tiny YOLOv2 into an 8-bit fixed-point design. To achieve good performance, the computational complexity and memory footprint of Tiny YOLOv2 are analysed in depth to find the best quantization scheme for it. The proposed quantization scheme reduces the memory required to store the parameters from 60 MB to 15 MB, a 4× reduction compared with the original floating-point design. Finally, the proposed implementation achieves a peak performance density of 0.29 Giga Operations Per Second (GOPS) per Digital Signal Processing (DSP) block with only a 0.4% loss in accuracy, which is comparable to previous works.
2022 Thesis http://eprints.utem.edu.my/id/eprint/26977/ http://eprints.utem.edu.my/id/eprint/26977/1/Improved%20field%20programmable%20gatearraybased%20accelerator%20of%20deep%20neural%20networkusing%20opencl.pdf text en public http://eprints.utem.edu.my/id/eprint/26977/2/Improved%20field%20programmable%20gatearraybased%20accelerator%20of%20deep%20neural%20networkusing%20opencl.pdf text en validuser https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=122220 mphil masters Universiti Teknikal Malaysia Melaka Faculty of Electronic and Computer Engineering Mohd Yusof, Zulkalnain
institution	Universiti Teknikal Malaysia Melaka
collection	UTeM Repository
language	English
advisor	Mohd Yusof, Zulkalnain
description	Because Deep Neural Network (DNN) based models are compute-intensive and memory-intensive, they are hard to deploy on embedded devices. Although recent studies have explored the Field Programmable Gate Array (FPGA) as an alternative platform for deploying DNN-based models such as AlexNet and VGG, many challenges remain in implementing a DNN-based object detection model on an FPGA. Hence, this research explores the design of a scalable, parameterised DNN-based object detection model, Tiny YOLOv2, targeting the Cyclone V PCIe Development Kit FPGA using a High-Level Synthesis (HLS) tool. Considering the hardware limitations in terms of computational resources and memory bandwidth, data quantization is proposed to convert the 32-bit floating-point Tiny YOLOv2 into an 8-bit fixed-point design. To achieve good performance, the computational complexity and memory footprint of Tiny YOLOv2 are analysed in depth to find the best quantization scheme for it. The proposed quantization scheme reduces the memory required to store the parameters from 60 MB to 15 MB, a 4× reduction compared with the original floating-point design. Finally, the proposed implementation achieves a peak performance density of 0.29 Giga Operations Per Second (GOPS) per Digital Signal Processing (DSP) block with only a 0.4% loss in accuracy, which is comparable to previous works.
format	Thesis
qualification_name	Master of Philosophy (M.Phil.)
qualification_level	Master's degree
author	Yap, June Wai
spellingShingle	Yap, June Wai Improved field programmable gate array-based accelerator of deep neural network using OpenCL
author_facet	Yap, June Wai
author_sort	Yap, June Wai
title	Improved field programmable gate array-based accelerator of deep neural network using OpenCL
title_short	Improved field programmable gate array-based accelerator of deep neural network using OpenCL
title_full	Improved field programmable gate array-based accelerator of deep neural network using OpenCL
title_fullStr	Improved field programmable gate array-based accelerator of deep neural network using OpenCL
title_full_unstemmed	Improved field programmable gate array-based accelerator of deep neural network using OpenCL
title_sort	improved field programmable gate array-based accelerator of deep neural network using opencl
granting_institution	Universiti Teknikal Malaysia Melaka
granting_department	Faculty of Electronic and Computer Engineering
publishDate	2022
url	http://eprints.utem.edu.my/id/eprint/26977/1/Improved%20field%20programmable%20gatearraybased%20accelerator%20of%20deep%20neural%20networkusing%20opencl.pdf http://eprints.utem.edu.my/id/eprint/26977/2/Improved%20field%20programmable%20gatearraybased%20accelerator%20of%20deep%20neural%20networkusing%20opencl.pdf
_version_	1794023197749805056
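
The quantization step summarised in the description (32-bit floating point to 8-bit fixed point, with parameter storage falling from 60 MB to 15 MB) can be illustrated with a minimal sketch. The abstract does not specify the thesis's actual quantization scheme, so the power-of-two scaling below is only an assumed, generic fixed-point format, and the names `quantize_q8`, `dequantize_q8`, `footprint_mb` and the parameter `frac_bits` are hypothetical.

```python
# Generic signed 8-bit fixed-point quantization sketch.
# ASSUMPTION: a power-of-two scale set by `frac_bits`; this is not taken
# from the thesis, whose exact scheme is not given in the abstract.

def quantize_q8(values, frac_bits):
    """Map floats to signed 8-bit codes with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    codes = []
    for v in values:
        q = round(v * scale)          # Python rounds exact ties to even
        q = max(-128, min(127, q))    # saturate to the int8 range
        codes.append(q)
    return codes


def dequantize_q8(codes, frac_bits):
    """Recover approximate float values from 8-bit fixed-point codes."""
    scale = 1 << frac_bits
    return [q / scale for q in codes]


def footprint_mb(fp32_mb, bits):
    """Parameter storage after moving from 32-bit values to `bits`-bit values."""
    return fp32_mb * bits / 32


weights = [0.5, -0.25, 3.0]                  # made-up example values
codes = quantize_q8(weights, frac_bits=6)
restored = dequantize_q8(codes, frac_bits=6)
print(codes, restored, footprint_mb(60, 8))
```

With 6 fractional bits the representable range is roughly -2 to +1.98, so the value 3.0 saturates; choosing the number of fractional bits per layer is the kind of trade-off a quantization analysis has to settle. The footprint helper simply reproduces the 60 MB to 15 MB figure quoted in the description.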

Similar Items