Field programmable gate array based convolution neural network hardware accelerator with optimized memory controller

Bibliographic Details
Main Author: Mohammed, Mohammed Isam Eldin Hassan
Format: Thesis
Language: English
Published: 2020
Online Access: http://eprints.utm.my/id/eprint/92997/1/MohammedIsamEldinMSKE2020.pdf
Description
Summary: A Convolution Neural Network (CNN) is a special kind of neural network inspired by the behaviour of optic nerves in living creatures. CNNs are gaining attention because of the increased demand for high-speed, low-cost synthetic vision systems. However, CNNs can be both compute- and memory-intensive, so an implementation on a general-purpose processor is slow and inefficient. This project therefore proposes a flexible CNN hardware accelerator that targets the Field Programmable Gate Array (FPGA) platform and features an optimized memory controller to reduce redundant memory accesses. The main advantage of the accelerator is its flexibility: the user can modify the architecture through parameterization to optimize for execution speed, resource utilization, and power consumption. The accelerator employs hardware design techniques such as loop unrolling, pipelining, and an optimized memory controller, among others, to achieve the targeted performance. It is written in SystemVerilog using Xilinx's Vivado software and is tested using a single convolution layer from several selected CNN architectures. The results are then compared against the same convolution layer implemented in MATLAB. The proposed accelerator achieves a speedup of up to 4251X over its software counterpart, with reasonable resource utilization, and consumes only 0.27 W per layer.
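
The summary does not describe how the memory controller removes redundant accesses, so the following is only an illustrative sketch of the general idea, not the thesis's design. It assumes a single-channel input, stride 1, no padding, and a hypothetical K-row buffer standing in for on-chip storage. The function names `conv2d_naive` and `conv2d_line_buffered` are invented for this example. Both functions compute the same sliding-window result (without flipping the kernel, as is usual in CNNs) and count how many values are fetched from external memory.

```python
def conv2d_naive(image, kernel):
    """Direct convolution: every output pixel re-reads its whole K x K window."""
    h, w, k = len(image), len(image[0]), len(kernel)
    reads = 0  # number of values fetched from (modelled) external memory
    out = []
    for r in range(h - k + 1):
        row = []
        for c in range(w - k + 1):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += image[r + i][c + j] * kernel[i][j]
                    reads += 1  # each operand is fetched again for every window
            row.append(acc)
        out.append(row)
    return out, reads


def conv2d_line_buffered(image, kernel):
    """Same result, but each input pixel is fetched once into a K-row buffer."""
    h, w, k = len(image), len(image[0]), len(kernel)
    reads = 0
    buffer = []  # most recent K rows; an assumed stand-in for on-chip storage
    out = []
    for r in range(h):
        buffer.append(list(image[r]))  # fetch one new row from external memory
        reads += w
        if len(buffer) > k:
            buffer.pop(0)  # discard the oldest row once K rows are held
        if len(buffer) == k:
            row = []
            for c in range(w - k + 1):
                acc = 0
                for i in range(k):
                    for j in range(k):
                        # operands now come from the buffer, not external memory
                        acc += buffer[i][c + j] * kernel[i][j]
                row.append(acc)
            out.append(row)
    return out, reads


if __name__ == "__main__":
    image = [[r * 6 + c for c in range(6)] for r in range(6)]
    kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
    out_a, reads_a = conv2d_naive(image, kernel)
    out_b, reads_b = conv2d_line_buffered(image, kernel)
    print(out_a == out_b, reads_a, reads_b)
```

For a 6 x 6 input and a 3 x 3 kernel, the naive version performs 4 * 4 * 9 = 144 external reads, while the buffered version performs 6 * 6 = 36. In hardware, the two inner loops over the kernel are also the natural candidates for the loop unrolling and pipelining mentioned in the summary, although this software model does not represent that parallelism.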