Stream processor architecture – streaming memory system

Bibliographic Details
Main Author: Ngo, Wai Loon
Format: Thesis
Language: English
Published: 2015
Online Access: http://eprints.utm.my/id/eprint/54662/1/NgoWaiLoonMFKE2015.pdf
Description
Summary: In this project, the streaming memory system is defined as the part of a stream processor that streams whole chunks of data from and to external memory with real-time performance. In real-time implementations of Convolutional Neural Network (CNN) applications, a large number of CPU cycles is spent fetching data from and writing data to external memory, a well-known issue in the field of media processing. A new FPGA-based streaming memory system was therefore designed to allow media processing applications to execute multiple streaming operators. This is a challenging task, because the computational capability demanded of the system increases accordingly. The streaming memory hierarchy consists of a memory system, a stream register file (SRF) and arithmetic units; the memory system and the SRF are the key parts of the functional architecture in this project. This hierarchy can be used to exploit the parallelism and locality of streaming media applications. The main idea of the three-level storage hierarchy is to allow the ALUs to operate efficiently in parallel. Supplying data to the ALU clusters on every cycle directly from off-chip DRAM is impractical, because the peak bandwidth of the memory system cannot sustain the computation rate of the ALU clusters. The streaming memory system was also created to provide the computational capability required by media applications that consist of multiple arithmetic operators. Two case studies were tested, an RGB-to-YUV cluster and a 2D convolution cluster, and in both the system successfully read or wrote a given chunk of data in real time.
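The three-level hierarchy described in the summary can be illustrated with a small software model. This is a hypothetical Python sketch, not the FPGA design from the thesis: `StreamRegisterFile`, `run_stream` and the `chunk` parameter are illustrative names, and the RGB-to-YUV kernel uses the common integer BT.601 approximation, which is an assumption because the summary does not state the coefficients used in the RGB to YUV cluster. Data moves from external memory into the SRF in whole chunks, the kernel (standing in for an ALU cluster) touches only SRF contents, and results are written back chunk by chunk, so external memory sees one burst per chunk rather than one access per pixel.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def rgb_to_yuv(p: Pixel) -> Pixel:
    """Assumed kernel: integer BT.601 RGB -> YUV using only multiply, add and shift."""
    r, g, b = p
    y = ((66 * r + 129 * g + 25 * b + 128) >> 8) + 16
    u = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128
    v = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128
    return (y, u, v)


@dataclass
class StreamRegisterFile:
    """Hypothetical on-chip SRF model that holds at most `capacity` records."""
    capacity: int
    data: List[Pixel] = field(default_factory=list)

    def load(self, memory: Sequence[Pixel], start: int, count: int) -> None:
        # Models one burst read of `count` consecutive records from external memory.
        if count > self.capacity:
            raise ValueError("chunk exceeds SRF capacity")
        self.data = list(memory[start:start + count])

    def store(self) -> List[Pixel]:
        # Models one burst write of the SRF contents back to external memory.
        return list(self.data)


def run_stream(memory: Sequence[Pixel],
               kernel: Callable[[Pixel], Pixel],
               chunk: int) -> Tuple[List[Pixel], int]:
    """Stream `memory` through the SRF in chunks; return the output and burst count."""
    srf = StreamRegisterFile(capacity=chunk)
    output: List[Pixel] = []
    bursts = 0
    for start in range(0, len(memory), chunk):
        count = min(chunk, len(memory) - start)
        srf.load(memory, start, count)            # external memory -> SRF
        bursts += 1
        srf.data = [kernel(p) for p in srf.data]  # the "ALU cluster" reads/writes only the SRF
        output.extend(srf.store())                # SRF -> external memory
        bursts += 1
    return output, bursts


if __name__ == "__main__":
    frame = [(0, 0, 0), (255, 255, 255)] * 5      # 10 example pixels
    yuv, bursts = run_stream(frame, rgb_to_yuv, chunk=4)
    print(yuv[:2], bursts)  # prints [(16, 128, 128), (235, 128, 128)] 6
```

In this model, the load, compute and store steps for each chunk run one after another. A hardware stream processor would normally overlap them, for example by double buffering in the SRF, so that the next chunk is fetched while the current one is being processed.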