Adaptive online fault detection on network-on-chip based on packet logging mechanism

Bibliographic Details
Main Author: Loo, Ling Kim
Format: Thesis
Language: English
Published: 2015
Online Access: http://eprints.utm.my/id/eprint/54603/1/LooLingKimMFKE2015.pdf
Description
Summary: The shrinking size of transistors and on-chip interconnects contributes to an increasing probability of on-chip faults. Fault tolerance is one of the key features of Network-on-Chip (NoC) architecture. Current NoCs use Error Detection and Correction (EDC) and acknowledgement mechanisms for fault and error control. To maintain system functionality in the presence of faults, error detection and correction must adapt to the changing error probability, and adapting the fault detection technique in this way helps the NoC achieve improved fault tolerance. End-to-end (E2E) EDC works better at low error probability, whereas switch-to-switch (S2S) EDC works better under high error probability. This thesis proposes adaptive fault detection and fault diagnosis based on a negative acknowledgement (NACK) logging mechanism. In the first part, this thesis proposes a PL-Adaptive method in which NoC routers switch between E2E and S2S EDC depending on the changing error probability. Each router tracks its transmitted packets and NACK packets to continuously monitor its fault level. In the second part, this thesis proposes a classification of faults into router faults and link faults. Experimental results under a constant uniform traffic pattern show that the proposed PL-Adaptive method achieves a lower average latency than using E2E or S2S alone. When transmission latency is evaluated with a single error on a single path, PL-Adaptive reduces latency by 13% to 50% compared with using only the S2S or only the E2E mechanism. Moreover, with a smaller decay rate and error probabilities in the range 5×10⁻⁵ to 10⁻¹, a smaller threshold increases the probability of detecting faults and errors. PL-Adaptive detects up to 96% of faults and errors. In addition, PL-Adaptive allows NoC routers to adapt to a dynamic packet error probability and can identify router and link faults.
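The summary describes each router logging transmitted packets and NACKs to track a fault level, with a decay rate and a threshold governing when faults are detected and when the router moves between E2E and S2S EDC. The Python sketch below illustrates that idea under stated assumptions: the class name `PacketLogger`, the exponentially decayed NACK count used as the fault level, the "switch to S2S when the level reaches the threshold" rule, and the parameter values are all hypothetical illustrations, not the thesis's actual implementation.

```python
from dataclasses import dataclass

E2E = "E2E"  # end-to-end EDC: errors checked only at the destination
S2S = "S2S"  # switch-to-switch EDC: errors checked at every hop


@dataclass
class PacketLogger:
    """Hypothetical per-router log of transmitted packets and NACKs.

    ASSUMPTION: fault_level is an exponentially decayed NACK count.
    Each logged transmission first scales it by (1 - decay_rate), and a
    NACK then adds 1. This rule is illustrative, not the thesis's formula.
    """
    decay_rate: float = 0.1  # hypothetical value
    threshold: float = 1.0   # hypothetical value
    fault_level: float = 0.0
    sent: int = 0
    nacks: int = 0

    def log(self, got_nack: bool) -> None:
        """Record one transmitted packet and whether it was NACKed."""
        self.sent += 1
        self.fault_level *= 1.0 - self.decay_rate
        if got_nack:
            self.nacks += 1
            self.fault_level += 1.0

    def mode(self) -> str:
        """ASSUMPTION: use S2S once the fault level reaches the threshold."""
        return S2S if self.fault_level >= self.threshold else E2E


if __name__ == "__main__":
    router = PacketLogger(decay_rate=0.1, threshold=1.5)
    # Two clean packets, two NACKed packets, then three clean packets.
    for nack in [False, False, True, True, False, False, False]:
        router.log(nack)
        print(router.sent, round(router.fault_level, 3), router.mode())
```

Because the fault level decays with every clean transmission, the router falls back to E2E once NACKs stop arriving, in line with the summary's observation that E2E suits low error probability. Lowering `threshold` makes the router react after fewer NACKs.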