An enhanced ant colony system algorithm for dynamic fault tolerance in grid computing

Fault tolerance in grid computing allows the system to continue operating despite the occurrence of failures. Most fault tolerance algorithms focus on fault handling techniques such as task reprocessing, checkpointing, task replication, penalty, and task migration. Ant colony system (ACS), a variant of an...

Bibliographic Details
Main Author: Saufi, Bukhari
Format: Thesis
Language: eng
Published: 2020
Subjects: QA75 Electronic computers. Computer science
Online Access: https://etd.uum.edu.my/8715/1/Deposit%20Permission_s900382.pdf
https://etd.uum.edu.my/8715/2/s900382_01.pdf
https://etd.uum.edu.my/8715/3/s900382_references.docx
id my-uum-etd.8715
record_format uketd_dc
spelling my-uum-etd.8715 2021-10-07T05:51:34Z An enhanced ant colony system algorithm for dynamic fault tolerance in grid computing 2020 Saufi, Bukhari Ku Mahamud, Ku Ruhana Morino, Hiroaki Awang Had Salleh Graduate School of Arts & Sciences QA75 Electronic computers. Computer science 2020 Thesis https://etd.uum.edu.my/8715/ https://etd.uum.edu.my/8715/1/Deposit%20Permission_s900382.pdf text eng staffonly https://etd.uum.edu.my/8715/2/s900382_01.pdf text eng public https://etd.uum.edu.my/8715/3/s900382_references.docx text eng public other doctoral Universiti Utara Malaysia
institution Universiti Utara Malaysia
collection UUM ETD
language eng
advisor Ku Mahamud, Ku Ruhana
Morino, Hiroaki
topic QA75 Electronic computers
Computer science
description Fault tolerance in grid computing allows the system to continue operating despite the occurrence of failures. Most fault tolerance algorithms focus on fault handling techniques such as task reprocessing, checkpointing, task replication, penalty, and task migration. Ant colony system (ACS), a variant of ant colony optimization (ACO), is a promising algorithm for fault tolerance because it can adapt to both static and dynamic combinatorial optimization problems. However, the ACS algorithm does not consider resource fitness during task scheduling, which leads to poor load balancing and a lower execution success rate. This research proposes dynamic ACS fault tolerance with suspension (DAFTS) for grid computing, which focuses on providing effective fault tolerance techniques to improve the execution success rate and load balancing. The proposed algorithm consists of a dynamic evaporation rate, a resource fitness-based scheduling process, an enhanced pheromone update with trust factor and suspension, and checkpoint-based task reprocessing. The research framework consists of four phases: identifying fault tolerance techniques, enhancing resource assignment and job scheduling, improving the fault tolerance algorithm, and evaluating the performance of the proposed algorithm. The proposed algorithm was developed in GridSim, a simulated grid environment, and evaluated against other fault tolerance algorithms such as trust-based ACO, fault tolerance ACO, ACO without fault tolerance, and ACO with fault tolerance in terms of total execution time, average latency, average makespan, throughput, execution success rate, and load balancing. Experimental results showed that the proposed algorithm achieved the best performance in most aspects and the second best in terms of load balancing. As the failure rate increases, DAFTS showed the smallest increases in execution time, average makespan, and average latency (7%, 11%, and 5%, respectively) and the smallest decreases in throughput and execution success rate (6.49% and 9%, respectively). As the number of jobs increases, DAFTS also showed the smallest increases in execution time, average makespan, and average latency (5.8, 8.5, and 8.7 times, respectively), the highest increase in throughput (72.9%), and the highest execution success rate (93.7%). The proposed algorithm can effectively overcome load balancing problems and increase execution success rates in distributed systems that are prone to faults.
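For readers unfamiliar with the baseline that the thesis extends, the following is a minimal sketch of the standard ACS rules (pseudo-random proportional selection, local pheromone update, and global pheromone update) applied to choosing a resource for a job. It is not the DAFTS algorithm: the abstract does not give its formulas, so the dynamic evaporation rate, trust factor, suspension, and checkpointing are not modelled here. The function names, the parameter values, and the use of a per-resource `eta` list as a stand-in for resource fitness are assumptions made for illustration.

```python
import random


def choose_resource(tau, eta, beta=2.0, q0=0.9, rng=random):
    """Standard ACS pseudo-random proportional rule.

    tau: pheromone value per candidate resource.
    eta: heuristic desirability per resource (hypothetical stand-in
         for a resource fitness score; must be positive).
    """
    scores = [t * (e ** beta) for t, e in zip(tau, eta)]
    if rng.random() < q0:
        # Exploitation: pick the resource with the highest score.
        return max(range(len(scores)), key=scores.__getitem__)
    # Biased exploration: sample a resource in proportion to its score.
    return rng.choices(range(len(scores)), weights=scores, k=1)[0]


def local_update(tau_r, tau0, xi=0.1):
    """ACS local update: pull the chosen resource's pheromone toward tau0."""
    return (1.0 - xi) * tau_r + xi * tau0


def global_update(tau_r, reward, rho=0.1):
    """ACS global update on the best solution; rho is a fixed
    evaporation rate here (assumed constant, unlike in DAFTS)."""
    return (1.0 - rho) * tau_r + rho * reward
```

With `q0=1.0` the choice is purely greedy, which makes the rule easy to check by hand; lower values of `q0` let ants explore less attractive resources more often.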
format Thesis
qualification_name other
qualification_level Doctorate
author Saufi, Bukhari
title An enhanced ant colony system algorithm for dynamic fault tolerance in grid computing
title_sort enhanced ant colony system algorithm for dynamic fault tolerance in grid computing
granting_institution Universiti Utara Malaysia
granting_department Awang Had Salleh Graduate School of Arts & Sciences
publishDate 2020