An enhanced ant colony system algorithm for dynamic fault tolerance in grid computing

Fault tolerance in grid computing allows the system to continue operating despite the occurrence of failures. Most fault tolerance algorithms focus on fault handling techniques such as task reprocessing, checkpointing, task replication, penalty, and task migration. Ant colony system (ACS), a variant of an...

Bibliographic Details
Main Author:	Saufi, Bukhari
Format:	Thesis
Language:	eng
Published:	2020
Subjects:	QA75 Electronic computers. Computer science
Online Access:	https://etd.uum.edu.my/8715/1/Deposit%20Permission_s900382.pdf https://etd.uum.edu.my/8715/2/s900382_01.pdf https://etd.uum.edu.my/8715/3/s900382_references.docx

id	my-uum-etd.8715
record_format	uketd_dc
institution	Universiti Utara Malaysia
collection	UUM ETD
language	eng eng eng
advisor	Ku Mahamud, Ku Ruhana Morino, Hiroaki
topic	QA75 Electronic computers. Computer science
spellingShingle	QA75 Electronic computers Computer science Saufi, Bukhari An enhanced ant colony system algorithm for dynamic fault tolerance in grid computing
description	Fault tolerance in grid computing allows the system to continue operating despite the occurrence of failures. Most fault tolerance algorithms focus on fault handling techniques such as task reprocessing, checkpointing, task replication, penalty, and task migration. Ant colony system (ACS), a variant of ant colony optimization (ACO), is one of the promising algorithms for fault tolerance due to its ability to adapt to both static and dynamic combinatorial optimization problems. However, the ACS algorithm does not consider resource fitness during task scheduling, which leads to poor load balancing and a lower execution success rate. This research proposes dynamic ACS fault tolerance with suspension (DAFTS) in grid computing, which focuses on providing effective fault tolerance techniques to improve the execution success rate and load balancing. The proposed algorithm consists of a dynamic evaporation rate, a resource fitness-based scheduling process, an enhanced pheromone update with trust factor and suspension, and checkpoint-based task reprocessing. The research framework consists of four phases: identifying fault tolerance techniques, enhancing resource assignment and job scheduling, improving the fault tolerance algorithm, and evaluating the performance of the proposed algorithm. The proposed algorithm was developed in a simulated grid environment called GridSim and evaluated against other fault tolerance algorithms, namely trust-based ACO, fault tolerance ACO, ACO without fault tolerance and ACO with fault tolerance, in terms of total execution time, average latency, average makespan, throughput, execution success rate and load balancing. Experimental results showed that the proposed algorithm achieved the best performance in most aspects and the second best in terms of load balancing. As the failure rate increases, DAFTS achieved the smallest increases in execution time, average makespan and average latency (7%, 11% and 5%, respectively) and the smallest decreases in throughput and execution success rate (6.49% and 9%, respectively). As the number of jobs increases, DAFTS achieved the smallest increments in execution time, average makespan and average latency (5.8, 8.5 and 8.7 times, respectively), and the highest increase in throughput and the highest execution success rate (72.9% and 93.7%, respectively). The proposed algorithm can effectively overcome load balancing problems and increase execution success rates in distributed systems that are prone to faults.
format	Thesis
qualification_name	other
qualification_level	Doctorate
author	Saufi, Bukhari
author_facet	Saufi, Bukhari
author_sort	Saufi, Bukhari
title	An enhanced ant colony system algorithm for dynamic fault tolerance in grid computing
title_short	An enhanced ant colony system algorithm for dynamic fault tolerance in grid computing
title_full	An enhanced ant colony system algorithm for dynamic fault tolerance in grid computing
title_fullStr	An enhanced ant colony system algorithm for dynamic fault tolerance in grid computing
title_full_unstemmed	An enhanced ant colony system algorithm for dynamic fault tolerance in grid computing
title_sort	enhanced ant colony system algorithm for dynamic fault tolerance in grid computing
granting_institution	Universiti Utara Malaysia
granting_department	Awang Had Salleh Graduate School of Arts & Sciences
publishDate	2020
url	https://etd.uum.edu.my/8715/1/Deposit%20Permission_s900382.pdf https://etd.uum.edu.my/8715/2/s900382_01.pdf https://etd.uum.edu.my/8715/3/s900382_references.docx
_version_	1747828445980655616
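
The description above refers to ACS-based scheduling with pheromone updates. As a rough illustration only, the following minimal Python sketch shows the standard ACS pseudo-random proportional rule and the standard local and global pheromone updates, applied to choosing a grid resource for a job. It is not the DAFTS algorithm from the thesis: the dynamic evaporation rate, trust factor, suspension and checkpointing are not reproduced here, and the function names, parameter values and the "fitness" heuristic are assumptions made for the example.

```python
import random

def choose_resource(pheromone, fitness, q0=0.9, beta=2.0, rng=random):
    """Standard ACS pseudo-random proportional rule.

    pheromone[i] and fitness[i] describe candidate resource i.
    'fitness' is a hypothetical positive heuristic (for example, spare
    capacity); the thesis's actual fitness measure is not given here.
    """
    scores = [t * (f ** beta) for t, f in zip(pheromone, fitness)]
    if rng.random() < q0:
        # Exploitation: take the resource with the highest score.
        return max(range(len(scores)), key=scores.__getitem__)
    # Biased exploration: sample a resource proportionally to its score.
    return rng.choices(range(len(scores)), weights=scores, k=1)[0]

def local_update(pheromone, i, rate=0.1, tau0=0.01):
    """ACS local update: pull the chosen entry back toward tau0."""
    pheromone[i] = (1.0 - rate) * pheromone[i] + rate * tau0

def global_update(pheromone, i, reward, rate=0.1):
    """ACS global update: evaporate, then deposit 'reward' on the best choice."""
    pheromone[i] = (1.0 - rate) * pheromone[i] + rate * reward

if __name__ == "__main__":
    tau = [0.01, 0.01, 0.01]      # initial pheromone per resource (assumed)
    fit = [0.2, 0.9, 0.5]         # hypothetical fitness values
    picked = choose_resource(tau, fit)
    local_update(tau, picked)
    global_update(tau, picked, reward=1.0)
    print(picked, tau)
```

In this sketch a fixed rate is used for both updates; a dynamic evaporation rate, as named in the description, would replace the constant `rate` with a value that changes during the run.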

Similar Items