A cluster-based hybrid replica control protocol for high availability in data grid

Data Grid provides a scalable infrastructure for managing and storing large volumes of data files in Grid computing systems. In Data Grid, data replication is a widely used technique for managing data, in which exact copies of data, or replicas, are created and stored at many distributed sites. This tec...

Bibliographic Details
Main Author: Mabni, Zulaile
Format: Thesis
Language: English
Published: 2019
Subjects: Computational grids (Computer systems)
Online Access: http://psasir.upm.edu.my/id/eprint/84549/1/FSKTM%20%28fsktm%29%202019%2045.pdf
id my-upm-ir.84549
record_format uketd_dc
spelling my-upm-ir.84549 2021-12-31T08:24:30Z A cluster-based hybrid replica control protocol for high availability in data grid 2019-02 Mabni, Zulaile Computational grids (Computer systems) 2019-02 Thesis http://psasir.upm.edu.my/id/eprint/84549/ http://psasir.upm.edu.my/id/eprint/84549/1/FSKTM%20%28fsktm%29%202019%2045.pdf text en public doctoral Universiti Putra Malaysia Computational grids (Computer systems) Latip, Rohaya
institution Universiti Putra Malaysia
collection PSAS Institutional Repository
language English
advisor Latip, Rohaya
topic Computational grids (Computer systems)
spellingShingle Computational grids (Computer systems) Mabni, Zulaile A cluster-based hybrid replica control protocol for high availability in data grid
description Data Grid provides a scalable infrastructure for managing and storing large volumes of data files in Grid computing systems. In Data Grid, data replication is a widely used technique for managing data, in which exact copies of data, or replicas, are created and stored at many distributed sites. This technique provides high data availability and improves the performance of distributed systems. In recent years, the number of distributed nodes in Grid computing systems has become very large, and this growth has raised several issues in data replication. The first issue is that nodes in Grid systems are dynamic: they can join or leave the system at any time. A replica control protocol must therefore account for the dynamic aspects of the Data Grid. The next important issue is replica placement, which determines the nodes on which replicas should be placed. Previously, replica placement was not an issue because research focused only on small-scale systems. In a larger system such as Data Grid, however, existing replica control protocols require a larger number of replicas to construct read and write quorums. As the number of replicas increases, the communication cost also increases and thus degrades the performance of the protocols. Another issue is replica consistency, which must be ensured when copying data in a large-scale system. To maintain replica consistency, when several replicas of the same file are updated concurrently, all other replicas must end up with the same updated contents. An efficient mechanism is therefore needed to improve system performance while ensuring replica consistency in Data Grid. This thesis proposes a new replica control protocol, named the Cluster-Based Hybrid (CBH) protocol, for large-scale systems, with the objectives of reducing communication cost, increasing data availability, and maintaining replica consistency.
CBH employs a hybrid replication strategy that combines the advantages of two common replica control protocols to improve on the performance of existing protocols. A clustering algorithm is proposed to group the large number of nodes into clusters and organize these clusters into a tree structure. A replica placement algorithm is also proposed, which selects and places only one replica in each cluster. The performance of the CBH protocol is evaluated both theoretically and through simulation. GridSim, a discrete-event simulator, and the Java programming language are used to simulate the proposed protocol. The performance metrics, communication cost and data availability, are evaluated and compared with those of two recent quorum-based protocols: the Dynamic Hybrid (DH) and Duplication on Grid (DDG) protocols. The results show that grouping the nodes into clusters and keeping only one replica in each cluster minimizes the number of replicas involved in constructing read and write quorums. This research contributes a dynamic cluster-based hybrid replica control protocol comprising a clustering algorithm that determines the number of clusters, a mechanism for the dynamic participation of nodes in the network, and a replica placement algorithm that yields lower communication cost and higher data availability than the DH and DDG protocols. Replica consistency in CBH is shown to be maintained because the protocol satisfies the Quorum Intersection Properties.
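The quorum intersection idea mentioned in the description can be illustrated with a minimal sketch. The record does not describe how CBH actually builds its quorums, so the code below assumes a plain size-based voting scheme over n replicas (for example, one replica per cluster) and independent replica failures. The class name QuorumSketch and all method names and parameters are hypothetical and are not taken from the thesis or from GridSim.

```java
// Minimal sketch under stated assumptions: a generic size-based voting
// quorum, NOT the CBH construction, which this record does not specify.
class QuorumSketch {

    // True when every read quorum of size r intersects every write quorum
    // of size w, and any two write quorums intersect, among n replicas.
    public static boolean satisfiesIntersection(int n, int r, int w) {
        return r + w > n && 2 * w > n;
    }

    // Binomial coefficient C(n, k), computed iteratively as a double.
    public static double binomial(int n, int k) {
        double result = 1.0;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    // Probability that at least k of n replicas are reachable, assuming
    // each replica is independently available with probability p.
    public static double availability(int n, int k, double p) {
        double total = 0.0;
        for (int i = k; i <= n; i++) {
            total += binomial(n, i) * Math.pow(p, i) * Math.pow(1.0 - p, n - i);
        }
        return total;
    }

    public static void main(String[] args) {
        int replicas = 5;      // assumed: one replica in each of five clusters
        int readQuorum = 3;    // assumed quorum sizes for illustration
        int writeQuorum = 3;
        System.out.println("intersection holds: "
                + satisfiesIntersection(replicas, readQuorum, writeQuorum));
        System.out.println("write availability: "
                + availability(replicas, writeQuorum, 0.9));
    }
}
```

In this simplified model, fewer replicas mean fewer messages per quorum, which is the trade-off the description attributes to placing only one replica in each cluster. The availability figure depends entirely on the assumed per-replica probability.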
format Thesis
qualification_level Doctorate
author Mabni, Zulaile
author_facet Mabni, Zulaile
author_sort Mabni, Zulaile
title A cluster-based hybrid replica control protocol for high availability in data grid
title_short A cluster-based hybrid replica control protocol for high availability in data grid
title_full A cluster-based hybrid replica control protocol for high availability in data grid
title_fullStr A cluster-based hybrid replica control protocol for high availability in data grid
title_full_unstemmed A cluster-based hybrid replica control protocol for high availability in data grid
title_sort cluster-based hybrid replica control protocol for high availability in data grid
granting_institution Universiti Putra Malaysia
publishDate 2019
url http://psasir.upm.edu.my/id/eprint/84549/1/FSKTM%20%28fsktm%29%202019%2045.pdf
_version_ 1747813486982856704