A cluster-based hybrid replica control protocol for high availability in data grid

Bibliographic Details
Main Author: Mabni, Zulaile
Format: Thesis
Language: English
Published: 2019
Online Access: http://psasir.upm.edu.my/id/eprint/84549/1/FSKTM%20%28fsktm%29%202019%2045.pdf
Description
Summary: Data Grid provides a scalable infrastructure for managing and storing large numbers of data files in a Grid computing system. In Data Grid, data replication is a widely used technique for managing data, in which exact copies of data, or replicas, are created and stored at many distributed sites. This technique provides high data availability and improves the performance of distributed systems. In recent years, the number of distributed nodes in Grid computing systems has become very large, and this growth has raised several issues in data replication. The first issue is that nodes in Grid systems are dynamic: they can join or leave the system at any time. A replica control protocol must therefore account for the dynamic aspects of the Data Grid. The next important issue is replica placement, which determines the suitable nodes on which to place the replicas. Previously, replica placement was not an issue because research focused only on small-scale systems. However, in a larger system such as Data Grid, the existing replica control protocols require a larger number of replicas to construct read and write quorums. As the number of replicas increases, the communication cost also increases, which degrades the performance of the protocols. Another issue is replica consistency, which must be ensured when copying data in a large-scale system. To maintain replica consistency, when several replicas of the same file are updated concurrently, all other replicas must receive the same updated contents. Thus, an efficient mechanism is needed to improve system performance while ensuring replica consistency in Data Grid. Therefore, this thesis proposes a new replica control protocol, the Cluster-Based Hybrid (CBH) protocol, for large-scale systems, with the objectives of reducing communication cost, increasing data availability, and maintaining replica consistency.
CBH employs a hybrid replication strategy that combines the advantages of two common replica control protocols to improve on the performance of existing protocols. A clustering algorithm is proposed to group the large number of nodes into clusters and organize these clusters into a tree structure. A second proposed algorithm, the replica placement algorithm, selects and places only one replica in each cluster. The performance of the CBH protocol is evaluated both theoretically and through simulation, using the discrete-event simulator GridSim and the Java programming language. Two performance metrics, communication cost and data availability, are evaluated and compared with two recent quorum-based protocols, Dynamic Hybrid (DH) and Duplication on Grid (DDG). The results show that grouping the nodes into clusters and keeping only one replica in each cluster minimizes the number of replicas involved in constructing read and write quorums. This research contributes a dynamic cluster-based hybrid replica control protocol comprising a clustering algorithm that determines the number of clusters, a mechanism for dynamic participation of nodes in the network, and a replica placement algorithm that yields lower communication cost and higher data availability than the DH and DDG protocols. Replica consistency in CBH is shown to be maintained by satisfying the Quorum Intersection Properties.
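
The quorum intersection idea mentioned in the summary can be illustrated with a minimal Java sketch. It is not the CBH quorum construction, which this record does not describe; it assumes the simplest size-based (voting-style) quorums over n replicas, where a read quorum of size r and a write quorum of size w are guaranteed to overlap when r + w > n, and any two write quorums overlap when 2w > n. The class name, method names, and the seven-replica configuration are hypothetical and chosen only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; names are not taken from the thesis.
final class QuorumSketch {

    // Size-based intersection conditions over n replicas (an assumed,
    // generic quorum scheme, not the CBH construction):
    //   r + w > n  -> every read quorum overlaps every write quorum
    //   2 * w > n  -> every two write quorums overlap
    static boolean satisfiesIntersection(int n, int r, int w) {
        if (n <= 0 || r <= 0 || w <= 0 || r > n || w > n) {
            return false; // reject sizes that cannot form a quorum
        }
        return r + w > n && 2 * w > n;
    }

    // Direct check for two explicit quorums: true if they share
    // at least one replica identifier.
    static boolean overlap(Set<Integer> a, Set<Integer> b) {
        Set<Integer> common = new HashSet<>(a);
        common.retainAll(b);
        return !common.isEmpty();
    }

    public static void main(String[] args) {
        // Assumed configuration: seven clusters holding one replica each.
        System.out.println(satisfiesIntersection(7, 4, 4)); // true
        System.out.println(satisfiesIntersection(7, 3, 4)); // false
    }
}
```

With one replica per cluster, n in this sketch would be the number of clusters rather than the number of nodes, which is one way to read the summary's claim that fewer replicas take part in each quorum.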