Low latency fast data computation scheme for map reduce based clusters

MapReduce based clusters is an emerging paradigm for big data analytics to scale up and speed up the big data classification, investigation, and processing of the huge volumes, massive and complex data sets. One of the fundamental issues of processing the data in MapReduce clusters is to deal with r...

Full description

Saved in:
Bibliographic Details
Main Author: Shabbir, Aisha
Format: Thesis
Language:English
Published: 2020
Subjects:
Online Access:http://eprints.utm.my/id/eprint/98237/1/AishaShabbirPSC2020.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
id my-utm-ep.98237
record_format uketd_dc
spelling my-utm-ep.982372022-11-23T08:06:47Z Low latency fast data computation scheme for map reduce based clusters 2020 Shabbir, Aisha QA75 Electronic computers. Computer science MapReduce based clusters is an emerging paradigm for big data analytics to scale up and speed up the big data classification, investigation, and processing of the huge volumes, massive and complex data sets. One of the fundamental issues of processing the data in MapReduce clusters is to deal with resource heterogeneity, especially when there is data inter-dependency among the tasks. Secondly, MapReduce runs a job in many phases; the intermediate data traffic and its migration time become a major bottleneck for the computation of jobs which produces a huge intermediate data in the shuffle phase. Further, encountering factors to monitor the critical issue of straggling is necessary because it produces unnecessary delays and poses a serious constraint on the overall performance of the system. Thus, this research aims to provide a low latency fast data computation scheme which introduces three algorithms to handle interdependent task computation among heterogeneous resources, reducing intermediate data traffic with its migration time and monitoring and modelling job straggling factors. This research has developed a Low Latency and Computational Cost based Tasks Scheduling (LLCC-TS) algorithm of interdependent tasks on heterogeneous resources by encountering priority to provide cost-effective resource utilization and reduced makespan. Furthermore, an Aggregation and Partition based Accelerated Intermediate Data Migration (APAIDM) algorithm has been presented to reduce the intermediate data traffic and data migration time in the shuffle phase by using aggregators and custom partitioner. Moreover, MapReduce Total Execution Time Prediction (MTETP) scheme for MapReduce job computation with inclusion of the factors which affect the job computation time has been produced using machine learning technique (linear regression) in order to monitor the job straggling and minimize the latency. LLCCTS algorithm has 66.13%, 22.23%, 43.53%, and 44.74% performance improvement rate over FIFO, improved max-min, SJF and MOS algorithms respectively for makespan time of scheduling of interdependent tasks. The AP-AIDM algorithm scored 66.62% and 48.4% performance improvements in reducing the data migration time over hash basic and conventional aggregation algorithms, respectively. Moreover, an MTETP technique shows the performance improvement in predicting the total job execution time with 20.42% accuracy than the improved HP technique. Thus, the combination of the three algorithms mentioned above provides a low latency fast data computation scheme for MapReduce based clusters. 2020 Thesis http://eprints.utm.my/id/eprint/98237/ http://eprints.utm.my/id/eprint/98237/1/AishaShabbirPSC2020.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:143970 phd doctoral Universiti Teknologi Malaysia, Faculty of Engineering - School of Computing Faculty of Engineering - School of Computing
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic QA75 Electronic computers
Computer science
spellingShingle QA75 Electronic computers
Computer science
Shabbir, Aisha
Low latency fast data computation scheme for map reduce based clusters
description MapReduce based clusters is an emerging paradigm for big data analytics to scale up and speed up the big data classification, investigation, and processing of the huge volumes, massive and complex data sets. One of the fundamental issues of processing the data in MapReduce clusters is to deal with resource heterogeneity, especially when there is data inter-dependency among the tasks. Secondly, MapReduce runs a job in many phases; the intermediate data traffic and its migration time become a major bottleneck for the computation of jobs which produces a huge intermediate data in the shuffle phase. Further, encountering factors to monitor the critical issue of straggling is necessary because it produces unnecessary delays and poses a serious constraint on the overall performance of the system. Thus, this research aims to provide a low latency fast data computation scheme which introduces three algorithms to handle interdependent task computation among heterogeneous resources, reducing intermediate data traffic with its migration time and monitoring and modelling job straggling factors. This research has developed a Low Latency and Computational Cost based Tasks Scheduling (LLCC-TS) algorithm of interdependent tasks on heterogeneous resources by encountering priority to provide cost-effective resource utilization and reduced makespan. Furthermore, an Aggregation and Partition based Accelerated Intermediate Data Migration (APAIDM) algorithm has been presented to reduce the intermediate data traffic and data migration time in the shuffle phase by using aggregators and custom partitioner. Moreover, MapReduce Total Execution Time Prediction (MTETP) scheme for MapReduce job computation with inclusion of the factors which affect the job computation time has been produced using machine learning technique (linear regression) in order to monitor the job straggling and minimize the latency. LLCCTS algorithm has 66.13%, 22.23%, 43.53%, and 44.74% performance improvement rate over FIFO, improved max-min, SJF and MOS algorithms respectively for makespan time of scheduling of interdependent tasks. The AP-AIDM algorithm scored 66.62% and 48.4% performance improvements in reducing the data migration time over hash basic and conventional aggregation algorithms, respectively. Moreover, an MTETP technique shows the performance improvement in predicting the total job execution time with 20.42% accuracy than the improved HP technique. Thus, the combination of the three algorithms mentioned above provides a low latency fast data computation scheme for MapReduce based clusters.
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Shabbir, Aisha
author_facet Shabbir, Aisha
author_sort Shabbir, Aisha
title Low latency fast data computation scheme for map reduce based clusters
title_short Low latency fast data computation scheme for map reduce based clusters
title_full Low latency fast data computation scheme for map reduce based clusters
title_fullStr Low latency fast data computation scheme for map reduce based clusters
title_full_unstemmed Low latency fast data computation scheme for map reduce based clusters
title_sort low latency fast data computation scheme for map reduce based clusters
granting_institution Universiti Teknologi Malaysia, Faculty of Engineering - School of Computing
granting_department Faculty of Engineering - School of Computing
publishDate 2020
url http://eprints.utm.my/id/eprint/98237/1/AishaShabbirPSC2020.pdf
_version_ 1776100563099844608