Low latency fast data computation scheme for map reduce based clusters

MapReduce-based clusters are an emerging paradigm for big data analytics, used to scale up and speed up the classification, investigation, and processing of huge volumes of massive and complex data sets. One of the fundamental issues of processing the data in MapReduce clusters is to deal with r...

Bibliographic Details
Main Author:	Shabbir, Aisha
Format:	Thesis
Language:	English
Published:	2020
Subjects:	QA75 Electronic computers. Computer science
Online Access:	http://eprints.utm.my/id/eprint/98237/1/AishaShabbirPSC2020.pdf

id	my-utm-ep.98237
record_format	uketd_dc
spelling	my-utm-ep.98237 2022-11-23T08:06:47Z Low latency fast data computation scheme for map reduce based clusters 2020 Shabbir, Aisha QA75 Electronic computers. Computer science MapReduce-based clusters are an emerging paradigm for big data analytics, used to scale up and speed up the classification, investigation, and processing of huge volumes of massive and complex data sets. One of the fundamental issues in processing data on MapReduce clusters is dealing with resource heterogeneity, especially when there are data inter-dependencies among the tasks. Secondly, MapReduce runs a job in many phases; intermediate data traffic and its migration time become a major bottleneck for jobs that produce a large volume of intermediate data in the shuffle phase. Further, the factors behind straggling must be accounted for so that it can be monitored, because straggling produces unnecessary delays and poses a serious constraint on the overall performance of the system. Thus, this research aims to provide a low latency fast data computation scheme that introduces three algorithms, which handle interdependent task computation among heterogeneous resources, reduce intermediate data traffic and its migration time, and monitor and model job straggling factors. This research has developed a Low Latency and Computational Cost based Tasks Scheduling (LLCC-TS) algorithm for interdependent tasks on heterogeneous resources, which takes priority into account to provide cost-effective resource utilization and a reduced makespan. Furthermore, an Aggregation and Partition based Accelerated Intermediate Data Migration (AP-AIDM) algorithm has been presented to reduce the intermediate data traffic and data migration time in the shuffle phase by using aggregators and a custom partitioner.
Moreover, a MapReduce Total Execution Time Prediction (MTETP) scheme for MapReduce job computation, which includes the factors that affect job computation time, has been produced using a machine learning technique (linear regression) in order to monitor job straggling and minimize latency. The LLCC-TS algorithm achieves makespan improvements of 66.13%, 22.23%, 43.53%, and 44.74% over the FIFO, improved max-min, SJF, and MOS algorithms, respectively, when scheduling interdependent tasks. The AP-AIDM algorithm achieves improvements of 66.62% and 48.4% in data migration time over the basic hash and conventional aggregation algorithms, respectively. Moreover, the MTETP technique predicts the total job execution time with 20.42% better accuracy than the improved HP technique. Thus, the combination of the three algorithms mentioned above provides a low latency fast data computation scheme for MapReduce-based clusters. 2020 Thesis http://eprints.utm.my/id/eprint/98237/ http://eprints.utm.my/id/eprint/98237/1/AishaShabbirPSC2020.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:143970 phd doctoral Universiti Teknologi Malaysia, Faculty of Engineering - School of Computing Faculty of Engineering - School of Computing
institution	Universiti Teknologi Malaysia
collection	UTM Institutional Repository
language	English
topic	QA75 Electronic computers. Computer science
spellingShingle	QA75 Electronic computers. Computer science Shabbir, Aisha Low latency fast data computation scheme for map reduce based clusters
description	MapReduce-based clusters are an emerging paradigm for big data analytics, used to scale up and speed up the classification, investigation, and processing of huge volumes of massive and complex data sets. One of the fundamental issues in processing data on MapReduce clusters is dealing with resource heterogeneity, especially when there are data inter-dependencies among the tasks. Secondly, MapReduce runs a job in many phases; intermediate data traffic and its migration time become a major bottleneck for jobs that produce a large volume of intermediate data in the shuffle phase. Further, the factors behind straggling must be accounted for so that it can be monitored, because straggling produces unnecessary delays and poses a serious constraint on the overall performance of the system. Thus, this research aims to provide a low latency fast data computation scheme that introduces three algorithms, which handle interdependent task computation among heterogeneous resources, reduce intermediate data traffic and its migration time, and monitor and model job straggling factors. This research has developed a Low Latency and Computational Cost based Tasks Scheduling (LLCC-TS) algorithm for interdependent tasks on heterogeneous resources, which takes priority into account to provide cost-effective resource utilization and a reduced makespan. Furthermore, an Aggregation and Partition based Accelerated Intermediate Data Migration (AP-AIDM) algorithm has been presented to reduce the intermediate data traffic and data migration time in the shuffle phase by using aggregators and a custom partitioner. Moreover, a MapReduce Total Execution Time Prediction (MTETP) scheme for MapReduce job computation, which includes the factors that affect job computation time, has been produced using a machine learning technique (linear regression) in order to monitor job straggling and minimize latency.
The LLCC-TS algorithm achieves makespan improvements of 66.13%, 22.23%, 43.53%, and 44.74% over the FIFO, improved max-min, SJF, and MOS algorithms, respectively, when scheduling interdependent tasks. The AP-AIDM algorithm achieves improvements of 66.62% and 48.4% in data migration time over the basic hash and conventional aggregation algorithms, respectively. Moreover, the MTETP technique predicts the total job execution time with 20.42% better accuracy than the improved HP technique. Thus, the combination of the three algorithms mentioned above provides a low latency fast data computation scheme for MapReduce-based clusters.
format	Thesis
qualification_name	Doctor of Philosophy (PhD.)
qualification_level	Doctorate
author	Shabbir, Aisha
author_facet	Shabbir, Aisha
author_sort	Shabbir, Aisha
title	Low latency fast data computation scheme for map reduce based clusters
title_short	Low latency fast data computation scheme for map reduce based clusters
title_full	Low latency fast data computation scheme for map reduce based clusters
title_fullStr	Low latency fast data computation scheme for map reduce based clusters
title_full_unstemmed	Low latency fast data computation scheme for map reduce based clusters
title_sort	low latency fast data computation scheme for map reduce based clusters
granting_institution	Universiti Teknologi Malaysia, Faculty of Engineering - School of Computing
granting_department	Faculty of Engineering - School of Computing
publishDate	2020
url	http://eprints.utm.my/id/eprint/98237/1/AishaShabbirPSC2020.pdf
_version_	1776100563099844608
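
The abstract mentions reducing shuffle-phase traffic "by using aggregators and a custom partitioner". The thesis's own AP-AIDM algorithm is not given in this record, so the following is only a generic, minimal sketch of the underlying idea: map-side aggregation shrinks the number of intermediate records before they are partitioned across reducers. Every name here (`map_words`, `combine`, `partition`, the word-count workload, the first-letter routing rule) is a hypothetical chosen for illustration and is not taken from the thesis.

```python
from collections import Counter, defaultdict


def map_words(split):
    """Map step: emit one (word, 1) pair per word in an input split."""
    return [(word, 1) for word in split.split()]


def combine(pairs):
    """Map-side aggregation: sum the values per key before the shuffle."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def partition(key, num_reducers):
    """Hypothetical custom partitioner: route a key by its first letter."""
    return (ord(key[0]) - ord("a")) % num_reducers


def shuffle(pairs_per_mapper, num_reducers):
    """Group every mapper's output records into per-reducer buckets."""
    buckets = defaultdict(list)
    for pairs in pairs_per_mapper:
        for key, value in pairs:
            buckets[partition(key, num_reducers)].append((key, value))
    return buckets


def reduce_counts(buckets):
    """Reduce step: sum the values for each key across all buckets."""
    result = Counter()
    for pairs in buckets.values():
        for key, value in pairs:
            result[key] += value
    return dict(result)


# Toy input: two splits, one per (simulated) mapper.
splits = ["apple banana apple cherry", "banana banana apple apple"]
raw = [map_words(s) for s in splits]
combined = [combine(p) for p in raw]

# Number of intermediate records that would cross the network.
records_without = sum(len(p) for p in raw)
records_with = sum(len(p) for p in combined)

# The final result is the same with or without map-side aggregation.
final_without = reduce_counts(shuffle(raw, 2))
final_with = reduce_counts(shuffle(combined, 2))
print(records_without, records_with, final_with)
```

In this toy run the aggregated mappers send fewer records into the shuffle while the reduced result is unchanged, which is the general effect the abstract attributes to aggregation; the measured improvements reported above come from the thesis's own experiments, not from this sketch.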
