Unstructured big data processing in cloud computing environment by using Amazon Elastic Map Reduce

Bibliographic Details
Main Author: Busu, Norzaharawani
Format: Thesis
Language: English
Published: 2017
Subjects: Cloud computing - Data processing; Big data
Online Access: http://psasir.upm.edu.my/id/eprint/67852/1/FSKTM%202017%2024%20IR.pdf
id my-upm-ir.67852
record_format uketd_dc
institution Universiti Putra Malaysia
collection PSAS Institutional Repository
language English
topic Cloud computing - Data processing
Big data

description Nowadays, the rapid growth of web content delivers a huge amount of collective resources. Twitter, one of the biggest social media sites, collects millions of tweets every day, amounting to petabytes per year. People share their experiences and thoughts online, or simply talk about whatever concerns them. Unstructured big data from social media plays a vital role in sentiment analysis, also known as opinion mining. Structured and unstructured data are generated continuously on a large scale every day, and these data are meaningless unless they are captured and analyzed appropriately. Traditional RDBMS technology becomes less reliable when dealing with huge amounts of structured data, and processing slows down unless the infrastructure is upgraded to match the volume of data. Furthermore, an RDBMS cannot handle unstructured data. Because petabytes of records are generated on the web every year, capturing and analyzing big data is challenging; cloud computing technologies address this by providing on-demand infrastructure and services based on user requirements. Therefore, this thesis uses a cloud-based infrastructure, Amazon Web Services, to capture unstructured big data and then to analyze, visualize, and extract useful information from large, diverse, distributed, and mixed data gathered from public data sets and Twitter's Application Programming Interface (API). The experimental results reported in Chapter Four cover three test beds: collecting Twitter data, processing the Twitter input data, and the resulting output data. The analysis emphasizes the elapsed time for collecting Twitter data and the performance of Amazon Elastic MapReduce (EMR). The infrastructure provided by Amazon Web Services is capable of capturing and manipulating large volumes of unstructured big data from Twitter.
Afterward, this study tested the capability of Amazon Elastic MapReduce (EMR) to process the previously collected Twitter input data and transform it into meaningful output that can support decision making.
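The record does not specify which job the thesis ran on EMR. As a hedged illustration only, the following minimal Python sketch shows the map, shuffle/sort, and reduce pattern that such processing follows. It assumes tweets stored one JSON object per line with a `text` field and a hypothetical hashtag-count task; `mapper`, `reducer`, and `run_locally` are illustrative names, not code from the thesis. The phases run in a single process here; on EMR, the same mapper and reducer logic would typically be wrapped to read standard input and write tab-separated key/value lines for a Hadoop Streaming step.

```python
import json
import re
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Tuple

# Hypothetical task: count hashtags in tweets stored one JSON object per line.
# The "text" field name is an assumption about the input format.
HASHTAG = re.compile(r"#(\w+)")


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (hashtag, 1) for every hashtag in every tweet."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records instead of failing the whole job
        if not isinstance(tweet, dict):
            continue  # valid JSON but not a tweet object
        for tag in HASHTAG.findall(tweet.get("text", "")):
            yield tag.lower(), 1  # normalize case so #AWS and #aws merge


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reduce phase: sum counts per key. Input must be sorted by key,
    which is what Hadoop's shuffle/sort step provides to reducers."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, sum(count for _, count in group)


def run_locally(lines: Iterable[str]) -> dict:
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = [
        '{"text": "Trying #AWS #EMR today"}',
        '{"text": "#aws is big"}',
        "not json",
    ]
    print(run_locally(sample))  # prints {'aws': 2, 'emr': 1}
```

The malformed sample line is skipped rather than aborting the run, which mirrors the tolerance a long batch job over scraped social media data usually needs.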
format Thesis
qualification_level Master's degree
author Busu, Norzaharawani
title Unstructured big data processing in cloud computing environment by using Amazon Elastic Map Reduce
granting_institution Universiti Putra Malaysia
publishDate 2017
url http://psasir.upm.edu.my/id/eprint/67852/1/FSKTM%202017%2024%20IR.pdf