Indexing strategies of MapReduce for information retrieval in big data

In Information Retrieval (IR), efficiently indexing large, terabyte-scale datasets remains an open problem because of information overload, which results from growth in knowledge, in the number of different media, in the number of platforms, and in the i...

Bibliographic Details
Main Author: Ramadhan, Mazen Farid Ebrahim
Format: Thesis
Language: English
Published: 2016
Subjects: Big data; Information retrieval; MapReduce (Computer file)
Online Access: http://psasir.upm.edu.my/id/eprint/66723/1/FSKTM%202016%2025%20IR.pdf
id my-upm-ir.66723
record_format uketd_dc
institution Universiti Putra Malaysia
collection PSAS Institutional Repository
language English
topic Big data
Information retrieval
MapReduce (Computer file)
description In Information Retrieval (IR), efficiently indexing large, terabyte-scale datasets remains an open problem because of information overload, which results from growth in knowledge, in the number of different media, in the number of platforms, and in the interoperability of platforms. MapReduce has been suggested as a suitable platform for distributing data-intensive operations across multiple processing machines. In this project, two strategies will be analysed, Sensei and per-posting list indexing (Terrier), as they are the two most efficient MapReduce indexing strategies. Both will be implemented in an existing IR framework, and an experiment will be run on Hadoop MapReduce with the same large dataset to analyse and verify which of the two, Sensei or Terrier, is the more efficient. The experiment will measure retrieval performance as the dataset size and the processing power grow, and will examine how the indexing strategies scale with large datasets and with different numbers of distributed machines. Throughput will be measured in MB/s (megabytes per second), and the results will be analysed in terms of delay, time consumed, and efficiency for Sensei and per-posting list indexing (Terrier).
format Thesis
qualification_level Master's degree
author Ramadhan, Mazen Farid Ebrahim
title Indexing strategies of MapReduce for information retrieval in big data
granting_institution Universiti Putra Malaysia
publishDate 2016
url http://psasir.upm.edu.my/id/eprint/66723/1/FSKTM%202016%2025%20IR.pdf