Data depublication using : Hashing algorithm / Naimah Nayan

Data deduplication is a method that helps reduce redundant data in storage. With the rapid growth of the digital data being generated, storage usage will increase rapidly. To achieve efficient deduplication in system storage, the duplicate data need to be e...
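The full description below explains that duplicates are found by comparing each file's hash value and removing files that share one. The following is a minimal Python sketch of that idea. It is not the thesis's tooling, since the study used File Alyzer, Clone Files Checker and AllDup. The function name `find_duplicates` and the in-memory `files` mapping are hypothetical. The description also notes that a hash function should not map different data to the same value, so the sketch confirms every hash match with a byte-for-byte comparison before reporting a duplicate.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files: dict[str, bytes], algorithm: str = "md5") -> list[tuple[str, str]]:
    """Return (duplicate_name, kept_name) pairs for files with identical content.

    Hypothetical helper: files are grouped by hash value; the first file seen
    with a given content is kept, and later files whose bytes match it exactly
    are reported as duplicates that could be removed.
    """
    groups: dict[str, list[str]] = defaultdict(list)  # hex digest -> names of kept files
    duplicates: list[tuple[str, str]] = []
    for name, data in files.items():
        digest = hashlib.new(algorithm, data).hexdigest()
        for kept in groups[digest]:
            # Same hash value: confirm byte-for-byte so that a hash collision
            # is never mistaken for a duplicate.
            if files[kept] == data:
                duplicates.append((name, kept))
                break
        else:
            # No kept file has identical bytes, so keep this one.
            groups[digest].append(name)
    return duplicates


if __name__ == "__main__":
    sample = {
        "report.docx": b"quarterly figures",
        "report_copy.docx": b"quarterly figures",
        "video.mp4": b"\x00\x01\x02",
    }
    print(find_duplicates(sample))          # [('report_copy.docx', 'report.docx')]
    print(find_duplicates(sample, "sha1"))  # [('report_copy.docx', 'report.docx')]
```

To scan real files, the mapping could be filled with `pathlib.Path.read_bytes()`. For large video files, hashing in chunks with the hash object's `update()` method would avoid loading each whole file into memory.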

Bibliographic Details
Main Author:	Nayan, Naimah
Format:	Thesis
Language:	English
Published:	2019
Subjects:	Algorithms
Online Access:	https://ir.uitm.edu.my/id/eprint/26848/1/TD_NAIMAH%20NAYAN%20CS%20R%2019_5.pdf

id	my-uitm-ir.26848
record_format	uketd_dc
spelling	my-uitm-ir.26848 2019-12-10T07:18:52Z Data depublication using : Hashing algorithm / Naimah Nayan 2019-12-12 Nayan, Naimah Algorithms 2019-12 Thesis https://ir.uitm.edu.my/id/eprint/26848/ https://ir.uitm.edu.my/id/eprint/26848/1/TD_NAIMAH%20NAYAN%20CS%20R%2019_5.pdf text en public degree Universiti Teknologi MARA, Perlis Faculty of Computer and Mathematical Sciences
institution	Universiti Teknologi MARA
collection	UiTM Institutional Repository
language	English
topic	Algorithms
description	Data deduplication is a method that helps reduce redundant data in storage. With the rapid growth of the digital data being generated, storage usage will increase rapidly. To achieve efficient deduplication in system storage, duplicate data must be eliminated. To eliminate it, each file's unique value (its hash value) is compared, and files that have the same hash value are removed. This method helps improve storage capacity and efficiency. The hash value is generated by a hashing algorithm such as Message Digest 5 (MD5) or Secure Hash Algorithm 1 (SHA-1). A hash function should not produce the same value for different data. Without analysis of the hashing algorithms, deduplication techniques cannot be improved in future research, and the evolution of data deduplication may be slow, because the performance metrics of each hashing algorithm are not clear enough. The objective of this project is to compare the MD5 and SHA-1 algorithms in a data deduplication technique and to evaluate them in terms of message digest length and speed using deduplication software. The simulation was conducted using File Alyzer, Clone Files Checker and AllDup software. The results were analysed based on three performance metrics: efficiency, message digest length and speed. Two types of dataset were used, video files and document files, each in four different sizes. The time each hashing algorithm took to generate the hash value was recorded. The finding of this project is that MD5 performs better than SHA-1 in speed: it generates the hash value faster, which the study attributes to MD5's message digest (128 bits) being shorter than SHA-1's (160 bits). The recommendation for future work is to evaluate other types of data and other hashing algorithms.
format	Thesis
qualification_level	Bachelor degree
author	Nayan, Naimah
title	Data depublication using : Hashing algorithm / Naimah Nayan
granting_institution	Universiti Teknologi MARA, Perlis
granting_department	Faculty of Computer and Mathematical Sciences
publishDate	2019
url	https://ir.uitm.edu.my/id/eprint/26848/1/TD_NAIMAH%20NAYAN%20CS%20R%2019_5.pdf
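The description reports that MD5 produced hash values faster than SHA-1 and relates this to MD5's shorter message digest. The sketch below shows one way to measure both metrics with Python's standard `hashlib` module. The helper names `digest_lengths` and `time_hash`, the random test data and the four sizes are hypothetical stand-ins, not the thesis's video and document datasets. Measured times will vary with hardware and with the underlying crypto library.

```python
import hashlib
import os
import time


def digest_lengths(algorithms=("md5", "sha1")) -> dict[str, tuple[int, int]]:
    """Map each algorithm to (digest size in bits, hex-string length)."""
    lengths = {}
    for name in algorithms:
        h = hashlib.new(name)
        lengths[name] = (h.digest_size * 8, len(h.hexdigest()))
    return lengths


def time_hash(algorithm: str, data: bytes, repeats: int = 5) -> float:
    """Best-of-`repeats` wall-clock seconds to hash `data` once (hypothetical helper)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(algorithm, data).digest()
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print(digest_lengths())  # {'md5': (128, 32), 'sha1': (160, 40)}
    # Hypothetical sizes in MiB; random bytes stand in for real files.
    for size_mib in (1, 4, 16, 64):
        data = os.urandom(size_mib * 1024 * 1024)
        for name in ("md5", "sha1"):
            print(f"{name:>4} {size_mib:>3} MiB: {time_hash(name, data):.4f} s")
```

Taking the best of several runs reduces noise from caching and scheduling. Hashing the same buffer with both algorithms keeps the comparison fair.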
