Optimizing lossless compression by normalized data length in Huffman Algorithm

Due to the growing need for storage space, efficient compression schemes are becoming increasingly important. One goal of lossless data compression is to archive raw audio data so that the file can be restored to its original form when it is reused. Generally, raw data is stored as 16...

Bibliographic Details
Main Author: Tonny, Hidayat
Format: Thesis
Language: English
Published: 2022
Subjects: Q Science (General); QA Mathematics
Online Access: http://eprints.utem.edu.my/id/eprint/26986/1/Optimizing%20lossless%20compression%20by%20normalized%20data%20length%20in%20Huffman%20Algorithm.pdf
http://eprints.utem.edu.my/id/eprint/26986/2/Optimizing%20lossless%20compression%20by%20normalized%20data%20length%20in%20Huffman%20Algorithm.pdf
id my-utem-ep.26986
record_format uketd_dc
spelling my-utem-ep.26986 2024-01-16T14:45:48Z Optimizing lossless compression by normalized data length in Huffman Algorithm 2022 Tonny, Hidayat Q Science (General) QA Mathematics 2022 Thesis http://eprints.utem.edu.my/id/eprint/26986/ http://eprints.utem.edu.my/id/eprint/26986/1/Optimizing%20lossless%20compression%20by%20normalized%20data%20length%20in%20Huffman%20Algorithm.pdf text en public http://eprints.utem.edu.my/id/eprint/26986/2/Optimizing%20lossless%20compression%20by%20normalized%20data%20length%20in%20Huffman%20Algorithm.pdf text en validuser https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=122177 phd doctoral Universiti Teknikal Malaysia Melaka Faculty of Information and Communication Technology Zakaria, Mohd Hafiz
institution Universiti Teknikal Malaysia Melaka
collection UTeM Repository
language English
advisor Zakaria, Mohd Hafiz
topic Q Science (General)
QA Mathematics
description Due to the growing need for storage space, efficient compression schemes are becoming increasingly important. One goal of lossless data compression is to archive raw audio data so that the file can be restored to its original form when it is reused. Generally, raw data is stored as 16-bit (65,536 possible values). The Huffman algorithm, whose variants can be grouped into Static, Dynamic, and Adaptive extensions, is still very effective at compressing 8-bit data; however, its performance cannot be determined when it is applied to data that has several variables and probabilities. Based on the literature review, compression performance for file archives is measured with the Compression Ratio (CR) and Compression Time (CT) indicators. These two indicators are used to calculate and analyse the file size reduction and the ability of the file to be reconstructed to its original form without compromising its quality. This research produces a new scheme called Quaternary Arity (4-ary) Modification Quadtree (MQ), or 4-ary/MQ, based on entropy coding, which has its roots in other variants of Huffman schemes such as Binary/Static, Quadtree, Octatree, and Hexatree. The 4-ary/MQ method employs the characteristics of the Quadtree structure and extends the Dynamic Huffman coding mechanism (FGK rule) for node arrangement, while adopting the Adaptive Huffman method that uses additional variable data. The novelty of the scheme lies in adding additional variables to maintain the branch root so that it always has four branches. A descriptive analysis of 4-ary/MQ was performed on several audio datasets (Music, Mono Music, Stereo Music, Ripping CD, Speech, Noise, Sound Effects, and Instruments) to compare it with the Huffman scheme variants. A comparative analysis with several lossless compression applications has shown that its CR is significantly better than that of PKZIP, WinZip, 7-Zip, and Monkey's Audio. The 4-ary/MQ compression was found to benefit compressed data stored on local storage media as well as hosting and bandwidth optimization. The new algorithm also performs well, producing a good CR with fast CT on most of the 16-bit WAV audio datasets. The proposed algorithm has a better CR than the various Huffman-based lossless variants. The scheme is also expected to work well on data above 16-bit, which is left for future research.
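The idea of keeping every branch point at exactly four children can be illustrated with a minimal sketch. This is not the thesis's 4-ary/MQ method, which is adaptive and FGK-based; it is the textbook static quaternary Huffman construction, which pads the symbol set with zero-weight dummy leaves so that each merge takes exactly four nodes. The function names `quaternary_huffman` and `compression_ratio` are hypothetical, and the CR definition used here (original bits divided by encoded bits, ignoring the cost of storing the code table) is an assumption, since the record does not state the thesis's formula.

```python
import heapq
from collections import Counter
from itertools import count


def quaternary_huffman(data: bytes) -> dict:
    """Map each byte value in `data` to a prefix-free code over digits '0'..'3'."""
    tie = count()  # tie-breaker so the heap never compares node payloads
    # Heap entries are (weight, tie, node); a node is a byte value (int),
    # None for a dummy leaf, or a list of four child nodes.
    heap = [(w, next(tie), sym) for sym, w in Counter(data).items()]
    # Pad with zero-weight dummies until (leaves - 1) is divisible by 3,
    # so every merge below combines exactly four nodes.
    while len(heap) < 4 or (len(heap) - 1) % 3 != 0:
        heap.append((0, next(tie), None))
    heapq.heapify(heap)
    while len(heap) > 1:
        group = [heapq.heappop(heap) for _ in range(4)]
        weight = sum(w for w, _, _ in group)
        children = [node for _, _, node in group]
        heapq.heappush(heap, (weight, next(tie), children))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, list):  # internal node with four branches
            for digit, child in enumerate(node):
                walk(child, prefix + str(digit))
        elif node is not None:  # real symbol; dummy leaves get no code
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


def compression_ratio(data: bytes, codes: dict) -> float:
    """Assumed CR: original bits / encoded bits (each base-4 digit costs 2 bits)."""
    if not data:
        raise ValueError("data must be non-empty")
    encoded_bits = sum(2 * len(codes[b]) for b in data)
    return (8 * len(data)) / encoded_bits


if __name__ == "__main__":
    sample = b"aaaaabbbbcccdde"
    table = quaternary_huffman(sample)
    print(table)
    print(compression_ratio(sample, table))
```

The sketch works on 8-bit symbols for brevity; applying the same construction to 16-bit audio samples would only change how the input is split into symbols.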
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Tonny, Hidayat
title Optimizing lossless compression by normalized data length in Huffman Algorithm
granting_institution Universiti Teknikal Malaysia Melaka
granting_department Faculty of Information and Communication Technology
publishDate 2022
url http://eprints.utem.edu.my/id/eprint/26986/1/Optimizing%20lossless%20compression%20by%20normalized%20data%20length%20in%20Huffman%20Algorithm.pdf
http://eprints.utem.edu.my/id/eprint/26986/2/Optimizing%20lossless%20compression%20by%20normalized%20data%20length%20in%20Huffman%20Algorithm.pdf
_version_ 1794023199796625408