An enhanced LZ77 algorithm with hash table to compress large scale DNA sequence

The use of compression techniques in various fields of data management has grown encouragingly in recent years. DNA data sets are becoming large, which creates problems for storage and data transfer. A common approach is to store the data on a server, which adds to the cost of data management. Furthermore, th...

Full description

Bibliographic Details
Main Author: Ahmad, Nor Azhar
Format: Thesis
Published: 2010
Subjects:
id my-utm-ep.21289
record_format uketd_dc
spelling my-utm-ep.21289 2020-03-03T07:29:51Z http://eprints.utm.my/id/eprint/21289/ masters
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
topic Q Science (General)
QA76 Computer software
description The use of compression techniques in various fields of data management has grown encouragingly in recent years. DNA data sets are becoming large, which creates problems for storage and data transfer. A common approach is to store the data on a server, which adds to the cost of data management. Furthermore, transferring the data online is no longer the best solution: for a research center with a low-speed Internet connection, the transfer is almost impossible to carry out. This study proposes an enhancement of the LZ77 algorithm, a common non-greedy, dictionary-based method that uses a sliding window to compress alphabetical data. By introducing sectioned sliding windows with a hash table, the proposed compression algorithm addresses the storage problem of large DNA sequences. The implementation reduces compression time and improves the compression rate. Two formats of DNA data (binary and FASTA) are tested and analysed. Simulation shows promising compression rates as the size of the DNA data grows, reaching a rate of 56% per bit. Compared with BioCompress, an LZ77-based DNA compression algorithm with a compression rate of 44%, the proposed algorithm performs better by 12 percentage points. The findings of this study will allow cost reductions in handling large-scale DNA data.
format Thesis
qualification_level Master's degree
author Ahmad, Nor Azhar
title An enhanced LZ77 algorithm with hash table to compress large scale DNA sequence
granting_institution Universiti Teknologi Malaysia, Faculty of Computer Science and Information Systems
granting_department Faculty of Computer Science and Information Systems
publishDate 2010
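
The abstract describes an LZ77-style compressor that uses a sliding window and a hash table to find repeated DNA substrings. The record does not include the thesis's actual algorithm, so the following is only a minimal, generic sketch of hash-table match finding in LZ77. It does not reproduce the "sectioned sliding window" scheme, and every name and parameter in it (`MIN_MATCH`, `WINDOW`, `MAX_MATCH`, `compress`, `decompress`, the token layout) is a hypothetical choice made for illustration.

```python
from collections import defaultdict, deque

# All constants below are illustrative assumptions, not values from the thesis.
MIN_MATCH = 4     # shortest repeat worth encoding as a back-reference
WINDOW = 4096     # how far back a match may start (sliding-window size)
MAX_MATCH = 255   # longest match encoded in a single token


def compress(seq):
    """Encode seq as tokens: ('lit', char) or ('ref', distance, length)."""
    table = defaultdict(deque)  # k-mer -> earlier start positions, oldest first
    tokens = []
    n = len(seq)
    i = 0
    while i < n:
        best_len, best_dist = 0, 0
        key = seq[i:i + MIN_MATCH]
        if len(key) == MIN_MATCH:
            candidates = table[key]
            # Forget positions that have slid out of the window.
            while candidates and candidates[0] < i - WINDOW:
                candidates.popleft()
            for pos in candidates:
                length = MIN_MATCH
                while (length < MAX_MATCH and i + length < n
                       and seq[pos + length] == seq[i + length]):
                    length += 1
                if length >= best_len:  # on ties, prefer the nearer match
                    best_len, best_dist = length, i - pos
        if best_len >= MIN_MATCH:
            tokens.append(("ref", best_dist, best_len))
            step = best_len
        else:
            tokens.append(("lit", seq[i]))
            step = 1
        # Index every position that was just consumed.
        for j in range(i, i + step):
            kmer = seq[j:j + MIN_MATCH]
            if len(kmer) == MIN_MATCH:
                table[kmer].append(j)
        i += step
    return tokens


def decompress(tokens):
    """Rebuild the original string from the token list."""
    out = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            start = len(out) - dist
            # Copy one symbol at a time so overlapping matches work.
            for k in range(length):
                out.append(out[start + k])
    return "".join(out)
```

The hash table replaces a linear scan of the window with a lookup of positions that share the same first `MIN_MATCH` bases. For highly repetitive DNA the candidate lists can still grow long, so a practical implementation would likely cap how many candidates are examined per position.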