Linear-pso with binary search algorithm for DNA motif discovery / Hazaruddin Harun
Motif Discovery (MD) is the process of identifying meaningful patterns in DNA, RNA, or protein sequences. In bioinformatics, such a pattern is known as a motif. Numerous algorithms have been developed for MD, but most were not designed to discover species-specific motifs used in...
Main Author: | Harun, Hazaruddin |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2015 |
Subjects: | Programming. Rule-based programming. Backtrack programming; Algorithms |
Online Access: | https://ir.uitm.edu.my/id/eprint/16103/1/TP_HAZARUDDIN%20HARUN%20CS%2015_5.pdf |
id |
my-uitm-ir.16103 |
---|---|
record_format |
uketd_dc |
spelling |
my-uitm-ir.16103 2022-03-10T03:03:12Z Linear-pso with binary search algorithm for DNA motif discovery / Hazaruddin Harun 2015 Harun, Hazaruddin Programming. Rule-based programming. Backtrack programming Algorithms Motif Discovery (MD) is the process of identifying meaningful patterns in DNA, RNA, or protein sequences. In bioinformatics, such a pattern is known as a motif. Numerous algorithms have been developed for MD, but most were not designed to discover species-specific motifs, that is, motifs used to identify a selected species, where the exact location of each motif must also be identified. Evaluation of these algorithms showed unsatisfactory results because of their low validity and accuracy. At present, DNA sequencing analysis is the most widely used technique for species identification: patterns in a DNA sequence are determined by comparing the sequence against comprehensive databases. However, several false and gap sequences have been found in these databases, which leads to false identification. This study therefore addresses these problems by introducing a hybrid algorithm for MD. In this study, MD is the process of discovering all possible motifs that exist in DNA sequences, whereas Motif Identification (MI) is the process of identifying the correct motif that can represent a selected species. Particle Swarm Optimisation (PSO) was selected as the base algorithm to be improved and integrated with other techniques. The Linear-PSO algorithm was the first improved version. However, because that algorithm took a long time to run to completion, the Binary Search technique was integrated and a new version was developed, namely the Linear-PSO with Binary Search (LPBS) algorithm.
A total of 11 experiments were conducted in this research: the first four aimed at algorithm improvement, the next four at identifying suitable input data, and the final three at algorithm validation. Several DNA sequences from different species were collected from the GenBank and TRANSCompel databases and used as input for the algorithm. The collected DNA sequences were from the Mitochondrial Cytochrome C Oxidase Subunit I (COXI) gene. Because of the limited data available, only four species were used for Motif Discovery, namely pig, cow, yak, and chicken. Another five species were used for Motif Identification: human, sheep, dog, frog, and rat. The algorithm was run on a notebook with an Intel(R) Core(TM) Duo 1.73 GHz CPU and 3 GB of RAM. The results showed that the LPBS algorithm was able to discover possible correct motifs that can represent a species, with higher validity and accuracy than previous algorithms. The discovered motifs were consistent across executions, with higher calculated fitness values. 2015 Thesis https://ir.uitm.edu.my/id/eprint/16103/ https://ir.uitm.edu.my/id/eprint/16103/1/TP_HAZARUDDIN%20HARUN%20CS%2015_5.pdf text en public phd doctoral Universiti Teknologi MARA Faculty of Computer and Mathematical Sciences |
institution |
Universiti Teknologi MARA |
collection |
UiTM Institutional Repository |
language |
English |
topic |
Programming Rule-based programming Backtrack programming Algorithms |
spellingShingle |
Programming Rule-based programming Backtrack programming Algorithms Harun, Hazaruddin Linear-pso with binary search algorithm for DNA motif discovery / Hazaruddin Harun |
description |
Motif Discovery (MD) is the process of identifying meaningful patterns in DNA, RNA, or protein sequences. In bioinformatics, such a pattern is known as a motif. Numerous algorithms have been developed for MD, but most were not designed to discover species-specific motifs, that is, motifs used to identify a selected species, where the exact location of each motif must also be identified. Evaluation of these algorithms showed unsatisfactory results because of their low validity and accuracy. At present, DNA sequencing analysis is the most widely used technique for species identification: patterns in a DNA sequence are determined by comparing the sequence against comprehensive databases. However, several false and gap sequences have been found in these databases, which leads to false identification. This study therefore addresses these problems by introducing a hybrid algorithm for MD. In this study, MD is the process of discovering all possible motifs that exist in DNA sequences, whereas Motif Identification (MI) is the process of identifying the correct motif that can represent a selected species. Particle Swarm Optimisation (PSO) was selected as the base algorithm to be improved and integrated with other techniques. The Linear-PSO algorithm was the first improved version. However, because that algorithm took a long time to run to completion, the Binary Search technique was integrated and a new version was developed, namely the Linear-PSO with Binary Search (LPBS) algorithm. A total of 11 experiments were conducted in this research: the first four aimed at algorithm improvement, the next four at identifying suitable input data, and the final three at algorithm validation.
Several DNA sequences from different species were collected from the GenBank and TRANSCompel databases and used as input for the algorithm. The collected DNA sequences were from the Mitochondrial Cytochrome C Oxidase Subunit I (COXI) gene. Because of the limited data available, only four species were used for Motif Discovery, namely pig, cow, yak, and chicken. Another five species were used for Motif Identification: human, sheep, dog, frog, and rat. The algorithm was run on a notebook with an Intel(R) Core(TM) Duo 1.73 GHz CPU and 3 GB of RAM. The results showed that the LPBS algorithm was able to discover possible correct motifs that can represent a species, with higher validity and accuracy than previous algorithms. The discovered motifs were consistent across executions, with higher calculated fitness values. |
format |
Thesis |
qualification_name |
Doctor of Philosophy (PhD.) |
qualification_level |
Doctorate |
author |
Harun, Hazaruddin |
author_facet |
Harun, Hazaruddin |
author_sort |
Harun, Hazaruddin |
title |
Linear-pso with binary search algorithm for DNA motif discovery / Hazaruddin Harun |
title_short |
Linear-pso with binary search algorithm for DNA motif discovery / Hazaruddin Harun |
title_full |
Linear-pso with binary search algorithm for DNA motif discovery / Hazaruddin Harun |
title_fullStr |
Linear-pso with binary search algorithm for DNA motif discovery / Hazaruddin Harun |
title_full_unstemmed |
Linear-pso with binary search algorithm for DNA motif discovery / Hazaruddin Harun |
title_sort |
linear-pso with binary search algorithm for dna motif discovery / hazaruddin harun |
granting_institution |
Universiti Teknologi MARA |
granting_department |
Faculty of Computer and Mathematical Sciences |
publishDate |
2015 |
url |
https://ir.uitm.edu.my/id/eprint/16103/1/TP_HAZARUDDIN%20HARUN%20CS%2015_5.pdf |
_version_ |
1783733482883645440 |