Linear-PSO with binary search algorithm for DNA motif discovery / Hazaruddin Harun

Bibliographic Details
Main Author: Harun, Hazaruddin
Format: Thesis
Language: English
Published: 2015
Subjects: Programming; Rule-based programming; Backtrack programming; Algorithms
Online Access: https://ir.uitm.edu.my/id/eprint/16103/1/TP_HAZARUDDIN%20HARUN%20CS%2015_5.pdf
id my-uitm-ir.16103
record_format uketd_dc
institution Universiti Teknologi MARA
collection UiTM Institutional Repository
language English
topic Programming
Rule-based programming
Backtrack programming
Algorithms
description Motif Discovery (MD) is the process of identifying meaningful patterns in DNA, RNA, or protein sequences; in bioinformatics, such a pattern is known as a motif. Numerous MD algorithms have been developed, but most were not designed to discover species-specific motifs, that is, motifs that identify a specifically selected species and whose exact locations must also be determined. Evaluations of these algorithms have shown unsatisfactory results owing to their low validity and accuracy. At present, DNA sequencing analysis is the most widely used technique for species identification, in which a DNA sequence is compared with comprehensive reference databases to determine its patterns. However, these databases have been found to contain false and gapped sequences, which lead to false identification. This study addresses these problems by introducing a hybrid algorithm for MD. Here, MD is the process of discovering all possible motifs in DNA sequences, whereas Motif Identification (MI) is the process of identifying the correct motif that can represent a selected species. Particle Swarm Optimisation (PSO) was selected as the base algorithm to be improved and integrated with other techniques. The Linear-PSO algorithm was the first improved version. However, because this algorithm took a long time to run to completion, the Binary Search technique was integrated into it, producing a new version: the Linear-PSO with Binary Search (LPBS) algorithm. A total of 11 experiments were conducted: the first four aimed at algorithm improvement, the next four at identifying suitable input data, and the final three at algorithm validation.
Several DNA sequences from different species were collected from the GenBank and TRANSCompel databases and used as input for the algorithm. The collected sequences were from the mitochondrial cytochrome c oxidase subunit I (COX1) gene. Because of limited data availability, only four species were used for Motif Discovery: pig, cow, yak, and chicken. Five other species were used for Motif Identification: human, sheep, dog, frog, and rat. The algorithm was run on a notebook with an Intel Core Duo 1.73 GHz CPU and 3 GB of RAM. The results showed that the LPBS algorithm was able to discover possibly correct motifs that can represent a species, with higher validity and accuracy than previous algorithms. The discovered motifs were consistent across executions and had higher calculated fitness values.
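The abstract does not give the LPBS update rules, its fitness function, or how Binary Search is combined with PSO. The Python sketch below is only a rough illustration under stated assumptions. It uses a common PSO formulation for motif discovery in which each particle encodes one candidate start position per sequence. Fitness is assumed to be the consensus (column-majority) score of the aligned k-mers. After the search, binary search over a sorted k-mer index is used to locate the exact positions of the discovered motif in a sequence. The PSO shown is a plain global-best PSO with a constant inertia weight, not necessarily the thesis's "Linear-PSO". All names and parameters (`pso_motif`, `consensus_score`, `build_kmer_index`, `find_positions`, `w`, `c1`, `c2`) and the toy sequences are hypothetical.

```python
import bisect
import random
from collections import Counter


def aligned_kmers(seqs, starts, k):
    """Cut one k-mer from each sequence at the given start positions."""
    return [s[p:p + k] for s, p in zip(seqs, starts)]


def consensus_score(seqs, starts, k):
    """Assumed fitness: per motif column, count of the most common base, summed."""
    columns = zip(*aligned_kmers(seqs, starts, k))
    return sum(Counter(col).most_common(1)[0][1] for col in columns)


def consensus(seqs, starts, k):
    """Majority base in each column of the aligned k-mers."""
    columns = zip(*aligned_kmers(seqs, starts, k))
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)


def pso_motif(seqs, k, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Hypothetical global-best PSO over start positions; returns (motif, starts, fitness)."""
    rng = random.Random(seed)
    limits = [len(s) - k for s in seqs]  # largest valid start in each sequence
    dims = len(seqs)

    def decode(x):
        # Round each continuous coordinate and clamp it to a valid start index.
        return [min(max(round(v), 0), lim) for v, lim in zip(x, limits)]

    pos = [[rng.uniform(0, lim) for lim in limits] for _ in range(n_particles)]
    vel = [[0.0] * dims for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [consensus_score(seqs, decode(p), k) for p in pos]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dims):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            fit = consensus_score(seqs, decode(pos[i]), k)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit

    starts = decode(gbest)
    return consensus(seqs, starts, k), starts, gbest_fit


def build_kmer_index(seq, k):
    """Sorted (k-mer, position) pairs, so occurrences can be found by binary search."""
    return sorted((seq[i:i + k], i) for i in range(len(seq) - k + 1))


def find_positions(index, motif):
    """Binary-search the sorted index for every exact occurrence of motif."""
    lo = bisect.bisect_left(index, (motif, -1))
    hi = bisect.bisect_right(index, (motif, float("inf")))
    return [p for _, p in index[lo:hi]]


if __name__ == "__main__":
    seqs = ["ACGTTGCAACGTAC", "TTACGTTGCATTGA", "GGGACGTTGCAGGT"]  # toy data
    motif, starts, fit = pso_motif(seqs, k=6, seed=1)
    print(motif, starts, fit)
    for s in seqs:
        print(find_positions(build_kmer_index(s, 6), motif))
```

Rounding continuous particle coordinates to integer start indices is one simple way to apply PSO to a discrete search space. The sorted index costs O(n log n) to build once. After that, each motif lookup takes O(log n) plus the number of hits, instead of a full linear scan. That is the kind of speed-up the abstract attributes to integrating Binary Search, although the thesis's actual mechanism is not described in this record.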
format Thesis
qualification_name Doctor of Philosophy (PhD)
qualification_level Doctorate
author Harun, Hazaruddin
title Linear-PSO with binary search algorithm for DNA motif discovery / Hazaruddin Harun
granting_institution Universiti Teknologi MARA
granting_department Faculty of Computer and Mathematical Sciences
publishDate 2015
url https://ir.uitm.edu.my/id/eprint/16103/1/TP_HAZARUDDIN%20HARUN%20CS%2015_5.pdf