An improved biclustering algorithm with overlapping control for identification of informative genes and pathways

The rise of microarray technology has led to many tools and methods, such as clustering analysis, for analysing the large volume of gene expression data. Clustering analysis is used for purposes such as functional annotation, tissue classification and motif identifica...

Bibliographic Details
Main Author: Mohammad Kusairi, Rohani
Format: Thesis
Language: English
Published: 2021
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/102857/1/RohaniMohammadMSC2021.pdf.pdf
id my-utm-ep.102857
record_format uketd_dc
spelling my-utm-ep.102857 2023-09-26T05:58:30Z 2021 Thesis http://eprints.utm.my/102857/ http://eprints.utm.my/102857/1/RohaniMohammadMSC2021.pdf.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:150622 masters
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic QA75 Electronic computers. Computer science
description The rise of microarray technology has led to many tools and methods, such as clustering analysis, for analysing the large volume of gene expression data. Clustering analysis is used for purposes such as functional annotation, tissue classification and motif identification. Clustering methods have advanced the analysis of genetic data by grouping genes with similar expression patterns into one cluster; these genes are then analysed further to extract potential biological information. Traditional clustering methods group genes that behave similarly under all conditions but cannot group along both dimensions simultaneously. As a result, the clusters obtained contain either all rows or all columns of the data matrix, and thus ignore local co-expression effects that are present in only a subset of the biological samples. In addition, traditional clustering methods cannot assign a gene to multiple clusters, which does not match the natural behaviour of genes: a gene can have more than one function and can participate in multiple pathways. Because of these limitations, biclustering was introduced to identify local patterns in the data by clustering the gene dimension and the condition dimension simultaneously. The local correlation information between subsets of genes and conditions is then used to improve the accuracy of the clustering results. However, overlapping is a further issue in biclustering. Because some genes belong to multiple functional categories, overlap may be regarded as a natural property of biclusters, but the overlap among biclusters needs to be controlled to prevent redundancy in the biclusters formed.
This research proposed an improved overlapping control in biclustering algorithms for the identification of informative genes from gene expression data. Overlapping control is crucial for limiting redundancy among the biclusters produced, and it indirectly reduces the number of biclusters obtained. Experiments were conducted on two microarray data sets (an ovarian cancer data set and a glioblastoma cancer data set). The results were evaluated using 10-fold cross-validation and compared with the Qualitative Biclustering Algorithm (QUBIC). The results were further analysed in terms of accuracy, standard deviation, variance and t-test, and the proposed method achieved higher accuracy on the ovarian data set (96.54%) and the glioblastoma data set (75.68%). When tested with an SVM classifier, the method showed a consistent improvement in bicluster accuracy over QUBIC. Biological context verification was then conducted to elucidate the relation of the selected genes (such as ERBB2, VCAM1 and CD3D) and pathways (Endocytosis pathway, Bladder Cancer pathway and Pancreatic Cancer pathway) to the phenotype under study.
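As a rough illustration of what overlap control between biclusters can look like, the following Python sketch measures how much two biclusters share and greedily drops those that are largely redundant with a larger one already kept. It is not the method proposed in the thesis, whose details are not given in this record. The `Bicluster` type, the functions `overlap_ratio` and `filter_overlapping`, the area-based overlap measure and the default threshold of 0.5 are all assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Bicluster(NamedTuple):
    """A bicluster as a set of genes (rows) and a set of conditions (columns)."""
    genes: FrozenSet[str]
    conditions: FrozenSet[str]

    @property
    def area(self) -> int:
        # Number of matrix cells covered by this bicluster.
        return len(self.genes) * len(self.conditions)


def overlap_ratio(a: Bicluster, b: Bicluster) -> float:
    """Shared cells divided by the area of the smaller bicluster (assumed measure)."""
    shared = len(a.genes & b.genes) * len(a.conditions & b.conditions)
    smaller = min(a.area, b.area)
    return shared / smaller if smaller else 0.0


def filter_overlapping(biclusters: Iterable[Bicluster],
                       max_overlap: float = 0.5) -> List[Bicluster]:
    """Greedy overlap control: visit biclusters from largest to smallest and
    keep one only if it overlaps every already-kept bicluster by at most
    max_overlap (a hypothetical threshold)."""
    kept: List[Bicluster] = []
    for candidate in sorted(biclusters, key=lambda bc: bc.area, reverse=True):
        if all(overlap_ratio(candidate, k) <= max_overlap for k in kept):
            kept.append(candidate)
    return kept
```

With a threshold below 1.0, some overlap is still allowed, so a gene can appear in more than one bicluster, while near-duplicate biclusters are removed and the total number of biclusters shrinks.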
format Thesis
qualification_level Master's degree
author Mohammad Kusairi, Rohani
title An improved biclustering algorithm with overlapping control for identification of informative genes and pathways
title_sort improved biclustering algorithm with overlapping control for identification of informative genes and pathways
granting_institution Universiti Teknologi Malaysia
granting_department Faculty of Engineering - School of Computing
publishDate 2021
url http://eprints.utm.my/102857/1/RohaniMohammadMSC2021.pdf.pdf
_version_ 1783729225439641600