Fuzzy c-means clustering by incorporating biological knowledge and multi-stage filtering to improve gene function prediction

Bibliographic Details
Main Author: Kasim, Shahreen
Format: Thesis
Language: English
Published: 2011
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/32110/5/ShahreenKasimPFSKSM2011.pdf
id my-utm-ep.32110
record_format uketd_dc
spelling my-utm-ep.32110 2018-05-27T07:11:11Z 2011-11 Thesis http://eprints.utm.my/id/eprint/32110/ application/pdf en public phd doctoral
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic QA75 Electronic computers. Computer science
description Gene expression is the process by which information from a gene is used to synthesize a functional gene product. Comprehensive studies of gene expression are useful for predicting gene functions, including annotations for genes whose function is unknown. However, several issues need to be addressed in gene function prediction, namely resolving multiple fuzzy clusters using biological knowledge and the biological annotations available in existing databases, and handling high-level and low-level expression values. This research therefore aimed to cluster gene expression data by incorporating biological knowledge in order to handle these issues. The basic Fuzzy c-Means (FCM) algorithm was introduced to address multiple fuzzy clusters in gene expression. Clustering Functional Annotation (CluFA) was developed to deal with insufficient knowledge by incorporating Gene Ontology (GO) datasets and multiple functional annotation databases. The GO datasets were used to determine the number of clusters as well as the clusters to which genes belong, while the evidence codes in the functional annotation databases were used to compute the strength of the association between a data element and a particular cluster. The multi-stage filtering CluFA (msf-CluFA) was implemented by conducting filtering stages and applying an enhanced Apriori algorithm in order to handle the high-level and low-level expression values. The performance of the proposed method was evaluated in terms of compactness and separation, consistency, and accuracy, using the Eisen and Gasch datasets. Biological validation was also used to validate the gene function predictions by cross-checking them against the most recent annotation database. The results show that the proposed computational method performed better than other methods such as GOFuzzy, FuzzyK, and FuzzySOM in predicting unknown gene function.
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Kasim, Shahreen
title Fuzzy c-means clustering by incorporating biological knowledge and multi-stage filtering to improve gene function prediction
granting_institution Universiti Teknologi Malaysia, Faculty of Computer Science and Information System
granting_department Faculty of Computer Science and Information System
publishDate 2011
url http://eprints.utm.my/id/eprint/32110/5/ShahreenKasimPFSKSM2011.pdf
_version_ 1747815923805323264
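
The description names the basic Fuzzy c-Means (FCM) algorithm as the starting point of the thesis. As a rough illustration only, the sketch below shows plain, textbook FCM in pure Python: it alternates between recomputing cluster centers from the current memberships and recomputing memberships from the distances to those centers. It is not the thesis's CluFA or msf-CluFA method, and it uses no Gene Ontology or evidence-code information. The function name fuzzy_c_means, its parameters, and the toy expression profiles are assumptions made for this example; they are not taken from the thesis or from the Eisen and Gasch datasets.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means (hypothetical helper, not the thesis's CluFA).

    points: list of equal-length lists of floats (one row per gene profile)
    c:      number of clusters
    m:      fuzzifier (> 1); larger values give softer memberships
    Returns (centers, memberships), where memberships[i][j] is the degree
    to which point i belongs to cluster j and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Start from random memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update: inversely related to distance from each center.
        new_u = []
        for p in points:
            dist = [math.dist(p, ctr) for ctr in centers]
            if any(d == 0.0 for d in dist):
                # The point coincides with one or more centers.
                hits = [d == 0.0 for d in dist]
                k = sum(hits)
                row = [1.0 / k if h else 0.0 for h in hits]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
                total = sum(inv)
                row = [v / total for v in inv]
            new_u.append(row)

        # Stop once no membership changed by more than tol.
        shift = max(abs(new_u[i][j] - u[i][j])
                    for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break

    return centers, u


# Toy two-condition expression profiles (made up for illustration).
profiles = [
    [0.0, 0.1], [0.1, 0.0], [0.05, 0.05],   # low-expression group
    [5.0, 5.1], [5.1, 5.0], [4.95, 5.05],   # high-expression group
]
centers, memberships = fuzzy_c_means(profiles, c=2)
for profile, row in zip(profiles, memberships):
    print(profile, [round(v, 3) for v in row])
```

Unlike hard clustering, every profile receives a membership degree in every cluster. This is the property the description relies on when it speaks of genes belonging to multiple fuzzy clusters. The number of clusters c is passed in by hand here, whereas the thesis derives it from GO datasets.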