Gene selection and classification in autism gene expression data

Autism spectrum disorders (ASD) are neurodevelopmental disorders that are currently diagnosed on the basis of abnormal stereotyped behaviour as well as observable deficits in communication and social functioning. Although a variety of candidate genes have been attributed to the disorder, no single g...

Full description

Saved in:
Bibliographic Details
Main Author: Al-Jaf, Shilan Sameen Hameed
Format: Thesis
Language:English
Published: 2017
Subjects:
Online Access:http://eprints.utm.my/id/eprint/86999/1/ShilanSameenHameedMFC2017.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
id my-utm-ep.86999
record_format uketd_dc
spelling my-utm-ep.869992020-10-31T12:16:36Z Gene selection and classification in autism gene expression data 2017 Al-Jaf, Shilan Sameen Hameed QA75 Electronic computers. Computer science Autism spectrum disorders (ASD) are neurodevelopmental disorders that are currently diagnosed on the basis of abnormal stereotyped behaviour as well as observable deficits in communication and social functioning. Although a variety of candidate genes have been attributed to the disorder, no single gene is applicable to more than 1–2% of the general ASD population. Despite extensive efforts, definitive genes that contribute to autism susceptibility have yet to be identified. The major problems in dealing with the gene expression dataset of autism include the presence of limited number of samples and large noises due to errors of experimental measurements and natural variation. In this study, a systematic combination of three important filters, namely t-test (TT), Wilcoxon Rank Sum (WRS) and Feature Correlation (COR) are applied along with efficient wrapper algorithm based on geometric binary particle swarm optimization-support vector machine (GBPSO-SVM), aiming at selecting and classifying the most attributed genes of autism. A new approach based on the criterion of median ratio, mean ratio and variance deviations is also applied to reduce the initial dataset prior to its involvement. Results showed that the most discriminative genes that were identified in the first and last selection steps concluded the presence of a repetitive gene (CAPS2), which was assigned as the most ASD risk gene. The fused result of genes subset that were selected by the GBPSO-SVM algorithm increased the classification accuracy to about 92.10%, which is higher than those reported in literature for the same autism dataset. Noticeably, the application of ensemble using random forest (RF) showed better performance compared to that of previous studies. However, the ensemble approach based on the employment of SVM as an integrator of the fused genes from the output branches of GBPSO-SVM outperformed the RF integrator. The overall improvement was ascribed to the selection strategies that were taken to reduce the dataset and the utilization of efficient wrapper based GBPSO-SVM algorithm. 2017 Thesis http://eprints.utm.my/id/eprint/86999/ http://eprints.utm.my/id/eprint/86999/1/ShilanSameenHameedMFC2017.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:132618 masters Universiti Teknologi Malaysia, Faculty of Computing Faculty of Computing
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic QA75 Electronic computers
Computer science
spellingShingle QA75 Electronic computers
Computer science
Al-Jaf, Shilan Sameen Hameed
Gene selection and classification in autism gene expression data
description Autism spectrum disorders (ASD) are neurodevelopmental disorders that are currently diagnosed on the basis of abnormal stereotyped behaviour as well as observable deficits in communication and social functioning. Although a variety of candidate genes have been attributed to the disorder, no single gene is applicable to more than 1–2% of the general ASD population. Despite extensive efforts, definitive genes that contribute to autism susceptibility have yet to be identified. The major problems in dealing with the gene expression dataset of autism include the presence of limited number of samples and large noises due to errors of experimental measurements and natural variation. In this study, a systematic combination of three important filters, namely t-test (TT), Wilcoxon Rank Sum (WRS) and Feature Correlation (COR) are applied along with efficient wrapper algorithm based on geometric binary particle swarm optimization-support vector machine (GBPSO-SVM), aiming at selecting and classifying the most attributed genes of autism. A new approach based on the criterion of median ratio, mean ratio and variance deviations is also applied to reduce the initial dataset prior to its involvement. Results showed that the most discriminative genes that were identified in the first and last selection steps concluded the presence of a repetitive gene (CAPS2), which was assigned as the most ASD risk gene. The fused result of genes subset that were selected by the GBPSO-SVM algorithm increased the classification accuracy to about 92.10%, which is higher than those reported in literature for the same autism dataset. Noticeably, the application of ensemble using random forest (RF) showed better performance compared to that of previous studies. However, the ensemble approach based on the employment of SVM as an integrator of the fused genes from the output branches of GBPSO-SVM outperformed the RF integrator. The overall improvement was ascribed to the selection strategies that were taken to reduce the dataset and the utilization of efficient wrapper based GBPSO-SVM algorithm.
format Thesis
qualification_level Master's degree
author Al-Jaf, Shilan Sameen Hameed
author_facet Al-Jaf, Shilan Sameen Hameed
author_sort Al-Jaf, Shilan Sameen Hameed
title Gene selection and classification in autism gene expression data
title_short Gene selection and classification in autism gene expression data
title_full Gene selection and classification in autism gene expression data
title_fullStr Gene selection and classification in autism gene expression data
title_full_unstemmed Gene selection and classification in autism gene expression data
title_sort gene selection and classification in autism gene expression data
granting_institution Universiti Teknologi Malaysia, Faculty of Computing
granting_department Faculty of Computing
publishDate 2017
url http://eprints.utm.my/id/eprint/86999/1/ShilanSameenHameedMFC2017.pdf
_version_ 1747818519821549568