EnhancerNet: A Self-Organizing Map-Based DNA Sequence to Enhancer Motif Activation Map Encoding Method for Enhancer Classification with Convolutional Neural Network Analysis

Convolutional neural networks (CNNs) have achieved significant advancements in biological sequence analysis over recent years. Specifically, they have an edge over traditional feature-based machine learning approaches in deciphering the regulatory properties of sequences. Nevertheless, one of the...

Bibliographic Details
Main Author: Shu En, Chia
Format: Thesis
Language: English
Published: 2023
Subjects: T Technology (General)
Online Access: http://ir.unimas.my/id/eprint/43083/3/Chia%20Shu%20En_dsva.pdf
http://ir.unimas.my/id/eprint/43083/4/Thesis%20Master_Chia%20Shu%20En.ftext.pdf
id my-unimas-ir.43083
record_format uketd_dc
spelling my-unimas-ir.43083 2023-10-20T07:25:09Z EnhancerNet: A Self-Organizing Map-Based DNA Sequence to Enhancer Motif Activation Map Encoding Method for Enhancer Classification with Convolutional Neural Network Analysis 2023-10-16 Shu En, Chia T Technology (General) UNIMAS Institutional Repository 2023-10 Thesis http://ir.unimas.my/id/eprint/43083/ http://ir.unimas.my/id/eprint/43083/3/Chia%20Shu%20En_dsva.pdf text en staffonly http://ir.unimas.my/id/eprint/43083/4/Thesis%20Master_Chia%20Shu%20En.ftext.pdf text en validuser masters University of Malaysia Sarawak Faculty of Cognitive Sciences and Human Development
institution Universiti Malaysia Sarawak
collection UNIMAS Institutional Repository
language English
topic T Technology (General)
description Convolutional neural networks (CNNs) have achieved significant advancements in biological sequence analysis over recent years. Specifically, they have an edge over traditional feature-based machine learning approaches in deciphering the regulatory properties of sequences. Nevertheless, one technical challenge remains: representing biological sequences as an input matrix suitable for effective CNN learning. To address this challenge, this study proposes a novel sequence encoding approach that models enhancer motifs within DNA sequences using a self-organizing map (SOM)-based template feature map. This two-dimensional template map is constructed by clustering known motifs associated with high enhancer activity, enabling it to act as a motif scanner that detects significant enhancer motifs through similarity measures. The motifs within each node of the template map are ranked by their activation values, allowing conserved and significant motifs to be selected as the final feature representation. Consequently, the input DNA sequence is transformed into an activation map in which the relative spatial locations of significant motifs and their activation strengths characterize enhancer motifs in a meaningful way. The activation map generated from the input is central to EnhancerNet, a specialized CNN model designed and trained specifically for enhancer classification. Using the information in the activation map, EnhancerNet learns to recognize and extract discriminative features and patterns, which enables it to classify enhancers with a high level of accuracy. The efficacy of the proposed model is validated by visualizing the enhancer motif activation map and the intermediate feature representations within the CNN layers to check that the learned representations are meaningful.
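The encoding idea described above can be illustrated with a minimal sketch. It is not the thesis implementation: the 2x2 `TEMPLATE_MAP`, its motif strings, and the match-fraction `similarity` function are all hypothetical stand-ins, assuming that each template node holds one prototype motif and that a node's activation is its best similarity against any window of the input sequence. In the thesis the template map is learned by a SOM from known enhancer motifs, and the actual similarity measure and ranking step are not specified in this record.

```python
# Hypothetical 2x2 template map: each node holds one prototype motif.
# A trained SOM would learn these from known enhancer motifs; the strings
# below are made up purely for illustration.
TEMPLATE_MAP = [
    ["TGACTCA", "GATAAGA"],
    ["CACGTGA", "TTTGCAT"],
]


def similarity(window: str, motif: str) -> float:
    """Fraction of positions where window and motif agree (assumed measure)."""
    matches = sum(a == b for a, b in zip(window, motif))
    return matches / len(motif)


def node_activation(sequence: str, motif: str) -> float:
    """Best similarity of the motif against every window of the sequence."""
    k = len(motif)
    if len(sequence) < k:
        return 0.0
    return max(
        similarity(sequence[i:i + k], motif)
        for i in range(len(sequence) - k + 1)
    )


def activation_map(sequence: str, template=TEMPLATE_MAP):
    """Turn a DNA sequence into a 2-D grid of per-node activations."""
    seq = sequence.upper()
    return [[node_activation(seq, motif) for motif in row] for row in template]


amap = activation_map("ccgtTGACTCAggaGATAAGtt")
for row in amap:
    print([round(value, 2) for value in row])
```

The resulting grid has the same shape as the template map, so it can be passed to a CNN as a small single-channel image. A node whose prototype occurs exactly in the sequence gets activation 1.0, and partial matches score lower.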
Furthermore, the proposed method is compared against six state-of-the-art sequence encoding methods (one-hot encoding, k-mer counting, random walk, skip-gram, continuous bag-of-words, and Global Vectors) using the same benchmark histone datasets as input. The evaluation, which covers accuracy, precision, recall, F1-score, and AUC, consistently shows superior performance, with improvements ranging from 0.0283 to 0.0573 across these metrics compared to the other methods. Additionally, a time-accuracy analysis further supports the effectiveness of the proposed model in terms of both accuracy and computational efficiency, and a t-test confirms the statistical significance of the performance difference. In conclusion, the evaluation results indicate that EnhancerNet is an effective approach for generating meaningful representations, resulting in significant improvements in the performance of CNN classifiers. This thesis contributes a novel approach for transforming DNA sequences into an enhancer motif activation map that captures spatial relationships, context dependency, and over-represented motifs. The approach capitalizes on the ability of CNNs to model higher-level abstract features, and it is expected to inspire future designs of DNA sequence representations for CNN modelling.
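For context, two of the baseline encodings named in the comparison, one-hot encoding and k-mer counting, can be sketched in a few lines. This is a generic illustration of those standard techniques, not the configuration used in the thesis: the function names, the default `k=3`, and the all-zero row for bases outside A, C, G, T are assumptions of this sketch.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def one_hot(sequence: str):
    """Encode each base as a 4-element indicator row in A, C, G, T order.

    Bases outside ACGT (for example N) become an all-zero row; that
    handling is an assumption of this sketch.
    """
    return [
        [1 if base == letter else 0 for letter in ALPHABET]
        for base in sequence.upper()
    ]


def kmer_counts(sequence: str, k: int = 3):
    """Count overlapping k-mers, returned in fixed lexicographic order."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocabulary = ["".join(p) for p in product(ALPHABET, repeat=k)]
    return [counts[kmer] for kmer in vocabulary]


encoded = one_hot("ACGT")               # 4 rows, one per base
vector = kmer_counts("ACGTACG", k=2)    # 16 counts, one per 2-mer
```

One-hot encoding keeps the position of every base but carries no motif-level information, while k-mer counting summarizes short patterns but discards where they occur. The activation map described in the abstract is positioned as retaining both motif identity and spatial arrangement.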
format Thesis
qualification_level Master's degree
author Shu En, Chia
title EnhancerNet: A Self-Organizing Map-Based DNA Sequence to Enhancer Motif Activation Map Encoding Method for Enhancer Classification with Convolutional Neural Network Analysis
granting_institution University of Malaysia Sarawak
granting_department Faculty of Cognitive Sciences and Human Development
publishDate 2023
url http://ir.unimas.my/id/eprint/43083/3/Chia%20Shu%20En_dsva.pdf
http://ir.unimas.my/id/eprint/43083/4/Thesis%20Master_Chia%20Shu%20En.ftext.pdf
_version_ 1783728548964466688