Sequence comparison latent semantic analysis and support vector machine to detect remote protein homology

Remote protein homology detection refers to the detection of structural homology between proteins with weak sequence similarity. It is important for identifying the function of new proteins, which could assist in curing genetic diseases, performing drug design, and identifying novel enzymes. To detect remote protein...

Bibliographic Details
Main Author: Ismail, Surayati
Format: Thesis
Language: English
Published: 2010
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/16677/7/SurayatiIsmailMFSKSM2010.pdf
id my-utm-ep.16677
record_format uketd_dc
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic QA75 Electronic computers. Computer science
description Remote protein homology detection refers to the detection of structural homology between proteins with weak sequence similarity. It is important for identifying the function of new proteins, which could assist in curing genetic diseases, performing drug design, and identifying novel enzymes. Researchers have identified two main problems in detecting remote protein homology: homology detection for hard-to-align proteins, and high-dimensional protein feature vectors caused by redundant and noisy data. To address these problems, a new computational framework for remote protein homology detection has been developed. The framework begins by extracting the structural similarity of proteins using a highly sensitive structural similarity algorithm, which consists of four steps: split the protein sequences into substrings, calculate similarity using pairwise protein substring alignment, build a guide tree, and extract the high structural similarity using multiple protein sequence alignment. Then, the Latent Semantic Analysis (LSA) algorithm is used to produce feature vectors. The LSA stage consists of three steps: generate protein pattern blocks using the TEIRESIAS algorithm, remove redundant data using the chi-square algorithm, and remove noisy data using the Singular Value Decomposition (SVD) algorithm. Lastly, the framework uses a support vector machine (SVM) to classify each protein as a homologue or non-homologue member. The proposed framework is evaluated on a dataset from the SCOP database version 1.53, and its performance is compared with other methods: the PSI-BLAST and SVM-Pairwise sequence comparison models, the SAM and HMMER generative models, and the SVM-Fisher and SVM-I-Sites discriminative classifier models. The comparison is made in terms of Receiver Operating Characteristic (ROC), Median Rate of False Positives (MRFP), and a family-by-family comparison of ROC. The results show that the proposed computational framework outperforms these other remote protein homology detection methods.
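The two evaluation measures named in the description, ROC and MRFP, can be illustrated with a short Python sketch. This is not code from the thesis: the function names `roc_score` and `median_rfp` and the example scores are hypothetical, and the sketch assumes the definitions commonly used with this kind of benchmark, namely ROC as the area under the ROC curve and MRFP as the fraction of negative examples that score at least as high as the median-scoring positive example.

```python
from statistics import median


def roc_score(pos_scores, neg_scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive scores higher.
    Tied pairs count as half a correct ranking."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


def median_rfp(pos_scores, neg_scores):
    """Fraction of negatives scoring at least as high as the
    median-scoring positive (assumed MRFP definition). Lower is better."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    threshold = median(pos_scores)
    false_positives = sum(1 for n in neg_scores if n >= threshold)
    return false_positives / len(neg_scores)


# Made-up classifier scores for one protein family (illustration only).
homologue_scores = [0.9, 0.8, 0.4]
non_homologue_scores = [0.7, 0.3, 0.2, 0.1]

print("ROC :", roc_score(homologue_scores, non_homologue_scores))
print("MRFP:", median_rfp(homologue_scores, non_homologue_scores))
```

A higher ROC and a lower MRFP both indicate that homologues are ranked above non-homologues. In a family-by-family comparison, such scores would be computed once per family and then compared across methods.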
format Thesis
qualification_level Master's degree
author Ismail, Surayati
title Sequence comparison latent semantic analysis and support vector machine to detect remote protein homology
granting_institution Universiti Teknologi Malaysia, Faculty of Computer Science and Information System
granting_department Faculty of Computer Science and Information System
publishDate 2010
url http://eprints.utm.my/id/eprint/16677/7/SurayatiIsmailMFSKSM2010.pdf
_version_ 1747815100908044288