Neural network for prediction of cysteine disulphide bridge connectivity in proteins

Bibliographic Details
Main Author: Bostan, Hamed
Format: Thesis
Language: English
Published: 2010
Subjects:
Online Access: http://eprints.utm.my/id/eprint/18275/1/HamedBostanMFSKSM2010.pdf
Description
Summary: The goal of this thesis is to develop a computational method, based on machine learning techniques, for predicting the disulphide-bonding states of cysteine residues in proteins, a sub-problem of the larger and still unsolved problem of protein structure prediction. First, we preprocessed datasets from the Protein Data Bank (PDB), filtering out mutations and low-resolution files. We then studied a number of descriptors derived from two-dimensional (2D) protein sequences. These descriptors are based on local feature values of the amino acids adjacent to each cysteine residue, namely the encoded, propensity value and averaged propensity value descriptors. We used an Artificial Neural Network (ANN) as the machine learning technique for our prediction method. The network was trained with the ‘trainlm’, ‘trainrp’ and ‘trainscg’ training functions, and 5-fold cross-validation was implemented. Our results show that the state of cysteine disulphide bond formation can be predicted. The propensity value descriptor combined with the ‘trainscg’ training function performed better for cysteine bond state prediction than the other training functions and descriptors in this study. The prediction accuracy was 80.85% on a propensity value descriptor dataset of more than 400 thousand protein patterns, trained with ‘trainscg’. The results of this work have direct implications for site-directed mutational studies of proteins, protein engineering and the problem of protein folding.
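
Illustrative sketch (not taken from the thesis): the summary describes descriptors built from the residues adjacent to each cysteine and a 5-fold evaluation. The Python below shows one plausible way to extract such local windows, turn them into an "encoded" (one-hot) descriptor, and split samples into five folds. The window half-width, the padding symbol and all function names are assumptions made for illustration. The propensity-based descriptors are not shown because the summary does not define them, and the ‘trainlm’, ‘trainrp’ and ‘trainscg’ training functions (MATLAB toolbox routines) are not reproduced here.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "-"  # assumed padding symbol for windows that run past the sequence ends


def cysteine_windows(sequence, half_width=2):
    """Return (position, window) for every cysteine, padding at the sequence ends."""
    padded = PAD * half_width + sequence + PAD * half_width
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "C":
            # Position i in `sequence` sits at i + half_width in `padded`,
            # so the window centred on it starts at index i.
            windows.append((i, padded[i : i + 2 * half_width + 1]))
    return windows


def one_hot(window):
    """Flatten a window into a 0/1 vector with 20 slots per position.

    Padding positions (and any non-standard symbol) become all zeros.
    """
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k nearly equal test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[fold::k] for fold in range(k)]
```

For example, `cysteine_windows("ACDCE", half_width=1)` yields one window per cysteine, and each fold returned by `k_fold_indices` can serve once as the held-out set while the remaining four folds are used for training.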