Chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers

Three major aspects of chemometrics were investigated in this study, namely Quantitative Structure-Activity Relationship (QSAR) and database mining, classification, and multiblock methods. In the first analysis, 197 artemisinin compounds were divided into training set and test set together with s...

Bibliographic Details
Main Author: Jamaludin, Rosmahaida
Format: Thesis
Language: English
Published: 2015
Subjects: QD Chemistry
Online Access: http://eprints.utm.my/id/eprint/61066/1/RosmahaidaJamaludinPFS2015.pdf
id my-utm-ep.61066
record_format uketd_dc
spelling my-utm-ep.61066 2017-10-08T08:57:57Z Chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers 2015-09 Jamaludin, Rosmahaida QD Chemistry 2015-09 Thesis http://eprints.utm.my/id/eprint/61066/ http://eprints.utm.my/id/eprint/61066/1/RosmahaidaJamaludinPFS2015.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:96405 phd doctoral Universiti Teknologi Malaysia, Faculty of Science Faculty of Science
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic QD Chemistry
spellingShingle QD Chemistry
Jamaludin, Rosmahaida
Chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers
description Three major aspects of chemometrics were investigated in this study, namely Quantitative Structure-Activity Relationship (QSAR) modelling and database mining, classification, and multiblock methods. In the first analysis, 197 artemisinin compounds were divided into a training set and a test set and, together with structural descriptors generated by the DRAGON 6.0 software, were used to develop three QSAR models. The model statistics (r²/r²test) were 0.790/0.853 for Forward Stepwise-Multiple Linear Regression (MLR), 0.807/0.789 for Genetic Algorithm (GA)-MLR and 0.795/0.811 for GA-Partial Least Squares (PLS). The rigorously validated QSAR models were then applied to mine a chemical database, which yielded four potential new anti-malarial agents.

The same artemisinin data set was then classified into active and less active compounds in order to develop reliable predictive classification models and to investigate the effect of different data splitting and data pre-processing methods on classification. Principal Component Analysis (PCA) and boundary plots were used to visualize four classifiers, namely Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Learning Vector Quantization (LVQ) and Quadratic Discriminant Analysis (QDA). Kennard-Stone data splitting and standardization produced better results in terms of percent correctly classified (% CC) than Duplex data splitting and mean-centering. Moreover, LDA was found to be superior to the other three classifiers, with a lower risk of over-fitting.

Lastly, multiblock analysis methods such as Multiblock PLS and Consensus PCA were applied to a polychlorinated diphenyl ethers (PCDEs) data set whose descriptors were grouped into three blocks labelled X1D, X2D and X3D, with a property block Y consisting of log PL (Pa, 25°C), log KOW (25°C) and log SWL (mol/L, 25°C). Their performance was then compared with that of the single-block methods PLS and PCA. The PLS models of each descriptor block for each property were statistically well fitted and predictive, with r²train values greater than 0.96 and rtest values ranging from 0.86 to 0.98. Combining the three descriptor blocks to produce the Multiblock PLS superscores (MBSS) model, which was superior to the Multiblock PLS block-scores (MBBS) model, yielded a slightly better r²train value and significantly better prediction (higher rtest) than the PLS models of the individual descriptor blocks. In addition, three measures of block similarity, namely the Mantel test, the Rv coefficient and Procrustes analysis, were used to investigate similarity and correlation between the blocks, with Monte Carlo simulations to determine their significance. Based on the similarity index between two blocks, the XjD descriptors resembled the Y block better, while X2D was more correlated with the X1D block. In short, the chemometric methods were applied successfully to both data sets using various descriptors generated by the DRAGON software and yielded promising results that are beneficial not only in chemometrics but also in drug design.
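The abstract names Kennard-Stone data splitting as the better-performing split but does not describe the procedure. As a rough illustration only, the sketch below follows the commonly described Kennard-Stone rule in plain Python: start from the two samples that are farthest apart, then repeatedly add the sample whose distance to its nearest already-selected sample is largest. The function name `kennard_stone`, the use of Euclidean distance, and the toy descriptor values are assumptions made for this example; they are not taken from the thesis.

```python
from math import dist


def kennard_stone(samples, n_train):
    """Return (train_idx, test_idx) chosen by the Kennard-Stone rule.

    samples: list of equal-length numeric tuples (one descriptor vector each).
    n_train: number of samples to place in the training set.
    """
    n = len(samples)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")

    # Pairwise Euclidean distances (assumed metric for this sketch).
    d = [[dist(a, b) for b in samples] for a in samples]

    # Seed the training set with the two samples that are farthest apart.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: d[pair[0]][pair[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]

    # For each remaining sample, track the distance to its nearest selected sample.
    nearest = {k: min(d[k][i0], d[k][j0]) for k in remaining}

    while len(selected) < n_train:
        # Add the sample that is least well covered by the current selection.
        pick = max(remaining, key=lambda k: nearest[k])
        selected.append(pick)
        remaining.remove(pick)
        del nearest[pick]
        for k in remaining:
            nearest[k] = min(nearest[k], d[k][pick])

    return selected, remaining


# Toy descriptor vectors (hypothetical values, not thesis data).
toy = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
train_idx, test_idx = kennard_stone(toy, n_train=3)
print(train_idx, test_idx)  # [0, 4, 3] [1, 2]
```

Because the rule picks extreme and poorly covered samples first, the training set tends to span the descriptor space while the test set falls inside it. In practice the descriptors would usually be scaled before distances are computed, which fits the abstract's point that pre-processing and splitting were studied together.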
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Jamaludin, Rosmahaida
title Chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers
granting_institution Universiti Teknologi Malaysia, Faculty of Science
granting_department Faculty of Science
publishDate 2015
url http://eprints.utm.my/id/eprint/61066/1/RosmahaidaJamaludinPFS2015.pdf
_version_ 1747817776864559104