Chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers
Three major aspects of chemometrics were investigated in this study, namely Quantitative Structure-Activity Relationship (QSAR) modelling with database mining, classification, and multiblock methods. In the first analysis, 197 artemisinin compounds were divided into a training set and a test set together with s...
Main Author: | Jamaludin, Rosmahaida |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2015 |
Subjects: | QD Chemistry |
Online Access: | http://eprints.utm.my/id/eprint/61066/1/RosmahaidaJamaludinPFS2015.pdf |
id | my-utm-ep.61066 |
---|---|
record_format | uketd_dc |
spelling | my-utm-ep.61066 2017-10-08T08:57:57Z Chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers 2015-09 Jamaludin, Rosmahaida QD Chemistry 2015-09 Thesis http://eprints.utm.my/id/eprint/61066/ http://eprints.utm.my/id/eprint/61066/1/RosmahaidaJamaludinPFS2015.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:96405 phd doctoral Universiti Teknologi Malaysia, Faculty of Science Faculty of Science |
institution | Universiti Teknologi Malaysia |
collection | UTM Institutional Repository |
language | English |
topic | QD Chemistry |
spellingShingle | QD Chemistry Jamaludin, Rosmahaida Chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers |
description | Three major aspects of chemometrics were investigated in this study, namely Quantitative Structure-Activity Relationship (QSAR) modelling with database mining, classification, and multiblock methods. In the first analysis, 197 artemisinin compounds were divided into a training set and a test set and, together with structural descriptors generated by DRAGON 6.0 software, were used to develop three QSAR models. The model statistics (r²/r²_test) were 0.790/0.853 for Forward Stepwise-Multiple Linear Regression (MLR), 0.807/0.789 for Genetic Algorithm (GA)-MLR and 0.795/0.811 for GA-Partial Least Squares (PLS). The rigorously validated QSAR models were then applied to mine a chemical database, which yielded four potential new anti-malarial agents. The same artemisinin data set was then classified into active and less active compounds to develop reliable predictive classification models and to investigate the effects of various data-splitting and data pre-processing methods on classification. Principal Component Analysis (PCA) and boundary plots were used to visualize four classifiers, namely Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Learning Vector Quantization (LVQ) and Quadratic Discriminant Analysis (QDA). Kennard-Stone data splitting and standardization produced better results in terms of percent correctly classified (% CC) than Duplex data splitting and mean-centering. Moreover, LDA was found to be superior to the other three classifiers, with a lower risk of over-fitting. Lastly, the multiblock analysis methods Multiblock PLS and Consensus PCA were applied to a polychlorinated diphenyl ethers (PCDEs) data set, with the descriptors blocked into three groups labelled X_1D, X_2D and X_3D and a property block, Y, consisting of log P_L (Pa, 25°C), log K_OW (25°C) and log S_WL (mol/L, 25°C). Their performance was then compared with that of the single-block methods PLS and PCA. The PLS models of each descriptor block with respect to each property were well fitted and predicted well, with r²_train values greater than 0.96 and r_test values ranging from 0.86 to 0.98. Notably, combining the three descriptor blocks into a single block to produce the Multiblock PLS superscores (MBSS) model, which was superior to the Multiblock PLS block-scores (MBBS) model, yielded a slightly better r²_train value and significantly better prediction, with a higher r_test, than the PLS models of the individual descriptor blocks. In addition, three measures of block similarity, namely the Mantel test, the Rv coefficient and Procrustes analysis, were used to investigate similarity and correlation between the blocks, with Monte Carlo simulations used to determine their significance. Based on the similarity index between two blocks, X_jD descriptors resembled the Y block better, while X_2D was more correlated with the X_1D block. In short, the chemometric methods were applied successfully to both data sets using various descriptors generated by DRAGON software and yielded promising results of benefit not only to chemometrics but also to drug design. |
format | Thesis |
qualification_name | Doctor of Philosophy (PhD.) |
qualification_level | Doctorate |
author | Jamaludin, Rosmahaida |
author_facet | Jamaludin, Rosmahaida |
author_sort | Jamaludin, Rosmahaida |
title | Chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers |
title_short | Chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers |
title_full | Chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers |
title_fullStr | Chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers |
title_full_unstemmed | Chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers |
title_sort | chemometrics and multiblock methods for quantitative structure-activity studies of artemisinin analogues and polychlorinated diphenylethers |
granting_institution | Universiti Teknologi Malaysia, Faculty of Science |
granting_department | Faculty of Science |
publishDate | 2015 |
url | http://eprints.utm.my/id/eprint/61066/1/RosmahaidaJamaludinPFS2015.pdf |
_version_ | 1747817776864559104 |
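
The description in this record mentions the Rv coefficient, with Monte Carlo simulations used to judge the significance of the similarity between descriptor and property blocks. The record does not say how the thesis computed either, so the following is only a minimal sketch under stated assumptions: it uses the standard definition of the RV coefficient on column-centred blocks, a simple row-permutation scheme as the Monte Carlo step, and synthetic random data in place of the PCDE descriptors. The function names `rv_coefficient` and `rv_permutation_test` are hypothetical and do not come from the thesis.

```python
import numpy as np


def rv_coefficient(x, y):
    """RV coefficient between two blocks that share the same rows (samples)."""
    # Column-centre each block so the configuration matrices reflect covariance.
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    # Sample-by-sample configuration matrices (n x n).
    sx = xc @ xc.T
    sy = yc @ yc.T
    return np.trace(sx @ sy) / np.sqrt(np.trace(sx @ sx) * np.trace(sy @ sy))


def rv_permutation_test(x, y, n_perm=999, seed=0):
    """Monte Carlo p-value: how often a row-shuffled y matches x at least as well."""
    rng = np.random.default_rng(seed)
    observed = rv_coefficient(x, y)
    count = 0
    for _ in range(n_perm):
        # Shuffling the rows of y breaks the sample pairing between the blocks.
        perm = rng.permutation(y.shape[0])
        if rv_coefficient(x, y[perm]) >= observed:
            count += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value


# Synthetic stand-in data (assumption): 30 samples, a 5-column descriptor block
# and a 3-column property block that partly depends on the descriptors.
demo_rng = np.random.default_rng(42)
x_block = demo_rng.normal(size=(30, 5))
y_block = x_block[:, :3] + 0.1 * demo_rng.normal(size=(30, 3))

rv, p = rv_permutation_test(x_block, y_block)
print(f"RV = {rv:.3f}, permutation p-value = {p:.3f}")
```

An RV value close to 1 indicates that the two blocks arrange the samples in a similar way, and a small permutation p-value suggests that the similarity is unlikely to arise from a random pairing of rows. The Mantel test and Procrustes analysis named in the description could be assessed with the same row-permutation idea, although that is likewise an assumption about the thesis rather than something the record states.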