An improved algorithm for iris classification by using support vector machine and binary random machine learning

Bibliographic Details
Main Author: Kamarulzalis, Ahmad Haadzal
Format: Thesis
Language: English
Published: 2018
Online Access: http://eprints.uthm.edu.my/295/1/24p%20ahmad%20haadzal%20kamarulzalis.pdf
http://eprints.uthm.edu.my/295/2/AHMAD%20HAADZAL%20KAMARULZALIS%20COPYRIGHT%20DECLARATION.pdf
http://eprints.uthm.edu.my/295/3/AHMAD%20HAADZAL%20KAMARULZALIS%20WATERMARK.pdf
Description
Summary: Machine learning offers three branches of learning that can be used in classification procedures for data mining: supervised learning, unsupervised learning and reinforcement learning. This study focuses on supervised learning and seeks to classify the Iris dataset into its three species (setosa, versicolor and virginica), so that the predicted labels match the actual ones, using Support Vector Machine (SVM) with four different kernel functions (Linear, Radial Basis, Sigmoid and Polynomial), Random Forest (RF), k-Nearest Neighbors (k-NN) and Random Nearest Neighbors (RNN). The first objective of this study is to develop an improved algorithm for classification; the new algorithm combines the ideas of the k-NN algorithm with the ensemble concept. The second objective is to carry out supervised and binary ensemble machine learning techniques for classification, using RF and RNN, which share the same ensemble concept. The last objective is to identify the best model for classification procedures. Performance measurement tools, namely overall accuracy, kappa, average sensitivity, average specificity, average precision, average detection rate, average prevalence and misclassification error rate (MER), were computed from the confusion matrix output during data analysis to assess the average and individual performance of each classifier. In addition, performance visualizations such as the Stacked Bar Plot, Fourfold Plot, Receiver Operating Characteristic (ROC) Curve and Lollipop Chart are used to present each output more clearly. Random Nearest Neighbors (RNN) has the highest accuracy, 98.67%, and a misclassification error rate (MER) of just 1.33%, compared with the other classifiers. Therefore, RNN is preferable for supervised learning classification procedures.
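The measures listed in the summary can all be derived from a multi-class confusion matrix. The following is a minimal sketch in plain Python, not the thesis's own code: the function name `confusion_metrics` and the 3x3 matrix are hypothetical, and the matrix was chosen only so that 148 of 150 samples are correct, which reproduces the reported 98.67% accuracy. The actual confusion matrix from the thesis is not given in this record. Per-class values are computed one-vs-rest and then averaged.

```python
def confusion_metrics(cm):
    """Summary metrics from a square confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Assumes every class is predicted at least once
    (otherwise the precision for that class would divide by zero).
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_tot = [sum(cm[i]) for i in range(k)]                       # true counts
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted counts

    accuracy = sum(cm[i][i] for i in range(k)) / n
    # Agreement expected by chance, used by Cohen's kappa.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)

    sens, spec, prec = [], [], []
    for c in range(k):
        tp = cm[c][c]
        fn = row_tot[c] - tp
        fp = col_tot[c] - tp
        tn = n - tp - fn - fp
        sens.append(tp / (tp + fn))
        spec.append(tn / (tn + fp))
        prec.append(tp / (tp + fp))

    return {
        "accuracy": accuracy,
        "mer": 1 - accuracy,  # misclassification error rate
        "kappa": kappa,
        "avg_sensitivity": sum(sens) / k,
        "avg_specificity": sum(spec) / k,
        "avg_precision": sum(prec) / k,
    }


# Hypothetical matrix (rows: true setosa, versicolor, virginica).
example_cm = [
    [50, 0, 0],
    [0, 49, 1],
    [0, 1, 49],
]
metrics = confusion_metrics(example_cm)
print(f"accuracy = {metrics['accuracy']:.4f}, MER = {metrics['mer']:.4f}")
```

With this illustrative matrix the accuracy and MER sum to 1 by construction, which is the relationship behind the 98.67% and 1.33% figures quoted in the summary.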