A new soft set-based technique for clustering attribute selection in educational data mining

Bibliographic Details
Main Author: Suhirman
Format: Thesis
Language: English
Published: 2016
Online Access: http://umpir.ump.edu.my/id/eprint/15254/1/A%20new%20soft%20set-based%20technique%20for%20clustering%20attribute%20selection%20in%20educational%20data%20mining.pdf
Description
Summary: Determining the best clustering attribute is an essential step in data clustering, since it offers a relatively simple and efficient route to attribute-based clustering. Five well-known rough set- and soft set-based techniques for selecting a clustering attribute have been proposed: TR, MMR, MDA, NSS, and MAR. Of these, MAR achieves better computation time than the other four. However, a review of MAR shows that execution time remains an outstanding issue because of the iterative process used to determine the relative attribute. This research proposes an alternative soft set-based technique for selecting a clustering attribute, named Maximum Degree of Domination in Soft set theory (MDDS). First, the notion of multi-soft sets is described. Second, the domination of soft sets and its degree are defined. Finally, the maximum degree of domination is used to determine the best clustering attribute. The proposed technique is evaluated on eighteen UCI benchmark machine learning datasets and compared with MAR. The results show that MDDS reduces computation time, outperforming MAR by up to 43.99%. MDDS also scales well: its execution time tends to increase linearly as the data size grows. In addition, accuracy on the eight datasets that have a class attribute increased by 3.23%. The proposed MDDS technique was also applied to a real-world clustering problem in educational data mining. The datasets were taken from a survey of several courses at the Information Engineering and Architecture Departments of the University Technology of Yogyakarta (UTY), Indonesia, over the last four years. The dominant attributes of the assessment datasets were determined using MDDS because of its improved efficiency and accuracy, so that decisions can be made faster and more accurately.
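
The three steps named in the summary (build multi-soft sets, score each attribute, pick the maximum) can be illustrated with a minimal Python sketch. The summary does not give the formal definition of domination or its degree, so the scoring function below is an assumption: it substitutes a rough-set-style dependency degree as a stand-in and should not be read as the thesis's MDDS measure. All names (`multi_soft_set`, `stand_in_degree`, `select_attribute`, the `survey` table) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def multi_soft_set(table, attributes):
    """Decompose a categorical table into one soft set per attribute.

    Each soft set maps an attribute value to the set of object ids
    that take that value.
    """
    soft_sets = {}
    for attr in attributes:
        classes = defaultdict(set)
        for obj_id, row in table.items():
            classes[row[attr]].add(obj_id)
        soft_sets[attr] = dict(classes)
    return soft_sets


def stand_in_degree(target, by, n_objects):
    """ASSUMPTION: a stand-in score, not the thesis's degree of domination.

    Returns the fraction of objects lying in a value class of `by`
    that is fully contained in some value class of `target`.
    """
    covered = 0
    for by_class in by.values():
        if any(by_class <= t_class for t_class in target.values()):
            covered += len(by_class)
    return covered / n_objects


def select_attribute(table, attributes):
    """Pick the attribute with the maximum stand-in score.

    Requires at least two attributes. Ties go to the earliest attribute.
    """
    soft_sets = multi_soft_set(table, attributes)
    n = len(table)
    scores = {}
    for a in attributes:
        scores[a] = max(
            stand_in_degree(soft_sets[a], soft_sets[b], n)
            for b in attributes
            if b != a
        )
    best = max(attributes, key=scores.get)
    return best, scores


# Hypothetical toy survey table: object id -> categorical attribute values.
survey = {
    1: {"grade": "A", "attendance": "high", "department": "IE"},
    2: {"grade": "A", "attendance": "high", "department": "AR"},
    3: {"grade": "B", "attendance": "low", "department": "IE"},
    4: {"grade": "B", "attendance": "low", "department": "AR"},
}

best, scores = select_attribute(survey, ["grade", "attendance", "department"])
```

In this toy table, `grade` and `attendance` partition the objects identically, so each fully determines the other, while `department` cuts across both. Replacing `stand_in_degree` with the thesis's actual definition would leave the rest of the structure unchanged.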