Big data processing on educational data mining using pyspark with jupyter notebook

The rapid advancement of information technology brings new challenges and puts new demands on our education system. The process of teaching and learning has moved from the classroom to Computer Aided Learning (CAL) systems. Big data technology and machine learning play an important role in Computer...

Bibliographic Details
Main Author:	Ravichandran, Vinitha
Format:	Thesis
Language:	English
Published:	2018
Online Access:	http://eprints.utm.my/id/eprint/81375/1/VinithaRavichandranMFC2018.pdf

id	my-utm-ep.81375
record_format	uketd_dc
spelling	my-utm-ep.81375 2019-08-23T04:06:50Z Big data processing on educational data mining using pyspark with jupyter notebook 2018 Ravichandran, Vinitha
2018 Thesis http://eprints.utm.my/id/eprint/81375/ http://eprints.utm.my/id/eprint/81375/1/VinithaRavichandranMFC2018.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:119718 masters Universiti Teknologi Malaysia Computer Science
institution	Universiti Teknologi Malaysia
collection	UTM Institutional Repository
language	English
description	The rapid advancement of information technology brings new challenges and puts new demands on our education system. The process of teaching and learning has moved from the classroom to Computer Aided Learning (CAL) systems. Big data technology and machine learning play an important role in CAL systems because of the massive amount of data these systems generate. This has led to the rapid development of data mining in education, known as Educational Data Mining (EDM). The abundance of data collected by such systems can be used to analyse, predict and address many societal issues in education, such as improving the quality of education and predicting and monitoring educational outcomes. Effectively analysing and predicting students’ future performance can make a CAL system a better platform for learning than traditional classroom learning. Machine learning techniques were used to obtain reliable and accurate predictions of students’ performance. Apache Hadoop was the backbone of big data technology until the emergence of Apache Spark. However, only a few studies have applied Apache Spark to EDM. In this dissertation, PySpark was integrated with Jupyter Notebook to perform EDM on the Educational Process Mining (EPM) data set. Spark MLlib was used to compare four classification algorithms, namely Logistic Regression, Naïve Bayes, Decision Tree and Random Forest, on the EPM data set. In this study, the Random Forest classifier outperformed the other classifiers in Accuracy, Area Under the Precision-Recall (PR) curve and Area Under the Receiver Operating Characteristic (ROC) curve, although with a slightly slower Execution Time. Random Forest was therefore the best of the four classifiers for EDM on this data set.
format	Thesis
qualification_level	Master's degree
author	Ravichandran, Vinitha
title	Big data processing on educational data mining using pyspark with jupyter notebook
granting_institution	Universiti Teknologi Malaysia
granting_department	Computer Science
publishDate	2018
url	http://eprints.utm.my/id/eprint/81375/1/VinithaRavichandranMFC2018.pdf


Similar Items