Feature Engineering for Automated Essay Evaluator of Malaysian University English Test (MUET) based on Linguistic Features

Bibliographic Details
Main Author: Wee Sian, Wong
Format: Thesis
Language:English
Published: 2024
Online Access:http://ir.unimas.my/id/eprint/44898/3/DSVA_Wong%20Wee%20Sian.pdf
http://ir.unimas.my/id/eprint/44898/4/PhD%20Thesis_Wong%20Wee%20Sia.ftext.pdf
http://ir.unimas.my/id/eprint/44898/5/PhD%20Thesis_Wong%20Wee%20Sian%20-%2024%20pages.pdf
Description
Summary:Automated Essay Scoring (AES) refers to the use of specialized computer programs to assess and score essays in order to overcome time, cost, and reliability issues in an educational assessment context. It is an application of Natural Language Processing (NLP) and computational linguistics, fields that centre on the interactions between computer software and human languages. Several prominent proprietary AES systems are available commercially, and extensive academic research has explored automated essay scoring. One issue with AES is its dependence on surface features (e.g., essay length) to score essays. Such AES systems are often criticized because their scoring mechanisms do not reflect how human raters typically score essays: surface-level features do not capture the linguistic aspects of an essay. To address the constraint of this “surface-level” assessment, several recent studies have focused on leveraging deep linguistic features, such as text cohesion and lexical diversity, to assess essays. However, most of this research concentrates on specific linguistic dimensions; none of it provides comprehensive coverage of linguistic dimensions for scoring essays. Furthermore, AES systems, especially commercial proprietary and deep neural network systems, exhibit a black-box nature. This non-transparent operation hinders clear explanation and interpretation of the essay features and scoring mechanisms employed. In response to these issues, this research develops an AES system, the Automated Essay Evaluator (AEE), that scores essays based on comprehensive deep linguistic features. It uses the Malaysian University English Test (MUET) essay as the case study for automated essay scoring.
The research identified and categorized a total of 1,709 linguistic feature indices into a taxonomy comprising eight distinct linguistic feature sets and 43 linguistic feature categories. These eight feature sets, namely surface features, linguistic errors, text cohesion, semantics, lexical diversity, lexical sophistication, syntactic complexity, and readability, are intended to cover most, if not all, of the linguistic features found in essays. A thorough correlation analysis between the linguistic features and the essay grades was conducted. Two feature selection schemes, namely Correlation Rank and Minimum Redundancy Maximum Relevance (MRMR) Feature Selection, were formulated to select the linguistic features that most influence essay scoring. The overall performance of the selected linguistic features was evaluated using six different machine learning classifiers to score MUET essays. Lastly, the proposed linguistic feature set was interpreted against the MUET essay scoring rubrics to explain how these linguistic features contribute to the overall essay score. According to the experimental results, readability, surface features, lexical diversity, and specific lexical sophistication features are strong predictors of MUET essay scores. The linguistic features selected by the Correlation Rank and MRMR Feature Selection schemes outperformed the baseline scheme, which consists of 50 randomly selected features. Furthermore, the linguistic-based automated scoring developed in this research outperformed the LightSide AES in scoring MUET essays. The proposed linguistic-based essay scoring can serve as the basis for developing a full-fledged local Malaysian AES by incorporating essay content features.
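
The two feature selection schemes named in the summary can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the feature names, the toy scores, and the helper functions `pearson`, `correlation_rank`, and `mrmr` are all hypothetical. It also assumes absolute Pearson correlation as the measure of both relevance and redundancy, whereas MRMR is often formulated with mutual information.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def correlation_rank(features, grades, k):
    """Keep the k features with the largest |correlation| with the grades."""
    relevance = {name: abs(pearson(col, grades)) for name, col in features.items()}
    return sorted(relevance, key=relevance.get, reverse=True)[:k]

def mrmr(features, grades, k):
    """Greedy MRMR (difference form): maximize relevance minus mean redundancy."""
    relevance = {name: abs(pearson(col, grades)) for name, col in features.items()}
    selected = []
    remaining = sorted(features)  # sorted for deterministic tie-breaking
    while remaining and len(selected) < k:
        def score(name):
            redundancy = 0.0
            if selected:
                redundancy = sum(
                    abs(pearson(features[name], features[s])) for s in selected
                ) / len(selected)
            return relevance[name] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy data: six essays, four made-up feature indices.
grades = [2, 1, 3, 4, 4, 7]
features = {
    "word_count": [100, 200, 300, 400, 500, 600],
    "sentence_count": [9, 20, 30, 40, 50, 60],          # near-duplicate of word_count
    "lexical_diversity": [0.6, 0.4, 0.5, 0.5, 0.4, 0.6],
    "comma_count": [4, 4, 1, 1, 4, 4],                  # unrelated to the grades
}

print(correlation_rank(features, grades, 2))
print(mrmr(features, grades, 2))
```

On this toy data, ranking by correlation alone keeps both word_count and sentence_count, which carry nearly the same information. The greedy MRMR pass keeps word_count and then prefers lexical_diversity, because the redundancy penalty outweighs sentence_count's higher correlation with the grades.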