Intelligent feature engineered-machine learning based electricity theft detection framework for labelled and unlabelled datasets



Bibliographic Details
Main Author: Hussain, Saddam
Format: Thesis
Language: English
Published: 2022
Subjects: TK Electrical engineering. Electronics Nuclear engineering
Online Access: http://eprints.utm.my/id/eprint/102153/1/SaddamHussainPSKE2022.pdf.pdf
Institution: Universiti Teknologi Malaysia
Collection: UTM Institutional Repository

Description

Non-Technical Losses (NTLs) in electrical utilities, caused primarily by electricity theft, significantly harm energy supplier companies and the nation’s overall economy. Power distribution companies worldwide still rely on time-consuming, laborious, and inefficient random onsite inspections to catch and penalise fraudulent consumers. Artificial intelligence-based data mining methods have been researched extensively to address the NTL problem, but most theft detection methods reported in the literature achieve poor accuracy and detection rates. This thesis therefore presents a novel, sequentially executed theft detection framework for both labelled and unlabelled datasets that takes a realistic approach and achieves comparatively higher accuracy and detection rates.

For labelled datasets, a supervised Machine Learning (ML) approach uses the CatBoost (Category Boosting) algorithm to classify consumers as “suspicious” or “non-suspicious”. For unlabelled datasets, an unsupervised ML method performs the same task by combining the Robust Principal Component Analysis (ROBPCA) algorithm with the Outlier Removal Clustering (ORC) algorithm.

In the labelled scenario, the Synthetic Minority Oversampling Technique combined with Tomek links (SMOTETomek) first balances the class distribution. Next, the Feature Extraction based on Scalable Hypothesis tests (FRESH) algorithm extracts and selects the most relevant features, helping the classifier learn complex and overlapping data patterns. Finally, a CatBoost model is trained on the resulting feature-engineered labelled dataset to distinguish the two classes of consumers.
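The balancing step can be illustrated with a minimal, dependency-free sketch of SMOTE followed by Tomek-link cleaning. It is not the thesis's implementation: the helper names (`smote`, `tomek_links`, `smote_tomek`), the neighbour count `k`, Euclidean distance, and the choice to drop both members of every Tomek link are assumptions made for illustration; libraries such as imbalanced-learn provide production versions.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """SMOTE: create n_new synthetic points, each on the segment between a
    random minority sample and one of its k nearest minority neighbours.
    Needs at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(x, p),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


def tomek_links(X, y):
    """Pairs (i, j), i < j, of opposite-class points that are each other's
    nearest neighbour in the whole dataset."""
    nn = [
        min((j for j in range(len(X)) if j != i), key=lambda j: math.dist(X[i], X[j]))
        for i in range(len(X))
    ]
    return {
        (i, nn[i])
        for i in range(len(X))
        if i < nn[i] and nn[nn[i]] == i and y[i] != y[nn[i]]
    }


def smote_tomek(X, y, minority_label=1, k=3, seed=0):
    """Hypothetical helper: oversample the minority class up to the majority
    size with SMOTE, then remove both members of every Tomek link."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max(0, (len(X) - len(minority)) - len(minority))
    new = smote(minority, n_new, k, rng)
    X_bal, y_bal = list(X) + new, list(y) + [minority_label] * len(new)
    drop = {i for link in tomek_links(X_bal, y_bal) for i in link}
    keep = [i for i in range(len(X_bal)) if i not in drop]
    return [X_bal[i] for i in keep], [y_bal[i] for i in keep]
```

Because every Tomek link pairs one sample from each class, the cleaning step removes the same number of points from both classes, so the balance created by SMOTE is preserved while ambiguous boundary points are discarded.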
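FRESH pairs automatic feature extraction with one hypothesis test per feature and false-discovery-rate control across all tests. The dependency-free sketch below imitates that idea under stated assumptions: six illustrative features (including a deliberately uninformative, hypothetical `first_reading`) stand in for the much larger feature set that tsfresh, the reference FRESH implementation, computes; each feature is tested with a two-sided Mann-Whitney U test using the normal approximation without tie correction; and features are kept by the Benjamini-Hochberg step-up rule, optionally with the Benjamini-Yekutieli factor for dependent tests, which as far as we know is tsfresh's default. All function names are hypothetical.

```python
import math
import statistics


def extract_features(series):
    """A handful of illustrative per-consumer features from a load series."""
    return {
        "mean": statistics.fmean(series),
        "std": statistics.pstdev(series),
        "min": min(series),
        "max": max(series),
        "zero_fraction": sum(v == 0 for v in series) / len(series),
        "first_reading": series[0],
    }


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U p-value via the normal approximation."""
    ranks = average_ranks(list(a) + list(b))
    n1, n2 = len(a), len(b)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(u1 - mu) / sigma / math.sqrt(2))


def fresh_select(series_list, labels, fdr=0.05, dependent=True):
    """Extract features, test each against the binary label, and keep those
    passing the Benjamini-Hochberg (or, if dependent, -Yekutieli) step-up rule."""
    feats = [extract_features(s) for s in series_list]
    names = list(feats[0])
    pvals = {
        name: mann_whitney_p(
            [f[name] for f, lbl in zip(feats, labels) if lbl == 1],
            [f[name] for f, lbl in zip(feats, labels) if lbl == 0],
        )
        for name in names
    }
    m = len(names)
    c_m = sum(1 / i for i in range(1, m + 1)) if dependent else 1.0
    ranked = sorted(names, key=pvals.get)
    k_max = max(
        (k for k, name in enumerate(ranked, 1) if pvals[name] <= k * fdr / (m * c_m)),
        default=0,
    )
    return ranked[:k_max], pvals
```

The step-up rule keeps every feature up to the largest rank whose p-value clears its threshold, so a feature whose distribution is identical for both classes (p close to 1) is dropped while strongly separating features survive.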

In the unlabelled scenario, the ROBPCA algorithm first groups consumers with the most similar features into two categories. The ORC algorithm then reinforces the boundary between these two groups so that healthy and fraudulent consumers are clearly separated.

The proposed theft detection methods are validated against a few of the most widely used outlier detection methods on seven prominent performance evaluation metrics. The proposed unsupervised and supervised classifiers achieve accuracies of 91% and 93%, respectively, and detection rates of 91% and 92%, respectively.
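The ORC step can be sketched in plain Python as it is commonly described: cluster the data with K-means, score every point by its distance from its centroid divided by the largest such distance, discard points whose score exceeds a threshold, and re-cluster the survivors. This is an illustrative sketch, not the thesis's code: the ROBPCA projection that precedes ORC in the framework is not implemented (the input points stand in for its low-dimensional scores), a deterministic farthest-first seeding replaces the multiple random K-means restarts usually recommended, and the function names and the `threshold` and `rounds` values are assumptions. The abstract does not state how removed points map to the healthy and fraudster groups, so the sketch simply reports them.

```python
import math


def assign(X, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(x, centroids[c]))
            for x in X]


def kmeans(X, centroids, iters=100):
    """Lloyd's K-means refinement starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iters):
        labels = assign(X, centroids)
        updated = []
        for c, old in enumerate(centroids):
            members = [x for x, lbl in zip(X, labels) if lbl == c]
            # Keep the previous centroid if a cluster becomes empty.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else old)
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(X, centroids)


def farthest_first(X, k):
    """Deterministic seeding: start at X[0], then repeatedly add the point
    farthest from all seeds chosen so far."""
    seeds = [X[0]]
    while len(seeds) < k:
        seeds.append(max(X, key=lambda x: min(math.dist(x, s) for s in seeds)))
    return seeds


def orc(X, k=2, threshold=0.5, rounds=1):
    """Outlier Removal Clustering sketch: score = distance to own centroid /
    largest such distance; drop points scoring above `threshold`, re-cluster."""
    kept = list(range(len(X)))
    centroids, labels = kmeans(X, farthest_first(X, k))
    for _ in range(rounds):
        if len(kept) <= k:
            break
        dists = [math.dist(X[i], centroids[lbl]) for i, lbl in zip(kept, labels)]
        d_max = max(dists)
        if d_max == 0:
            break
        kept = [i for i, d in zip(kept, dists) if d / d_max <= threshold]
        centroids, labels = kmeans([X[i] for i in kept], centroids)
    outliers = sorted(set(range(len(X))) - set(kept))
    return centroids, kept, labels, outliers
```

The farthest point always scores exactly 1, so any threshold below 1 removes at least one point per round; `threshold` and `rounds` therefore control how aggressively the boundary between the two groups is cleaned.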
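The reported accuracy and detection-rate figures come from comparing predicted and true consumer labels. The abstract does not define its seven metrics, so the sketch below assumes the usual definitions: accuracy is the share of correctly classified consumers, and detection rate is taken to be the true positive rate (the share of actual fraudsters that are flagged). False positive rate is added only as one common companion metric; the class and function names are hypothetical and the labels are toy values, not thesis data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # fraudsters flagged as suspicious
    fp: int  # honest consumers flagged as suspicious
    tn: int  # honest consumers left unflagged
    fn: int  # fraudsters missed

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    @property
    def detection_rate(self) -> float:
        # Assumed definition: true positive rate (recall) of the fraud class.
        return self.tp / (self.tp + self.fn)

    @property
    def false_positive_rate(self) -> float:
        return self.fp / (self.fp + self.tn)


def confusion(y_true, y_pred, positive=1):
    """Count confusion-matrix cells, treating `positive` as the fraud label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return Confusion(tp, fp, len(pairs) - tp - fp - fn, fn)


if __name__ == "__main__":
    # Toy labels: 4 fraudsters (1) and 6 honest consumers (0).
    cm = confusion([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0, 1, 0])
    print(cm.accuracy, cm.detection_rate)  # 0.8 0.75
```

Accuracy and detection rate can diverge on imbalanced data: a classifier that flags nobody still scores high accuracy when fraudsters are rare, but its detection rate is zero, which is why both figures are reported.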

Qualification: Doctor of Philosophy (PhD), Doctorate
Granting institution: Universiti Teknologi Malaysia
Granting department: Faculty of Engineering - School of Electrical Engineering