Contour-KNN Brahmi Segmentation (CKBS) and Two-Phase Enhanced Brahmi Recognition (TREBR) Methods for Automatic Brahmi Texts Labelling

Bibliographic Details
Main Author:	Neha, Gautam
Format:	Thesis
Language:	English
Published:	2024
Subjects:	QA75 Electronic computers. Computer science
Online Access:	http://ir.unimas.my/id/eprint/44482/2/Thesis%20PhD_NehaGautam.open%20-24%20pages.pdf http://ir.unimas.my/id/eprint/44482/3/Thesis%20PhD_NehaGautam.ftext.pdf http://ir.unimas.my/id/eprint/44482/4/Thesis%20PhD_NehaGautam.dsva.pdf

id	my-unimas-ir.44482
record_format	uketd_dc
spelling	my-unimas-ir.44482 2024-03-22T00:57:26Z Universiti Malaysia Sarawak 2024 Thesis http://ir.unimas.my/id/eprint/44482/ http://ir.unimas.my/id/eprint/44482/2/Thesis%20PhD_NehaGautam.open%20-24%20pages.pdf text en public http://ir.unimas.my/id/eprint/44482/3/Thesis%20PhD_NehaGautam.ftext.pdf text en validuser http://ir.unimas.my/id/eprint/44482/4/Thesis%20PhD_NehaGautam.dsva.pdf text en staffonly phd doctoral Universiti Malaysia Sarawak FCSIT
institution	Universiti Malaysia Sarawak
collection	UNIMAS Institutional Repository
language	English
topic	QA75 Electronic computers. Computer science
description	The automatic word recognition problem can be addressed with an optical character recognition (OCR) system. Few studies have tackled Brahmi word recognition, particularly the accurate identification of compound characters and words. Moreover, existing Brahmi text recognition studies have relied primarily on local datasets, which hampers dataset standardization. To address this, the study proposes a systematic dataset creation process covering data collection, pre-processing, segmentation, data augmentation, recognition, storage, and labelling. The process begins with data collection from diverse online sources, yielding 217 text and word samples and 801 isolated and compound character samples. However, these samples are not uniform in text and word size. The next phase isolates characters from words and text using a novel segmentation approach, a necessary precursor to training the system. Contour-KNN Brahmi Segmentation (CKBS) is introduced for character and compound character segmentation: object detection identifies characters, including dots (.), and KNN links them to their nearest character on the left. CKBS achieves an average segmentation accuracy of 98.19%. Segmentation yields 40 samples per class across 170 classes, split 75:25 into 30 training and 10 testing samples per class. Data augmentation (adjustments, deformations, blurring, translations, and added noise) is then applied to increase both the size and the quality of the dataset, giving 180 training and 60 testing samples per class.
A Two-Phase Enhanced Brahmi Recognition (TPEBR) method is then employed, which distinguishes between global-feature and local-feature recognition. Several deep learning architectures are evaluated for classification, with input images resized to each architecture's required input size. SqueezeNet performs best, with the lowest loss (0.237) and the highest accuracy (97.58%), along with strong precision, recall, and F1-score. ResNeXt Small, by contrast, underperforms with higher loss and lower accuracy. Compared with existing approaches, which record 80.20% and 90.24% accuracy, TPEBR achieves 97.58%. Finally, recognized characters are organized into folders by recognized class and labelled with Brahmi Unicode; this labelling step does not affect system performance.
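A minimal sketch of the dot-linking idea in CKBS as summarised in the abstract, assuming contour detection has already produced one bounding box per connected component (in practice these could come from OpenCV's `cv2.findContours` followed by `cv2.boundingRect`). The names `Box` and `attach_dots`, the area ratio used to decide what counts as a dot, and the use of k = 1 with Euclidean centre distance are all illustrative assumptions, not the thesis's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import hypot


@dataclass
class Box:
    """Hypothetical bounding box of one detected contour (x, y = top-left corner)."""
    x: int
    y: int
    w: int
    h: int

    @property
    def area(self) -> int:
        return self.w * self.h

    @property
    def centre(self) -> tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)


def attach_dots(boxes: list[Box], dot_area_ratio: float = 0.1) -> list[list[Box]]:
    """Group contour boxes into character segments (hypothetical helper).

    Assumption: a box smaller than dot_area_ratio * (largest box area) is a dot.
    Each dot is linked to its single nearest neighbour (k = 1, Euclidean
    distance between box centres) among characters whose centre lies to its
    left; if no character lies to the left, the nearest character overall is used.
    """
    if not boxes:
        return []
    threshold = dot_area_ratio * max(b.area for b in boxes)
    chars = [b for b in boxes if b.area >= threshold]  # never empty: the largest box qualifies
    dots = [b for b in boxes if b.area < threshold]

    # Each segment starts with its character; dots are appended to it.
    groups = {id(c): [c] for c in chars}
    for d in dots:
        dx, dy = d.centre
        candidates = [c for c in chars if c.centre[0] < dx] or chars
        nearest = min(candidates, key=lambda c: hypot(c.centre[0] - dx, c.centre[1] - dy))
        groups[id(nearest)].append(d)

    # Return segments left to right, ordered by the character's x position.
    return sorted(groups.values(), key=lambda g: g[0].x)
```

For example, with two character boxes `Box(0, 10, 20, 30)` and `Box(30, 10, 20, 30)` and a small box `Box(22, 0, 3, 3)` between them, the small box is treated as a dot and grouped with the left-hand character. Restricting candidates to characters on the left mirrors the "nearest left character" rule; the fallback only keeps stray dots from being dropped.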
format	Thesis
qualification_name	Doctor of Philosophy (PhD.)
qualification_level	Doctorate
author	Neha, Gautam
title	Contour-KNN Brahmi Segmentation (CKBS) and Two-Phase Enhanced Brahmi Recognition (TREBR) Methods for Automatic Brahmi Texts Labelling
granting_institution	Universiti Malaysia Sarawak
granting_department	FCSIT
publishDate	2024
url	http://ir.unimas.my/id/eprint/44482/2/Thesis%20PhD_NehaGautam.open%20-24%20pages.pdf http://ir.unimas.my/id/eprint/44482/3/Thesis%20PhD_NehaGautam.ftext.pdf http://ir.unimas.my/id/eprint/44482/4/Thesis%20PhD_NehaGautam.dsva.pdf

Similar Items