Contour-KNN Brahmi Segmentation (CKBS) and Two-Phase Enhanced Brahmi Recognition (TREBR) Methods for Automatic Brahmi Texts Labelling

The automatic word recognition problem can be solved using an optical character recognition (OCR) system. Few studies have addressed Brahmi word recognition, especially the identification of compound characters and words with good accuracy. However, existing Brahmi text recognition studies have pri...

Bibliographic Details
Main Author: Neha, Gautam
Format: Thesis
Language:English
Published: 2024
Subjects: QA75 Electronic computers. Computer science
Online Access:http://ir.unimas.my/id/eprint/44482/2/Thesis%20PhD_NehaGautam.open%20-24%20pages.pdf
http://ir.unimas.my/id/eprint/44482/3/Thesis%20PhD_NehaGautam.ftext.pdf
http://ir.unimas.my/id/eprint/44482/4/Thesis%20PhD_NehaGautam.dsva.pdf
id my-unimas-ir.44482
record_format uketd_dc
spelling my-unimas-ir.44482 2024-03-22T00:57:26Z Universiti Malaysia Sarawak 2024 Thesis http://ir.unimas.my/id/eprint/44482/ http://ir.unimas.my/id/eprint/44482/2/Thesis%20PhD_NehaGautam.open%20-24%20pages.pdf text en public http://ir.unimas.my/id/eprint/44482/3/Thesis%20PhD_NehaGautam.ftext.pdf text en validuser http://ir.unimas.my/id/eprint/44482/4/Thesis%20PhD_NehaGautam.dsva.pdf text en staffonly phd doctoral Universiti Malaysia Sarawak FCSIT
institution Universiti Malaysia Sarawak
collection UNIMAS Institutional Repository
language English
topic QA75 Electronic computers
Computer science
description The automatic word recognition problem can be solved using an optical character recognition (OCR) system. Few studies have addressed Brahmi word recognition, especially the identification of compound characters and words with good accuracy. Moreover, existing Brahmi text recognition studies have relied primarily on local datasets, which hampers dataset standardization. To address this, the study proposes a systematic dataset creation process that encompasses data collection, pre-processing, segmentation, data augmentation, recognition, storage, and labelling. The process begins with data collection from diverse online sources, yielding 217 text and word samples and 801 isolated characters and compound characters. However, these samples lack uniformity in text and word sizes. The subsequent phase isolates characters from words and text, using a novel segmentation approach as a precursor to system training. Contour-KNN Brahmi Segmentation (CKBS) is introduced for character and compound character segmentation. Object detection identifies characters, including dots (.), and KNN links them to the nearest character on their left. This approach achieves an average segmentation accuracy of 98.19%. The segmentation approach generates 40 samples per class across 170 classes, with a 75:25 training-testing split (30 training and 10 testing samples per class). Furthermore, data augmentation techniques, including adjustments, deformations, blurring, translations, and noise introduction, are applied to enhance dataset quality and quantity, resulting in 180 training and 60 testing samples per class. Subsequently, Two-Phase Enhanced Brahmi Recognition (TPEBR) is employed, distinguishing between global and local feature recognition.
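The abstract states only that detected dots are linked to their nearest left character using KNN; the thesis itself is not reproduced in this record. The following Python sketch is therefore an illustration under stated assumptions, not the author's implementation: bounding boxes are given as hypothetical `(x, y, w, h)` tuples, a component counts as a dot when its area is at most the hypothetical threshold `dot_max_area`, and "nearest" means the smallest Euclidean distance between box centres (a 1-nearest-neighbour search restricted to characters on the left).

```python
from math import hypot


def center(box):
    """Return the centre (cx, cy) of a box given as (x, y, w, h), origin at top-left."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def link_dots_to_left_neighbours(boxes, dot_max_area=30):
    """Attach each small component (a "dot") to the nearest larger component on its left.

    Assumptions (not taken from the thesis): a dot is any box whose area is at
    most `dot_max_area`, and distance is measured between box centres.
    Returns a dict mapping each character index to the list of dot indices linked to it.
    """
    dots = {i for i, (_, _, w, h) in enumerate(boxes) if w * h <= dot_max_area}
    chars = [i for i in range(len(boxes)) if i not in dots]
    links = {i: [] for i in chars}

    for d in sorted(dots):
        dx, dy = center(boxes[d])
        # Only characters whose centre lies to the left of the dot are candidates.
        left = [c for c in chars if center(boxes[c])[0] < dx]
        if not left:
            continue  # no character on the left: the dot stays unlinked
        nearest = min(
            left,
            key=lambda c: hypot(center(boxes[c])[0] - dx, center(boxes[c])[1] - dy),
        )
        links[nearest].append(d)
    return links
```

For example, with two character boxes each followed by a small dot, every dot is grouped with the character immediately to its left, so a dot and its base character can be cropped together as one segment.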
Various deep learning architectures are evaluated for classification, with inputs resized to meet each architecture's input size requirements. SqueezeNet emerges as the most effective, achieving a loss of 0.237 and an accuracy of 97.58%. It also excels in precision, recall, and F1-score. In contrast, ResNeXt Small underperforms, with higher loss and lower accuracy. Compared with existing approaches, which record 80.20% and 90.24% accuracy, TPEBR achieves 97.58%. Recognized characters are then organized into folders according to their recognized class and labelled using Brahmi Unicode, although this step does not affect system performance.
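The storage and labelling step described above can be sketched as follows. This is a minimal Python illustration, not the thesis's code: it assumes, hypothetically, that each recognized class is named after its Unicode letter name (for example `KA` for BRAHMI LETTER KA), and it only plans the folder layout rather than moving files. Compound characters would need a sequence of code points, which this sketch does not cover.

```python
import unicodedata
from pathlib import Path


def brahmi_label(class_name):
    """Return the Brahmi character for a class named after its Unicode letter.

    Assumes (hypothetically) that class names match Unicode letter names,
    e.g. "KA" -> BRAHMI LETTER KA. Raises KeyError for unknown names.
    """
    return unicodedata.lookup(f"BRAHMI LETTER {class_name.upper()}")


def plan_labelled_layout(predictions, root):
    """Map each image path to (destination path, Unicode label).

    `predictions` is an iterable of (image_path, class_name) pairs; images are
    grouped into one folder per recognized class under `root`.
    """
    root = Path(root)
    plan = {}
    for image_path, class_name in predictions:
        dest = root / class_name / Path(image_path).name
        plan[str(image_path)] = (dest, brahmi_label(class_name))
    return plan
```

Calling `plan_labelled_layout([("seg/001.png", "KA")], "labelled")` yields a destination under `labelled/KA/` together with the Brahmi letter KA as the label; copying the files to those destinations is left to the caller.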
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Neha, Gautam
title Contour-KNN Brahmi Segmentation (CKBS) and Two-Phase Enhanced Brahmi Recognition (TREBR) Methods for Automatic Brahmi Texts Labelling
granting_institution Universiti Malaysia Sarawak
granting_department FCSIT
publishDate 2024
url http://ir.unimas.my/id/eprint/44482/2/Thesis%20PhD_NehaGautam.open%20-24%20pages.pdf
http://ir.unimas.my/id/eprint/44482/3/Thesis%20PhD_NehaGautam.ftext.pdf
http://ir.unimas.my/id/eprint/44482/4/Thesis%20PhD_NehaGautam.dsva.pdf
_version_ 1804888429026082816