Improved feature extraction and lexicon reduction methods classified by support vector machine for Farsi handwritten word recognition system

Automatic word recognition has been an intensive research subject for many languages in recent decades, but for some languages it is still far from a solved problem. Word recognition is divided into two types: online and offline. The current research is focused on the offline handwritten w...

Bibliographic Details
Main Author:	Akbarpour, Shahin
Format:	Thesis
Language:	English
Published:	2011
Subjects:	Support vector machines; Persian language - Written Persian; APT (Computer program language)
Online Access:	http://psasir.upm.edu.my/id/eprint/26987/1/FSKTM%202011%2021R.pdf

id	my-upm-ir.26987
record_format	uketd_dc
spelling	my-upm-ir.26987 2015-05-14T07:24:38Z Improved feature extraction and lexicon reduction methods classified by support vector machine for Farsi handwritten word recognition system 2011-08 Akbarpour, Shahin Support vector machines Persian language - Written Persian APT (Computer program language) 2011-08 Thesis http://psasir.upm.edu.my/id/eprint/26987/ http://psasir.upm.edu.my/id/eprint/26987/1/FSKTM%202011%2021R.pdf application/pdf en public phd doctoral Universiti Putra Malaysia Support vector machines Persian language - Written Persian APT (Computer program language) Faculty of Computer Science and Information Technology
institution	Universiti Putra Malaysia
collection	PSAS Institutional Repository
language	English
topic	Support vector machines; Persian language - Written Persian; APT (Computer program language)
spellingShingle	Support vector machines Persian language - Written Persian APT (Computer program language) Akbarpour, Shahin Improved feature extraction and lexicon reduction methods classified by support vector machine for Farsi handwritten word recognition system
description	Automatic word recognition has been an intensive research subject for many languages in recent decades, but for some languages it is still far from a solved problem. Word recognition is divided into two types: online and offline. The current research focuses on offline Farsi handwritten word recognition (FHWR). An offline handwritten word recognition system includes many stages, and all of them should be improved in order to enhance the accuracy of the system. In addition, reducing the lexicon size is one of the most significant current topics in improving the accuracy of handwritten word recognition. Many studies have been carried out so far, but FHWR has not been researched as thoroughly as Latin or Chinese handwriting recognition. Several attempts have been made to address FHWR, most of which focus on image preprocessing and segmentation. Some studies have also addressed feature extraction, classification, and lexicon reduction methods. The latest and most successful prior studies used a feature extraction method, a lexicon reduction method, and a hidden Markov model (HMM). However, their recognition rate is limited because the feature extraction method could not truly describe the Farsi word. Moreover, HMMs have some limitations, and several segmentation errors occurred in their lexicon reduction. The current research aims to solve these problems and improve the recognition rate of FHWR by proposing new feature extraction and lexicon reduction methods and by finding a suitable classifier. In this regard, some special attributes of Farsi manuscripts are considered, such as the stroke directions, the non-unique distribution of black pixels in the binary image of the word, and the number of sub-words and dots in the word.
In addition, several classification methods other than HMM are tested in order to determine which one gives the best recognition rate. We developed two word recognizer systems to cater for different applications based on lexicon size. For small lexicons, the word recognizer system consists of a new feature extraction method and a classifier; for medium and large lexicons, the system includes new feature extraction and lexicon reduction methods and a classifier. To evaluate the performance of the proposed methods, we use four different Farsi handwritten datasets: Farshids' Legal amount, 198-Cities, Iranshahr, and IFN-AUT, which contain 45, 198, 503, and 1080 class-words, respectively. In addition, to compare the obtained results with previous work, we need the datasets used by prior researchers. AUT and IFN-AUT were used previously. The AUT dataset, which included 198 class-words, was not available, so a similar dataset, 198-Cities, was created by randomly selecting 198 class-words from the Iranshahr dataset. In order to conduct more experiments with different lexicon sizes, the proposed methods were also run on the Farshids' Legal amount and Iranshahr datasets. Moreover, we re-implemented the existing word recognizer and lexicon reduction method so that we could compare them on the same datasets, namely 198-Cities and IFN-AUT. It may be concluded that our methods, which consist of new feature extraction and lexicon reduction methods and the classifier, perform better than the latest works.
format	Thesis
qualification_name	Doctor of Philosophy (PhD.)
qualification_level	Doctorate
author	Akbarpour, Shahin
author_facet	Akbarpour, Shahin
author_sort	Akbarpour, Shahin
title	Improved feature extraction and lexicon reduction methods classified by support vector machine for Farsi handwritten word recognition system
title_short	Improved feature extraction and lexicon reduction methods classified by support vector machine for Farsi handwritten word recognition system
title_full	Improved feature extraction and lexicon reduction methods classified by support vector machine for Farsi handwritten word recognition system
title_fullStr	Improved feature extraction and lexicon reduction methods classified by support vector machine for Farsi handwritten word recognition system
title_full_unstemmed	Improved feature extraction and lexicon reduction methods classified by support vector machine for Farsi handwritten word recognition system
title_sort	improved feature extraction and lexicon reduction methods classified by support vector machine for farsi handwritten word recognition system
granting_institution	Universiti Putra Malaysia
granting_department	Faculty of Computer Science and Information Technology
publishDate	2011
url	http://psasir.upm.edu.my/id/eprint/26987/1/FSKTM%202011%2021R.pdf
_version_	1747811566371209216

Similar Items