Arabic language script and encoding identification with support vector machines and rough set theory

Arabic ranks sixth among the world’s spoken languages, with more than 230 million speakers across the Arab world. There are different varieties and dialects of Arabic; the most common is Egyptian Arabic, which has the largest number of speakers (more than 50 million). Although only a sma...

Bibliographic Details
Main Author: Mohamed Sidya, Mohamed Ould
Format: Thesis
Language: English
Published: 2007
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/6795/1/MohamedOuldMohamedSidyaMFSKSM2007.pdf
id my-utm-ep.6795
record_format uketd_dc
spelling my-utm-ep.6795 2018-08-03T08:49:15Z 2007-11 Thesis http://eprints.utm.my/id/eprint/6795/ application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:62506 masters
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic QA75 Electronic computers. Computer science
description Arabic ranks sixth among the world’s spoken languages, with more than 230 million speakers across the Arab world. There are different varieties and dialects of Arabic; the most common is Egyptian Arabic, which has the largest number of speakers (more than 50 million). Although only a small number of Arabic speakers use the internet, they still constitute a considerable share of the internet community. Unfortunately, so far there has been no research on automatically distinguishing Arabic from other languages that use the same script. This project deals with distinguishing Arabic from Persian; both languages are written in the Arabic script. The data for this project were collected from the internet, the BBC website in particular. Several preprocessing operations were applied to the data, including stop word removal and stemming. The project compares the performance of Support Vector Machines with that of Rough Set Theory in identifying the Arabic language. The results show that both methods perform well, but Support Vector Machines outperform Rough Set Theory.
format Thesis
qualification_level Master's degree
author Mohamed Sidya, Mohamed Ould
title Arabic language script and encoding identification with support vector machines and rough set theory
granting_institution Universiti Teknologi Malaysia, Faculty of Computer Science and Information System
granting_department Faculty of Computer Science and Information System
publishDate 2007
url http://eprints.utm.my/id/eprint/6795/1/MohamedOuldMohamedSidyaMFSKSM2007.pdf