Arabic language script and encoding identification with support vector machines and rough set theory

Arabic ranks sixth among the world's spoken languages, with more than 230 million speakers across the Arab world. Arabic has many varieties and dialects; the most common is Egyptian Arabic, which has the largest number of speakers (more than 50 million). Although only a sma...



Bibliographic Details
Main Author:	Mohamed Sidya, Mohamed Ould
Format:	Thesis
Language:	English
Published:	2007
Subjects:	QA75 Electronic computers. Computer science
Online Access:	http://eprints.utm.my/id/eprint/6795/1/MohamedOuldMohamedSidyaMFSKSM2007.pdf

id	my-utm-ep.6795
record_format	uketd_dc
spelling	my-utm-ep.6795 2018-08-03T08:49:15Z Arabic language script and encoding identification with support vector machines and rough set theory 2007-11 Mohamed Sidya, Mohamed Ould QA75 Electronic computers. Computer science Arabic ranks sixth among the world's spoken languages, with more than 230 million speakers across the Arab world. Arabic has many varieties and dialects; the most common is Egyptian Arabic, which has the largest number of speakers (more than 50 million). Although only a small number of Arabic speakers use the internet, they still constitute a considerable share of the internet community. Unfortunately, so far there has been no research on automatically distinguishing Arabic from other languages that use the same script. This project deals with distinguishing the Arabic language from the Persian language; both languages are written in the Arabic script. The data for this project was collected from the internet, the BBC website in particular. Several preprocessing operations were applied to this data, including stop word removal and stemming. The project compares the performance of Support Vector Machines with Rough Set Theory in identifying the Arabic language. The results show that both methods perform well, but Support Vector Machines outperform Rough Set Theory. 2007-11 Thesis http://eprints.utm.my/id/eprint/6795/ http://eprints.utm.my/id/eprint/6795/1/MohamedOuldMohamedSidyaMFSKSM2007.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:62506 masters Universiti Teknologi Malaysia, Faculty of Computer Science and Information System Faculty of Computer Science and Information System
institution	Universiti Teknologi Malaysia
collection	UTM Institutional Repository
language	English
topic	QA75 Electronic computers. Computer science
spellingShingle	QA75 Electronic computers Computer science Mohamed Sidya, Mohamed Ould Arabic language script and encoding identification with support vector machines and rough set theory
description	Arabic ranks sixth among the world's spoken languages, with more than 230 million speakers across the Arab world. Arabic has many varieties and dialects; the most common is Egyptian Arabic, which has the largest number of speakers (more than 50 million). Although only a small number of Arabic speakers use the internet, they still constitute a considerable share of the internet community. Unfortunately, so far there has been no research on automatically distinguishing Arabic from other languages that use the same script. This project deals with distinguishing the Arabic language from the Persian language; both languages are written in the Arabic script. The data for this project was collected from the internet, the BBC website in particular. Several preprocessing operations were applied to this data, including stop word removal and stemming. The project compares the performance of Support Vector Machines with Rough Set Theory in identifying the Arabic language. The results show that both methods perform well, but Support Vector Machines outperform Rough Set Theory.
format	Thesis
qualification_level	Master's degree
author	Mohamed Sidya, Mohamed Ould
author_facet	Mohamed Sidya, Mohamed Ould
author_sort	Mohamed Sidya, Mohamed Ould
title	Arabic language script and encoding identification with support vector machines and rough set theory
title_short	Arabic language script and encoding identification with support vector machines and rough set theory
title_full	Arabic language script and encoding identification with support vector machines and rough set theory
title_fullStr	Arabic language script and encoding identification with support vector machines and rough set theory
title_full_unstemmed	Arabic language script and encoding identification with support vector machines and rough set theory
title_sort	arabic language script and encoding identification with support vector machines and rough set theory
granting_institution	Universiti Teknologi Malaysia, Faculty of Computer Science and Information System
granting_department	Faculty of Computer Science and Information System
publishDate	2007
url	http://eprints.utm.my/id/eprint/6795/1/MohamedOuldMohamedSidyaMFSKSM2007.pdf
_version_	1747814688106741760


Similar Items