Islamic web pages filtering and categorization

Bibliographic Details
Main Author: Mohd. Zamry, Nurfazrina
Format: Thesis
Language: English
Published: 2013
Online Access: http://eprints.utm.my/id/eprint/35863/5/NurFazrinaMohdZamryMFSKSM2013.pdf
Description
Summary: The Internet creates a world without boundaries in which people can obtain a great deal of information simply by browsing. Not all of that information is genuine or correct, however. Practitioners of deviant teachings can exploit this to attract followers over the Internet, in particular to distort the beliefs of Muslims in Malaysia. Web filtering protects against inappropriate content and prevents misuse of the network, so it can be used to filter the content of suspicious websites and limit their dissemination. Currently, deviant teaching websites are blocked manually; in addition, few web filtering products filter religious content, and very few support the Malay language. This project aims to classify deviant teaching websites into three categories: deviant, suspicious and clean. The Web filtering process involves pre-processing, feature selection and classification. Pre-processing consists of three steps, HTML parsing, stemming and stopping (stop-word removal), which produce the deviant teaching keywords. Three existing term weighting schemes, namely TF, TFIDF and Modified Entropy (M.Entropy), are used for feature selection, while a Support Vector Machine (SVM) is used for classification. Classification is validated by accuracy, precision, recall and F1. 300 Web pages were collected from the Internet across the three categories: deviant teaching, suspicious and clean. The results show that M.Entropy is a more suitable term weighting scheme for Islamic web page filtering than the other two schemes.
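The steps named in the summary can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the stop-word list, the function names and the `log(N / df)` form of IDF are assumptions made for illustration. Stemming, the Modified Entropy weighting and the SVM classifier are omitted because the record does not specify them. The sketch covers HTML parsing, stop-word removal, TF and TFIDF weighting, and the per-class precision, recall and F1 used for validation.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stop-word list; the list used in the thesis is not given here.
STOP_WORDS = {"the", "a", "of", "and", "is", "to", "in"}


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def preprocess(html):
    """HTML parsing followed by stop-word removal (stemming omitted)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    tokens = re.findall(r"[a-z]+", " ".join(parser.parts).lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tf(tokens):
    """Raw term frequency: how often each term occurs in one document."""
    return dict(Counter(tokens))


def tfidf(docs):
    """TF-IDF per document, assuming idf = log(N / df), one common variant."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

In a fuller pipeline, the weighted term vectors would be passed to an SVM classifier, and the metrics would be computed separately for each of the three categories.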