Document plagiarism detection algorithm using semantic networks

The vast increase in documents available on the World Wide Web (WWW) and the easy access to these documents have led to a serious problem of using others' work without giving credit. Although many methods have been developed to detect some instances of plagiarism such as changing the structure of...

Bibliographic Details
Main Author: Ahmed Muftah, Ahmed Jabr
Format: Thesis
Language: English
Published: 2009
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/11433/6/AhmedJabrAhmedMFSKSM2009.pdf
id my-utm-ep.11433
record_format uketd_dc
spelling my-utm-ep.11433 2017-09-14T03:58:43Z Document plagiarism detection algorithm using semantic networks 2009-11 Ahmed Muftah, Ahmed Jabr QA75 Electronic computers. Computer science 2009-11 Thesis http://eprints.utm.my/id/eprint/11433/ http://eprints.utm.my/id/eprint/11433/6/AhmedJabrAhmedMFSKSM2009.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:71532?site_name=Restricted Repository masters Universiti Teknologi Malaysia, Faculty of Computer Science and Information Systems
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic QA75 Electronic computers. Computer science
description The vast increase in documents available on the World Wide Web (WWW) and the easy access to these documents have led to a serious problem of using others' work without giving credit. Although many methods have been developed to detect some instances of plagiarism, such as changes to sentence structure or the slight replacement of words by their synonyms, it is often hard to reveal plagiarism when the copied sentences are deliberately modified. This project proposes an algorithm for plagiarism detection over the Web using semantic networks. The corpus of this study contains 610 documents downloaded from the Web, 10 of which were selected as the sources of 20 manually plagiarized documents. The algorithm was compared to an N-gram representation, and the results show that an appropriate semantic representation of sentences derived from WordNet's relations outperforms N-grams with different similarity measures in detecting the plagiarized sentences. They also show that a proposed method based on extracting named entities and common nouns is in general capable of retrieving the source documents from the Web using a search engine API when sentences are moderately plagiarized.
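The contrast the abstract draws between N-gram matching and a semantic representation can be illustrated with a minimal sketch. This is not the thesis's algorithm: the Jaccard coefficient is only one possible similarity measure, and the `TOY_CONCEPTS` table is a hypothetical, hand-made stand-in for WordNet lookups. All function names are assumptions made for illustration.

```python
import re

def tokens(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())

def ngrams(words, n=2):
    """Return the set of word n-grams (as tuples) in a token list."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard coefficient of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical stand-in for WordNet: each word maps to a shared concept
# label. A real system would query WordNet's relations instead.
TOY_CONCEPTS = {
    "quick": "fast", "rapid": "fast", "fast": "fast",
    "growth": "increase", "increase": "increase", "rise": "increase",
}

def concepts(words):
    """Replace each word by its toy concept label when one is known."""
    return [TOY_CONCEPTS.get(w, w) for w in words]

def ngram_similarity(s1, s2, n=2):
    """Baseline: Jaccard overlap of surface word n-grams."""
    return jaccard(ngrams(tokens(s1), n), ngrams(tokens(s2), n))

def concept_similarity(s1, s2, n=2):
    """Same overlap, computed after mapping words to concept labels."""
    return jaccard(ngrams(concepts(tokens(s1)), n),
                   ngrams(concepts(tokens(s2)), n))

source = "The rapid growth of web documents"
rewrite = "The quick increase of web documents"
print(ngram_similarity(source, rewrite))    # 0.25
print(concept_similarity(source, rewrite))  # 1.0
```

In this toy case, replacing two words by synonyms removes three of the five shared bigrams, so the surface score drops to 0.25, while the concept-level comparison still treats the sentences as identical.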
format Thesis
qualification_level Master's degree
author Ahmed Muftah, Ahmed Jabr
title Document plagiarism detection algorithm using semantic networks
granting_institution Universiti Teknologi Malaysia, Faculty of Computer Science and Information Systems
granting_department Faculty of Computer Science and Information Systems
publishDate 2009
url http://eprints.utm.my/id/eprint/11433/6/AhmedJabrAhmedMFSKSM2009.pdf