Document plagiarism detection algorithm using semantic networks


Bibliographic Details
Main Author: Ahmed Muftah, Ahmed Jabr
Format: Thesis
Language: English
Published: 2009
Online Access: http://eprints.utm.my/id/eprint/11433/6/AhmedJabrAhmedMFSKSM2009.pdf
Description
Summary: The vast increase in documents available on the World Wide Web (WWW) and the ease of access to these documents have led to a serious problem of using others' work without giving credit. Although many methods have been developed to detect some instances of plagiarism, such as changing the structure of sentences or replacing a few words with their synonyms, it is often hard to reveal plagiarism when the copied sentences are deliberately modified. This project proposes an algorithm for plagiarism detection over the Web using semantic networks. The corpus of this study contains 610 documents downloaded from the Web, 10 of which were selected as the sources of 20 manually plagiarized documents. The algorithm was compared with an N-gram representation, and the results show that an appropriate semantic representation of sentences derived from WordNet's relations outperforms N-grams with different similarity measures in detecting the plagiarized sentences. They also show that a proposed method based on extracting named entities and common nouns is, in general, capable of retrieving the source documents from the Web using a search engine API when sentences are moderately plagiarized.
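
To illustrate the kind of N-gram baseline the summary refers to, the following is a minimal sketch, not the thesis's implementation. It assumes word-level N-grams compared with the Jaccard coefficient; the record does not say which N-gram size or similarity measures were used, and the function names and example sentences are hypothetical.

```python
import re


def word_ngrams(sentence, n=3):
    """Return the set of word-level n-grams of a sentence (assumed tokenization)."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard coefficient of two sets; defined here as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def ngram_similarity(s1, s2, n=3):
    """Similarity of two sentences based on their shared word n-grams."""
    return jaccard(word_ngrams(s1, n), word_ngrams(s2, n))


# Hypothetical example sentences, not taken from the thesis corpus.
source = "The quick brown fox jumps over the lazy dog"
copied = "The quick brown fox jumps over the lazy dog"
reworded = "The fast brown fox leaps over the idle dog"

print(ngram_similarity(source, copied))         # verbatim copy
print(ngram_similarity(source, reworded))       # synonyms substituted, trigrams
print(ngram_similarity(source, reworded, n=1))  # synonyms substituted, single words
```

In this sketch, replacing three words with synonyms leaves no trigram in common with the source sentence, so the trigram score falls to zero even though the meaning is unchanged. This is the kind of case where a representation built on WordNet relations, as the summary describes, could still relate the two sentences.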