Improved semantic graph-based plagiarism detection

Bibliographic Details
Main Author: Osman Ahmed, Ahmed Hamza
Format: Thesis
Language: English
Published: 2013
Subjects: PN Literature (General)
Online Access: http://eprints.utm.my/id/eprint/33795/5/AhmedHamzaOsmanPFSKSM2013.pdf
id my-utm-ep.33795
record_format uketd_dc
spelling my-utm-ep.33795 2017-07-23T04:27:17Z Improved semantic graph-based plagiarism detection 2013-03 Osman Ahmed, Ahmed Hamza PN Literature (General) 2013-03 Thesis http://eprints.utm.my/id/eprint/33795/ http://eprints.utm.my/id/eprint/33795/5/AhmedHamzaOsmanPFSKSM2013.pdf application/pdf en public http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:69876?site_name=Restricted Repository phd doctoral Universiti Teknologi Malaysia, Faculty of Computing Faculty of Computing
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic PN Literature (General)
spellingShingle PN Literature (General)
Osman Ahmed, Ahmed Hamza
Improved semantic graph-based plagiarism detection
description Plagiarism occurs when the content of a text is copied without permission or citation. Nowadays, many text documents on the internet are easily accessed and copied. This study proposes improved methods for detecting plagiarism. The proposed methods are built on graph-based representation and semantic role labeling, improved with a fuzzy logic technique and chi-squared automatic interaction detection (CHAID). The graph-based method not only represents the content of a text document as a graph but also captures the underlying semantic meaning in terms of the relationships among its concepts. Semantic role labeling is well suited to generating semantic arguments for each sentence. It plays an important part in plagiarism detection because it assigns the concepts in a document to role labels, which are then compared to detect plagiarism. Another feature of this study is that the fuzzy logic method scores each generated argument so that important arguments can be selected. CHAID was applied to reinforce the results obtained from fuzzy logic and semantic role labeling by selecting important arguments from the sentences. It is concluded that not all arguments in a text are useful in the plagiarism detection process. Therefore, only the most important arguments, as selected by fuzzy logic and CHAID, were used in the similarity calculation. Experiments were run on PAN-PC-2009, a standard artificial simulation corpus, and on Short Answers Questions (CS11), a human simulation corpus. The proposed methods detected many types of plagiarism, such as copy-paste plagiarism, rewording or synonym replacement, changes to word structure in sentences, and conversion of sentences from passive to active voice and vice versa. In the experiments, the proposed methods outperformed other plagiarism detection techniques (Fuzzy Semantic-Based String Similarity and Longest Common Subsequence), achieving recall of 93%, precision of 90% and f-measure of 91%.
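As a rough check on the reported figures, the sketch below recomputes the f-measure from the stated precision and recall. It assumes the f-measure is the balanced F1 score (the harmonic mean of precision and recall), which this record does not state explicitly. It also includes a minimal word-level Longest Common Subsequence similarity, one of the baseline techniques named in the description. The function names, the whitespace tokenization and the normalization by the longer sentence are illustrative assumptions, not the thesis's implementation.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    # Row-by-row dynamic programming; prev[j] holds the LCS length of the
    # tokens of `a` seen so far against the first j tokens of `b`.
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(s1: str, s2: str) -> float:
    """LCS length over the longer sentence's token count (hypothetical normalization)."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    if not t1 or not t2:
        return 0.0
    return lcs_length(t1, t2) / max(len(t1), len(t2))


if __name__ == "__main__":
    # Precision 90% and recall 93% are the values reported in the description.
    print(round(f_measure(0.90, 0.93), 2))  # prints 0.91
    # An active/passive rewrite of the same statement.
    print(lcs_similarity("the cat chased the mouse",
                         "the mouse was chased by the cat"))
```

Under the F1 assumption, the recomputed value rounds to the reported 91%. Because LCS is sensitive to word order, the active-to-passive rewrite in the example scores well below 1 even though the meaning is unchanged, which is the kind of case the description says the semantic approach is meant to handle.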
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Osman Ahmed, Ahmed Hamza
author_facet Osman Ahmed, Ahmed Hamza
author_sort Osman Ahmed, Ahmed Hamza
title Improved semantic graph-based plagiarism detection
title_short Improved semantic graph-based plagiarism detection
title_full Improved semantic graph-based plagiarism detection
title_fullStr Improved semantic graph-based plagiarism detection
title_full_unstemmed Improved semantic graph-based plagiarism detection
title_sort improved semantic graph-based plagiarism detection
granting_institution Universiti Teknologi Malaysia, Faculty of Computing
granting_department Faculty of Computing
publishDate 2013
url http://eprints.utm.my/id/eprint/33795/5/AhmedHamzaOsmanPFSKSM2013.pdf
_version_ 1747816187011530752