Bar chart plagiarism detection

Bibliographic Details
Main Author: Mohammed Salih, Mohammed Mumtaz
Format: Thesis
Language: English
Published: 2013
Online Access: http://eprints.utm.my/id/eprint/39164/1/MohammedMumtazMohammedSalihMFSKSM2013.pdf
Description
Summary: Plagiarism can be regarded as a form of electronic crime and intellectual theft, and it has become one of the educational challenges facing research institutions. Quantitative information is often presented as charts, such as line and bar charts, which convey the information in infographic form. Extracting the features of a bar chart is an essential step in recovering the data from its image. Some techniques presented by researchers, such as the Hough Transform and learning-based methods, focus on the graphical part rather than the text itself. In this study, ten features of bar chart images are used to detect plagiarism and to measure the proportion of similarity between charts. Some of these features can be extracted directly by OCR, while others require relating the text part to the graphic part in order to extract data such as the real value of each bar in the image. The new technique introduced in this research extracts three values for each bar, namely the Start, End and Exact values, based on the horizontal and vertical lines of the bar chart image. In addition, the word 2-gram and Euclidean distance methods are used to detect plagiarism. Experimental results show that the system can detect ten possible patterns of bar chart plagiarism. The performance of the system is evaluated in terms of overlapping features, precision and recall. The results show that the system detects not only copy-and-paste of bar data, but also restructuring and summarization of image captions, as well as modifications to the data of bar chart images, such as swapping bars, changing colors and changing the scales of bar chart images.
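
The two comparison methods named in the summary can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The function names, the use of Jaccard overlap to score shared word 2-grams, and the sample captions and bar values are all assumptions made for illustration, since the summary does not say how the 2-grams are compared or how the distance is thresholded.

```python
import math


def word_bigrams(text):
    """Return the set of word 2-grams (adjacent word pairs) in a caption."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def bigram_similarity(caption_a, caption_b):
    """Jaccard overlap of word 2-grams, in [0, 1].

    Assumption: Jaccard is only one plausible way to score shared 2-grams.
    """
    a, b = word_bigrams(caption_a), word_bigrams(caption_b)
    if not a or not b:
        return 0.0  # captions shorter than two words have no 2-grams
    return len(a & b) / len(a | b)


def euclidean_distance(values_a, values_b):
    """Euclidean distance between two equal-length lists of bar values."""
    if len(values_a) != len(values_b):
        raise ValueError("charts must have the same number of bars")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(values_a, values_b)))


# Hypothetical captions and bar values, invented for this example.
caption_score = bigram_similarity(
    "sales by region in 2012", "sales by region in 2013"
)
same_bars = euclidean_distance([10, 20, 30], [10, 20, 30])
# Sorting both value lists first is one possible way (an assumption, not
# taken from the summary) to make the distance insensitive to swapped bars.
swapped_bars = euclidean_distance(sorted([30, 10, 20]), sorted([10, 20, 30]))
```

A high caption score combined with a small distance between bar values would suggest that one chart was derived from the other. Any decision threshold would have to be tuned on real data.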