A stylometry approach for blind linguistic steganalysis model against translation-based steganography

Bibliographic Details
Main Author: Mohd Lokman, Syiham
Format: Thesis
Language: English
Published: 2023
Online Access: http://eprints.uthm.edu.my/10995/1/24p%20SYIHAM%20MOHD%20LOKMAN.pdf
http://eprints.uthm.edu.my/10995/2/SYIHAM%20MOHD%20LOKMAN%20COPYRIGHT%20DECLARATION.pdf
http://eprints.uthm.edu.my/10995/3/SYIHAM%20MOHD%20LOKMAN%20WATERMARK.pdf
Description
Summary: Steganography is the art of hiding information in ways that prevent the detection of a secret message. In Translation-based Steganography (TBS), secret messages are encoded in the “noise” produced by automated translation of natural language text. The adversarial technique to extract the secret message is called steganalysis, which can be categorized into two types: targeted and blind. While targeted steganalysis is designed to attack a specific embedding algorithm, blind steganalysis uses features extracted or selected from the medium to detect anomalies indicating that secret data may have been embedded within it. However, the accuracy of blind steganalysis algorithms depends heavily on the features selected from the input data, especially when attacking TBS embedding techniques. This thesis explores the potential of using stylometry, or linguistic style, to improve the representation of word-distribution characteristics that distinguish stego text from cover text in TBS. The rationale is that all translated text in TBS has intrinsic structural styles that can be used to improve the performance of a blind steganalysis model. The proposed stylometry-based blind steganalysis model consists of two stages: stylometric feature selection and classification. The stylometric features, selected from a set of cover texts, are categorized into two groups, lexical and syntactic, before being fed into the model, which uses a Support Vector Machine (SVM) as the classifier. The performance of the model is then evaluated on false rate, missing rate, and accuracy rate, and compared against three other standard classifiers in steganalysis: Naive Bayes (NB), k-Nearest Neighbor (k-NN), and Decision Tree (J48). The results showed that the stylometric features improve the detection performance of a blind steganalysis model. Meanwhile, SVM was the best classifier for stego text detection, with significantly low processing time.
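
As a rough illustration of the pipeline the summary describes, the Python sketch below computes a few lexical stylometric features for a text and the three evaluation rates for a set of predictions. It is not the thesis's implementation. The function names are hypothetical, and the particular features (average word length, type-token ratio, average sentence length) are assumptions chosen as common lexical measures. The sketch also assumes that "false rate" means the share of cover texts wrongly flagged as stego and "missing rate" means the share of stego texts that go undetected. The classifier itself (SVM in the thesis) is omitted; it would sit between the two functions.

```python
import re


def lexical_features(text):
    """Return a few simple lexical stylometric features for one text.

    The feature set is an illustrative assumption, not the thesis's list.
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return {"avg_word_len": 0.0, "type_token_ratio": 0.0, "avg_sentence_len": 0.0}
    return {
        # Mean number of characters per word.
        "avg_word_len": sum(len(w) for w in words) / len(words),
        # Vocabulary richness: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / len(words),
        # Mean number of words per sentence.
        "avg_sentence_len": len(words) / len(sentences),
    }


def detection_rates(y_true, y_pred):
    """Compute false rate, missing rate and accuracy (1 = stego, 0 = cover).

    Assumed definitions: false rate = FP / (FP + TN),
    missing rate = FN / (FN + TP), accuracy = (TP + TN) / total.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "false_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "missing_rate": fn / (fn + tp) if (fn + tp) else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }
```

In use, each cover or stego text would be turned into a feature vector with `lexical_features`, a classifier would predict a label per text, and `detection_rates` would summarize how those predictions compare with the true labels.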