Author Identification of English Tweets for Social Media Forensics
Main Author: | Nursyahirah, Tarmizi |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2023 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://ir.unimas.my/id/eprint/43079/7/Nursyahirah%20Tarmizi_dsva.pdf http://ir.unimas.my/id/eprint/43079/8/Thesis%20Master_Nursyahirah%20Binti%20Tarmizi%20-%2024%20pages.pdf http://ir.unimas.my/id/eprint/43079/11/Nursyahirah%20ft.pdf |
id |
my-unimas-ir.43079 |
---|---|
record_format |
uketd_dc |
institution |
Universiti Malaysia Sarawak |
collection |
UNIMAS Institutional Repository |
language |
English |
topic |
QA75 Electronic computers Computer science |
description |
Authorship Identification (AI) is the process of determining the most likely author of a given text by analysing writing style characteristics and linguistic patterns. Identifying the author of online social network (OSN) text has become a pressing issue as cyberbullying cases among social media users increase. AI plays a vital role in social media forensics (SMF) by unveiling the true identity of a cyberbullying perpetrator from OSN text. However, OSN text remains an open problem in AI because the limited length of the text and the use of Internet jargon degrade the performance of AI systems. In this research, the AI task is conducted to facilitate SMF by analysing the writing style of tweets from Twitter to identify the most plausible author of an anonymized tweet. The author's writing style, captured as stylometric features including character n-grams, word n-grams and Part-of-Speech (POS) n-grams, is extracted from the text. These features are widely used to identify the authors of short texts because they are language independent and tolerant of grammatical errors. The features are represented using two text representation models, TF-IDF and an embedding model, which are compared to determine which best represents OSN text. For classification, machine learning and deep learning classifiers are evaluated with the aim of maintaining the optimum performance of the AI system. The findings show that Twitter-native features are very useful in boosting the performance of the AI system. The embedding-based model achieved better performance in representing n-grams with fixed and distributed representations. The best result was achieved when a CNN was combined with the embedding-based model, with an accuracy of 95.02% for English and 94% for KadazanDusun, and 95% precision for both languages. |
format |
Thesis |
qualification_level |
Master's degree |
author |
Nursyahirah, Tarmizi |
title |
Author Identification of English Tweets for Social Media Forensics |
granting_institution |
Universiti Malaysia Sarawak |
granting_department |
Information System |
publishDate |
2023 |
url |
http://ir.unimas.my/id/eprint/43079/7/Nursyahirah%20Tarmizi_dsva.pdf http://ir.unimas.my/id/eprint/43079/8/Thesis%20Master_Nursyahirah%20Binti%20Tarmizi%20-%2024%20pages.pdf http://ir.unimas.my/id/eprint/43079/11/Nursyahirah%20ft.pdf |
_version_ |
1794023043007250432 |
spelling |
my-unimas-ir.43079 2024-02-20T04:57:23Z Author Identification of English Tweets for Social Media Forensics 2023-08-24 Nursyahirah, Tarmizi QA75 Electronic computers. Computer science Universiti Malaysia Sarawak 2023-08 Thesis http://ir.unimas.my/id/eprint/43079/ http://ir.unimas.my/id/eprint/43079/7/Nursyahirah%20Tarmizi_dsva.pdf text en staffonly http://ir.unimas.my/id/eprint/43079/8/Thesis%20Master_Nursyahirah%20Binti%20Tarmizi%20-%2024%20pages.pdf text en public http://ir.unimas.my/id/eprint/43079/11/Nursyahirah%20ft.pdf text en validuser masters Universiti Malaysia Sarawak Information System |
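
The description above mentions character n-grams represented with TF-IDF as one of the stylometric feature sets. The following is a minimal sketch of that idea only, using the Python standard library. It is not the thesis's implementation: the record reports a CNN with an embedding-based model as the best configuration, whereas this sketch swaps in a much simpler nearest-profile comparison by cosine similarity. All function names, the trigram size, the smoothed IDF formula, and the toy tweets are assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lower-cased string."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def fit_idf(documents, n=3):
    """Compute a smoothed inverse document frequency for every n-gram seen."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(char_ngrams(doc, n)))
    total = len(documents)
    return {g: math.log((1 + total) / (1 + df)) + 1.0 for g, df in doc_freq.items()}


def tfidf_vector(text, idf, n=3):
    """Map a text to an L2-normalised sparse TF-IDF vector (a dict)."""
    counts = Counter(g for g in char_ngrams(text, n) if g in idf)
    vec = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(g, 0.0) for g, v in a.items())


def attribute(tweet, profiles, idf, n=3):
    """Return the candidate author whose profile is most similar to the tweet."""
    query = tfidf_vector(tweet, idf, n)
    return max(profiles, key=lambda author: cosine(query, profiles[author]))


# Toy, made-up tweets for two hypothetical authors (not data from the thesis).
known = {
    "author_a": ["omg cant wait for 2nite!!! #hyped", "omg this is sooo good!!! #yum"],
    "author_b": ["Looking forward to the seminar this evening.",
                 "The lecture was quite informative today."],
}
idf = fit_idf([t for tweets in known.values() for t in tweets])
profiles = {a: tfidf_vector(" ".join(tweets), idf) for a, tweets in known.items()}

print(attribute("omg sooo hyped for 2nite!!!", profiles, idf))  # prints: author_a
```

Character n-grams pick up habits such as repeated punctuation, elongated words, and informal spellings without needing a tokenizer or a dictionary, which is consistent with the description's point that these features tolerate grammatical errors and are language independent. A learned classifier over the same vectors would normally replace the nearest-profile rule used here.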