Proper noun detection using regex algorithm and rules for Malay named entity recognition
This study aimed to develop a Malay proper noun detection method to cluster and classify named entity categories, particularly the major classes of person, location, organization, and miscellaneous, in a Malay newspaper corpus. Regular Expression pattern identification (regex) algorith...
Main Author: | Farid Morsidi |
---|---|
Format: | thesis |
Language: | eng |
Published: | 2018 |
Subjects: | QA Mathematics |
Online Access: | https://ir.upsi.edu.my/detailsg.php?det=5380 |
id | oai:ir.upsi.edu.my:5380 |
---|---|
record_format | uketd_dc |
institution | Universiti Pendidikan Sultan Idris |
collection | UPSI Digital Repository |
language | eng |
topic | QA Mathematics |
description | This study aimed to develop a Malay proper noun detection method to cluster and classify named entity categories, particularly the major classes of person, location, organization, and miscellaneous, in a Malay newspaper corpus. A regular expression (regex) pattern identification algorithm and rules were introduced to overcome the limitations of dictionaries and gazetteers. Two visualization techniques, Decision Tree and Term Document Matrix, were used to evaluate the efficiency of the method. The method obtained 74% accuracy during generation of the decision tree. The term document matrix visualization reached maximum values of 9.8007403, 9.8718517, and 9.9890683 for the Astro Awani, Berita Harian, and Bernama datasets, respectively. In conclusion, the regex algorithm could indicate the presence of Malay proper nouns, making it an appropriate extraction method for clustering and classifying Malay proper nouns. The study implies that the Malay proper noun detection method can increase the effectiveness of named entity recognition and help improve document retrieval for the Malay language. |
format | thesis |
qualification_name | |
qualification_level | Master's degree |
author | Farid Morsidi |
title | Proper noun detection using regex algorithm and rules for Malay named entity recognition |
granting_institution | Universiti Pendidikan Sultan Idris |
granting_department | Fakulti Seni, Komputeran dan Industri Kreatif |
publishDate | 2018 |
url | https://ir.upsi.edu.my/detailsg.php?det=5380 |
_version_ | 1747833187559538688 |
spelling |
oai:ir.upsi.edu.my:53802020-11-19 Proper noun detection using regex algorithm and rules for malay named entity recognition 2018 Farid Morsidi QA Mathematics This study was aimed to develop a Malay proper noun detection method to cluster andclassify named entity categories, particularly for major important classes such asperson, location, organization, and miscellaneous for Malay newspaper corpus. RegularExpression pattern identification (regex) algorithm and rule were introduced in this study toovercome the limitation of dictionary and gazetteer. Two visualization techniques namely asDecision Tree and Term Document Matrix had been used to evaluate the efficiency of themethod. The result obtained 74% of accuracy during the generation of decision tree. Visualization for term document matrix achieves a maximized value of 9.8007403, 9.8718517, and9.9890683 for Astro Awani, Berita Harian, and Bernama dataset respectively. As a conclusion, theregex algorithm could indicate the presence of Malay proper noun, thus making it an appropriatemethod for extraction tool to cluster and classify Malay proper noun. The study implicates thatthe use of Malay proper noun detection method can increase the effectiveness in namedentity recognition and beneficial to improve document retrieval for Malaylanguage. 2018 thesis https://ir.upsi.edu.my/detailsg.php?det=5380 https://ir.upsi.edu.my/detailsg.php?det=5380 text eng closedAccess Masters Universiti Pendidikan Sultan Idris Fakulti Seni, Komputeran dan Industri Kreatif Abdallah, S., Shaalan, K., & Shoaib, M. (2012). Integrating rule-based system with classification for arabic named entity recognition. In Lecture Notes in ComputerScience (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7181 LNCS, pp. 311322).http://doi.org/10.1007/978-3-642-28604- 9_26AbdelRahman, S., Elarnaoty, M., & Magdy, M. (2010). Integrated Machine LearningTechniques for Arabic Named Entity Recognition. 
International Journal of Computer Science, 7(4),2736. Retrieved from http://ijcsi.org/papers/IJCSI-Vol-7-Issue-4-No-3.pdf#page=41Abdul-hamid, A., & Darwish, K. (2010). Simplified Feature Set for Arabic Named EntityRecognition. Proceedings of the 2010 Named Entities Workshop, (July), 110115. Retrieved fromhttp://www.aclweb.org/anthology/W10-2417Abdullah, M., & Ahmad, F. (2009). Rules frequency order stemmer for malay language. International Journal of , 9(2), 433438. Retrieved fromhttp://paper.ijcsns.org/07_book/200902/20090258.pdfAbedinpourshotorban, H., Hasan, S., Shamsuddin, S. M., & AsSahra, N. F. (2016). Adifferential-based harmony search algorithm for the optimization of continuous problems.Expert Systems with Applications, 62, 317332. http://doi.org/10.1016/j.eswa.2016.05.013Aboaoga, M., & Aziz, M. J. A. (2013). Arabic person names recognition by using arule based approach. Journal of Computer Science, 9(7), 922927. http://doi.org/10.3844/jcssp.2013.922.927Abu Bakar, J., Omar, K., Nasrudin, M. F., & Murah, M. Z. (2013). Part-of-Speech for Old MalayManuscript Corpus: A Review. In Communications in Computer and Information Science (Vol.378 CCIS, pp. 5366). http://doi.org/10.1007/978-3-642-40567-9_5Abu Bakar, J., Omar, K., Nasrudin, M. F., Murah, M. Z., Al-shoukry, S., Omar, N., Klose,A. (2013). Processing natural malay texts: A data-driven approach. Neurocomputing, 79(3),26702676. http://doi.org/10.3176/tr.2010.1.06Agarwal, S. K., Shah, S., & Kumar, R. (2015). Classification of mental tasks from EEG data usingbacktracking search optimization based neural classifier. Neurocomputing, 166, 397 403.http://doi.org/10.1016/j.neucom.2015.03.041Aggarwal, C., & Zhao, P. (2013). Towards graphical models for text processing. Knowledge andInformation Systems, 36(1), 121. http://doi.org/10.1007/s10115-012-0552-3Ahmad, Z. H., & Khalifa, O. (2008). Towards designing a high intelligibility rulebased standard Malay text-to-speech synthesis system. 
Proceedings of the International Conferenceon Computer and Communication Engineering 2008, ICCCE08: Global Links for HumanDevelopment, 8994. http://doi.org/10.1109/ICCCE.2008.4580574Ahmed, Z. (2013). Named Entity Recognition and Question Answering Using Word Vectors andClustering.Akbari, R., Hedayatzadeh, R., Ziarati, K., & Hassanizadeh, B. (2012). A multi-objectiveartificial bee colony algorithm. Swarm and Evolutionary Computation, 2, 3952.http://doi.org/10.1016/j.swevo.2011.08.001Alfred, R. (2016). Intelligent Information and Database Systems. In ACIIDS 2016, Part II (pp.447457). http://doi.org/10.1007/978-3-642-12145-6Alfred, R., Leong, L. C., On, C. K., & Anthony, P. (2014). Malay Named Entity RecognitionBased on Rule-Based Approach. International Journal of Machine Learning and Computing,4(3), 300306. http://doi.org/10.7763/IJMLC.2014.V4.428Aljoumaa, H. (2012). Development of a Self-Learning Approach Applied to PatternRecognition and Fuzzy Control, (September 2012), 127.Al-Moslmi, T., Gaber, S., Al-Shabi, A., Albared, M., & Omar, N. (2015). Feature SelectionMethods Effects on Machine Learning Approaches in Malay Sentiment Analysis, (October),25.Alshalabi, H., Tiun, S., Omar, N., & Albared, M. (2013). Experiments on the Use of FeatureSelection and Machine Learning Methods in Automatic Malay Text Categorization.International Conference on Electrical Engineering and Informatics (ICEEI 2013), 11(Iceei),748754. http://doi.org/10.1016/j.protcy.2013.12.254Al-shammaa, M., & Abbod, M. F. (2015). Automatic Generation of Fuzzy ClassificationRules from Data.Al-shoukry, S., & Omar, N. (2015). Proper Nouns Recognition in Arabic Crime Text UsingMachine Learning Approach, 79(3), 506513.Althobaiti, M., Kruschwitz, U., & Poesio, M. (2015). Combining Minimally-supervisedMethods for Arabic Named Entity Recognition. Transactions of the Association forComputational Linguistics, 3, 243255. 
Retrieved fromhttps://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/564Althobaiti, M., Kruschwitz, U., & Poesio, M. (2013). A Semi-supervised Learning Approachto Arabic Named Entity Recognition, (September), 3240.http://doi.org/10.1177/0165551513502417Althobaiti, M., Kruschwitz, U., & Poesio, M. (2014). Automatic Creation of Arabic NamedEntity Annotated Corpus Using Wikipedia. Proceedings of the Student Research Workshop atthe 14th Conference of the European Chapter of the Association for ComputationalLinguistics, 106115. Retrieved from http://www.aclweb.org/anthology/E14-3012Ananiadou, S., & McNaught, J. (2006). Text Mining for Biology and Biomedicine. Boston:Artech House.Ananiadou, S., Pyysalo, S., Tsujii, J., & Kell, D. B. (2010). Event extraction for systemsbiology by text mining the literature. Trends in Biotechnology.http://doi.org/10.1016/j.tibtech.2010.04.005Ando, R. R. K., & Zhang, T. (2005). A high-performance semi-supervised learning methodfor text chunking. Proceedings of the 43rd Annual Meeting on Association for ComputationalLinguistics, (June), 19. http://doi.org/10.3115/1219840.1219841Baharudin, B., Lee, L. H., & Khan, K. (2010). A Review of Machine Learning Algorithms forText-Documents Classification. Journal of Advances in Information Technology, 1(1), 420.http://doi.org/10.4304/jait.1.1.4-20Bali, R.-M., Chua, C. C., & Ng, P. K. (2007). Identifying and Classifying Unknown Words InMalay Texts. The Seventh International Symposium on Natural Language Processing(SNLP2007), 493498. Retrieved fromhttp://eprints.usm.my/9442/1/Identifying_and_classifying_unknown_words_in_Malay_texts.pdf%5Cnhttp://eprints.usm.my/9442/Banko, M., Cafarella, M. J., Soderland, S., Broadhead, M., & Etzioni, O. (2007). OpenInformation Extraction from the Web. Proceedings of IJCAI-07, the International JointConference on Artificial Intelligence, 26702676. http://doi.org/10.1145/1409360.1409378Bawane, M. S., & Gadicha, P. V. B. (n.d.). 
Analysing the result of GRIAS framework byusing Precision , Recall and F-measure, 2430.Benajiba, Y., Diab, M., & Rosso, P. (2008). Arabic named entity recognition using optimizedfeature sets. EMNLP 08 Proceedings of the Conference on Empirical Methods in NaturalLanguage Processing, (October), 284293. Retrieved fromhttp://dl.acm.org/citation.cfm?id=1613715.1613755Benajiba, Y., & Rosso, P. (2008). Arabic Named Entity Recognition using ConditionalRandom Fields. Proc. of Workshop on HLT & NLP within the Arabic World, LREC. Vol. 8.,143153. Retrieved fromhttp://www.dsic.upv.es/~prosso/resources/BenajibaRosso_LREC08.pdfBenajiba, Y., Rosso, P., & BenedRuiz, J. (2007). ANERsys: an Arabic named entityrecognition system based on maximum entropy. Gelbukh, A. (Ed.) CICLing 2007. LNCS,143153. Retrieved from http://www.springerlink.com/index/5g6n298843878701.pdfBezdek, J. C. (1993). A Physical Interpretation of Fuzzy ISODATA. Readings in Fuzzy Setsfor Intelligent Systems, (November), 615616. http://doi.org/10.1109/TSMC.1976.4309506Bontcheva, K., Derczynski, L., Funk, A., Greenwood, M. a, Maynard, D., & Aswani, N.(2013). TwitIE : An Open-Source Information Extraction Pipeline for Microblog Text. InProceedings of Recent Advances in Natural Language Processing (pp. 8390). Retrievedfrom https://www.aclweb.org/anthology/R/R13/R13-1011.pdfBrief, T. (2005). Agreement , the F-Measure , and Reliability in Information Retrieval, 296298. http://doi.org/10.1197/jamia.M1733.InformaticsBrill, E. (2000). Pattern-based disambiguation for natural language processing. AnnualMeeting of the ACL, 1. Retrieved from http://portal.acm.org/citation.cfm?id=1117795Bsoul, Q., Salim, J., & Zakaria, L. Q. (2013). An Intelligent Document Clustering Approachto Detect Crime Patterns. Procedia Technology, 11(Iceei), 11811187.http://doi.org/10.1016/j.protcy.2013.12.311Cao, T. H., Tang, T. M., & Chau, C. K. (2012). Text Clustering with Named Entities: AModel, Experimentation and Realization. 
Intelligent Systems Reference Library, 23, 267287.http://doi.org/10.1007/978-3-642-23166-7_10Carlson, A., & Betteridge, J. (2010). Coupled semi-supervised learning for informationextraction. Proceedings of the Third ACM International Conference on Web Search and DataMining (2010), 101110. http://doi.org/10.1145/1718487.1718501Chapman, C. A. (2016). Usage and refactoring studies of python regular expressions by.Graduate Theses and Dissertations. This, Paper 1513.Chapman, C., & Stolee, K. T. (2016). Exploring regular expression usage and context inPython. In Proceedings of the 25th International Symposium on Software Testing andAnalysis - ISSTA 2016 (pp. 282293). http://doi.org/10.1145/2931037.2931073Chart, G., Algorithm, G., Tun, U., & Onn, H. (2012). Single Disciplinary Project Application Form Fundamental Research Grant Scheme (FRGS), (i), 116.http://doi.org/10.1155/2013/782519.(ISI-Q2).Che, W., Wang, M., Manning, C. D., & Liu, T. (2013). Named Entity Recognition withBilingual Constraints. Proceedings of the 2013 Conference of the North American Chapter of theAssociation for Computational Linguistics: Human Language Technologies, (June), 5262. Retrieved from http://www.aclweb.org/anthology/N13-1006Chen, K., Dong, X., Zhu, J., & Shen, B. (2016). Building a domain knowledge basefrom wikipedia: A semi-supervised approach. Proceedings of the International Conference onSoftware Engineering and Knowledge Engineering, SEKE, 2016Janua. http://doi.org/10.18293/SEKE2016-051Chiticariu, L., Krishnamurthy, R., Li, Y., Reiss, F., & Vaithyanathan, S. (2010).Domain Adaptation of Rule-Based Annotators for Named-Entity Recognition Tasks. Proceedings of the2010 Conference on Empirical Methods in Natural Language Processing, (October), 10021012.Retrieved from http://portal.acm.org/citation.cfm?id=1870756Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P.(2011). Natural Language Processing (almost) from Scratch. 
Journal of Machine Learning Research,12(Aug), 24932537. http://doi.org/10.1145/2347736.2347755Derczynski, L., Maynard, D., Rizzo, G., & Erp, M. Van. (n.d.). Analysis of Named EntityRecognition and Linking for Tweets, 135.Diab, M. (2009). Second Generation AMIRA Tools for Arabic Processing?: Fast and RobustTokenization, POS tagging, and Base Phrase Chunking. Proceedings of the SecondInternational Conference on Arabic Language Resources and Tools, 285288. Retrieved fromhttp://www.elda.org/medar-conference/pdf/56.pdfDuan, H., Zheng, Y., & Random, C. (2011). A Study on Features of the CRFs-based Chinese.International Journal of Advanced Intelligence, 3(2), 287294.Dumais, S., & Chen, H. (2000). Hierarchical classification of Web content. SIGIR 00:Proceedings of the 23rd Annual International ACM SIGIR Conference on Research andDevelopment in Information Retrieval, 256263. http://doi.org/10.1145/345508.345593Ek, T., Kirkegaard, C., Jonsson, H., & Nugues, P. (2011). Named entity recognition for short text messages. Procedia - Social and Behavioral Sciences, 27(September), 178187.http://doi.org/10.1016/j.sbspro.2011.10.596Ekbal, A., & Saha, S. (2011). A multiobjective simulated annealing approach for classifierensemble: Named entity recognition in Indian languages as case studies. Expert Systems withApplications, 38(12), 1476014772. http://doi.org/10.1016/j.eswa.2011.05.004Ekbal, A., Saha, S., & Sikdar, U. K. (2012). Multiobjective Optimization for BiomedicalNamed Entity Recognition and Classification. Procedia Technology, 6(0), 206213.http://doi.org/http://dx.doi.org/10.1016/j.protcy.2012.10.025Elsayed, H., & Elghazaly, T. (2015). A Named Entities Recognition System for ModernStandard Arabic using Rule-Based Approach. 2015 First International Conference on ArabicComputational Linguistics (ACLing), 12(1), 5154. http://doi.org/10.1109/ACLing.2015.14Elsebai, a, Meziane, F., & Belkredim, F. (2009). A Rule Based Persons Names ArabicExtraction System. 
Communications of the IBIMA, 11(August), 5359. Retrieved fromhttp://usir.salford.ac.uk/2206/Elyasir, A. M. H., Sonai, K., & Anbananthen, M. (2013). Comparison between Bag of Wordsand Word Sense Disambiguation, (Icacsei), 413417.Etzioni, O., Cafarella, M., Downey, D., Popescu, A. M., Shaked, T., Soderland, S., Yates,A. (2005). Unsupervised named-entity extraction from the Web: An experimental study.Artificial Intelligence, 165(1), 91134. http://doi.org/10.1016/j.artint.2005.03.001Fadzli, S. A., Norsalehen, A. K., Syarilla, I. A., Hasni, H., & Dhalila, M. S. S. (2012). Simplerules malay stemmer. The International Conference on Informatics and Applications(ICIA2012), 2835. Retrieved from http://sdiwc.net/digitallibrary/download.php?id=00000187.pdfFuchs, G., Stange, H., Samiei, A., Andrienko, G., & Andrienko, N. (2015). A semi-supervisedmethod for topic extraction from micro postings. Information Technology, 57(1), 4956.http://doi.org/10.1515/itit-2014-1078Fung, P., Fung, P., Cheung, P., & Cheung, P. (2004). Mining Very-Non-Parallel Corpora:Parallel Sentence and Lexicon Extraction via Bootstrapping and EM. EMNLP 2004 -Conference on Empirical Methods in Natural Language Processing, 5763. Retrieved fromhttp://www.aclweb.org/anthology-new/W/W04/W04-3208.pdfGosselin, L., Tye-Gingras, M., & Mathieu-Potvin, F. (2009). Review of utilization of geneticalgorithms in heat transfer problems. International Journal of Heat and Mass Transfer.Elsevier Ltd. http://doi.org/10.1016/j.ijheatmasstransfer.2008.11.015Goyvaerts, J., & Levithan, S. (2012). Regular Expressions Cookbook, 612.http://doi.org/9780596802837Gunawan, Purnama, I. K. E., & Hariadi, M. (2015). Supervised learning Indonesian glossacquisition. IAENG International Journal of Computer Science, 42(4), 337346.Hassan, M., Nazlia, O., & Mohd Juzaiddin, A. A. (2015). Malay Part of Speech Tagger : AComparative Study on Tagging Tools. Asia-Pacific Journal of Information Technology andMultimedia, 4(1), 1123. 
http://doi.org/10.17576/apjitm-2015-0401-02Hemmati, M., Amjady, N., & Ehsan, M. (2014). System modeling and optimization forislanded micro-grid using multi-cross learning-based chaotic differential evolution algorithm.International Journal of Electrical Power and Energy Systems, 56, 349360.http://doi.org/10.1016/j.ijepes.2013.11.015Heydt, M. (2015). Learning pandas: Get to grips with pandas - a versatile and highperformancePython library for data manipulation, analysis, and discovery. Retrieved fromhttp://gen.lib.rus.ec/book/index.php?md5=75566423DC8A5A9411165F24EF9DD886Hu, B., Tang, B., Chen, Q., & Kang, L. (2016). A novel word embedding learning modelusing the dissociation between nouns and verbs. Neurocomputing, 171, 11081117.http://doi.org/10.1016/j.neucom.2015.07.046Isa, N., Puteh, M., & Kamarudin, R. M. H. R. (2013). Sentiment classification of malaynewspaper using immune network (SCIN). Lecture Notes in Engineering and ComputerScience, 3 LNECS, 15431548. Retrieved fromhttp://www.scopus.com/inward/record.url?eid=2-s2.0-84887882006&partnerID=40&md5=652fdc713458c4dfedcbc4e3a0b736b6J.M., M. M. U. J. S.-C. S. M. J. G.-B. (2013). Named Entity Recognition: Fallacies challengesand opportunities. Computer Standards and Interfaces,3554824891(http://www.scopus.com/inward/record.url?eid=2-s2.0-84878302542&partnerID=40&md5=fa0cc4fcfad6db514533c129e08333d6).Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters,31(8), 651666. http://doi.org/10.1016/j.patrec.2009.09.011Kanagavalli, R. V, & K, R. (2013). Detecting and resolving spatial ambiguity in text usingnamed entity extraction and Self-Learning fuzzy logic techniques. Retrieved fromhttp://arxiv.org/abs/1303.0445Kantardzic, M. (2011). Data Mining: Concepts, Models, Method, and Algorithms (2ndEdition) (2nd ed.). New Jersey: John Wiley & Sons, Inc.Khalaf, Z. (2015). MAHIR System: Unsupervised Segmentation for Malay Spoken BroadcastNews Stories. 
International Journal of Information and Electronics Engineering, 5(3).http://doi.org/10.7763/IJIEE.2015.V5.532Kondrak, S. B. and G. (2007). Alignment-Based Discriminative String Similarity.Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics,656663.Kraft, D. H., Martin-Bautista, M. J., Chen, J., & Sanchez, D. (2003). Rules and fuzzy rules intext: Concept, extraction and usage. International Journal of Approximate Reasoning, 34(23), 145161. http://doi.org/10.1016/j.ijar.2003.07.005Krl, P. (2014). Named entities as new features for Czech document classification. LectureNotes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence andLecture Notes in Bioinformatics), 8404 LNCS (PART 2), 417427.http://doi.org/10.1007/978-3-642-54903-8_35Kummerfeld, J., & Curran, J. (2008). Classification of Verb-Particle Constructions with theGoogle Web1T Corpus. Australasian Language Technology Association Workshop 2008, 6(December), 5563. Retrieved from http://aclweb.org/anthology-new/U/U08/U08-1.pdf#page=114Lafferty, J., McCallum, A., & Pereira, F. C. N. (2001). Conditional random fields:Probabilistic models for segmenting and labeling sequence data. ICML 01 Proceedings of theEighteenth International Conference on Machine Learning, 8(June), 282289.http://doi.org/10.1038/nprot.2006.61Larasati, S. (2012). Towards an Indonesian-English {SMT} System: A Case Study of anUnder-Studied and Under-Resourced Language, Indonesian. {WDS}12 Proceedings ofContributed Papers, 123129.Le Nguyen, M., & Shimazu, A. (2014). A semi supervised learning model for mappingsentences to logical forms with ambiguous supervision. In Data and Knowledge Engineering(Vol. 90, pp. 112). Elsevier B.V. http://doi.org/10.1016/j.datak.2013.12.001Le, T., Nguyen, K., Nguyen, V., Nguyen, V., & Phung, D. (2016). Scalable Support VectorMachine for Semi-supervised Learning, 118. 
Retrieved from http://arxiv.org/abs/1606.06793Li, Y., Krishnamurthy, R., Raghavan, S., Vaithyanathan, S., Arbor, A., & Jagadish, H. V.(2008). Regular Expression Learning for Information Extraction. Conference on EmpiricalMethods in Natural Language Processing, (October), 2130. Retrieved fromhttp://portal.acm.org/citation.cfm?id=1613719Liao, W., & Veeramachaneni, S. (2009). A simple semi-supervised algorithm for namedentity recognition. Workshop on Semi-Supervised Learning for Natural Language Processing,(June), 5865. http://doi.org/10.3115/1621829.1621837Liu, X., Zhang, S., Wei, F., & Zhou, M. (2011). Recognizing Named Entities in Tweets. InProceedings of the 48th Annual Meeting of the Association for Computational Linguistics(ACL), 1(2008), 359367. Retrieved from http://acl.eldoc.ub.rug.nl/mirror/P/P11/P11-1037.pdfLu, Y., Ji, D., Yao, X., Wei, X., & Liang, X. (2015). CHEMDNER system with mixedconditional random fields and multi-scale word clustering. Journal of Cheminformatics,7(Suppl 1), S4. http://doi.org/10.1186/1758-2946-7-S1-S4Luis Eduardo, P., Iacobelli, F., & Su, S. (2015). Semi-Supervised Approach to Named EntityRecognition in Spanish Applied to a Real-World Conversational System, 224235.http://doi.org/10.1007/978-3-319-19264-2Luo, W., & Yang, F. (2016). An Empirical Study of Automatic Chinese Word Segmentationfor Spoken Language Understanding and Named Entity Recognition, 238248.Malanyon, D. (2009). Malay Lexical Analysis through Corpus-Based Approach.Eprints.Usm.My. Retrieved from http://eprints.usm.my/10608/Mangasi, T., Erwin, A., & Ipung, H. P. (2014). Defined entity extraction based on Indonesiantext document. In Proceedings - 2014 International Conference on ICT for Smart Society:Smart System Platform Development for City and Society, GoeSmart 2014, ICISS 2014 (pp.6165). http://doi.org/10.1109/ICTSS.2014.7013152Manning, C. D., & Raghavan, P. (2009). An Introduction to Information Retrieval. Online, 1,1. 
http://doi.org/10.1109/LPT.2009.2020494Markov, Z., & Larose, D. T. (2007). Data Mining the Web: Uncovering Patterns in WebContent, Structure, and Usage. John Wiley & Sons, Inc.Mikolov, T., Le, Q. V, & Sutskever, I. (2013). Exploiting Similarities among Languages forMachine Translation. arXiv Preprint arXiv:1309.4168v1, 110. Retrieved fromhttp://arxiv.org/abs/1309.4168v1%5Cnhttp://arxiv.org/abs/1309.4168Miner, G., Elder, J., Fast, A., Hill, T., Nisbet, R., & Delen, D. (2012). Practical Text Miningand Statistical Analysis for Non-structured Text Data Applications, 1st ed. Elsevier.Oklahoma: Academic Press. http://doi.org/10.1016/B978-0-12-386979-1.00009-8Mohamed, H., Omar, N., & Ab. Aziz, M. J. (2015). Malay Part of Speech Tagger: AComparative Study on Tagging Tools. Asia-Pacific Journal of Information Technology andMultimedia, 4(1), 1123. http://doi.org/10.17576/apjitm-2015-0401-02Mohd Don, Z. (2010). Processing natural malay texts: A data-driven approach. Trames, 14(1),90103. http://doi.org/10.3176/tr.2010.1.06Mohit, B., Schneider, N., Bhowmick, R., Oflazer, K., & Smith, N. a. (2012). Recall-orientedlearning of named entities in Arabic Wikipedia. Proceedings of the 13th Conference of theEuropean Chapter of the Association for Computational Linguistics, 162173. Retrievedfrom http://dl.acm.org/citation.cfm?id=2380816.2380839Nadeau, D. (2007). A survey of named entity recognition and classification. LinguisticaeInvestigationes, 8(30), 326. http://doi.org/10.1075/li.30.1.03nadNogueira, T. M., Rezende, S. O., & Camargo, H. a. (2010). On the use of fuzzy rules to textdocument classification. Hybrid Intelligent Systems (HIS), 2010 10th InternationalConference on, 1924. http://doi.org/10.1109/HIS.2010.5600076Noh, N., Rusydi, M., Talib, A., Ahmad, A., Halim, S. A., & Mohamed, A. (2009). MalayLanguage Document Identification Using BPNN. In Proceedings of the 10th WSEASinternational conference on Neural networks (pp. 
163168).Nothman, J., Ringland, N., Radford, W., Murphy, T., & Curran, J. R. (2013). Learningmultilingual named entity recognition from Wikipedia. Sydney: Elsevier Science.http://doi.org/10.1016/j.artint.2012.03.006Ojo, A., & Adeyemo, A. B. (2012). Framework for Knowledge Discovery from JournalArticles Using Text Mining Techniques. African Journal of Computing & ICT, 5(2), 3544.Retrieved from http://www.ajocict.net/uploads/Pre-print_-_O__Ojo___A_B__Adeyemo__2012___Framework_for_Knowledge_Discovery_from_Journal_Articles_Using_Text_Mining_Techniques.pdfOudah, M., & Shaalan, K. (2012). A Pipeline Arabic Named Entity Recognition using aHybrid Approach. COLING (December 2012), 21592176. Retrieved fromhttp://www.newdesign.aclweb.org/anthology/C/C12/C12-1132.pdfOudah, M., & Shaalan, K. (2016). Studying the impact of language-independent andlanguage-specific features on hybrid Arabic Person name recognition. Language Resourcesand Evaluation, 128. http://doi.org/10.1007/s10579-016-9376-1Petrov, S., Das, D., & McDonald, R. (2011). A Universal Part-of-Speech Tagset. Retrievedfrom http://arxiv.org/abs/1104.2086Pham, Q. H., Nguyen, M.-L., Nguyen, B. T., & Cuong, N. V. (2015). Semi-supervisedLearning for Vietnamese Named Entity Recognition using Online Conditional Random Fields.In Proceedings of the Fifth Named Entity Workshop (pp. 5055). Retrieved fromhttp://www.aclweb.org/anthology/W15-3907POWERS, D.M.W. (AILab, School of Computer Science, Engineering and Mathematics,Flinders University, South Australia, A. (2011). Evaluation: From Precision, Recall and FMeasureTo Roc, Informedness, Markedness & Correlation. Journal of Machine LearningTechnologies, 2(1), 3763. http://doi.org/10.1.1.214.9232Powers, D. M. W. (2015). What the F-measure doesnt measure: Features, Flaws, Fallaciesand Fixes, 19. http://doi.org/KIT-14-001Prasad, G., Fousiya, K. K., Kumar, M. A., & Soman, K. P. (2015). 
Named Entity Recognitionfor Malayalam Language : A CRF based Approach, (May), 1619.Ramli, I., Jamil, N., Seman, N., & Ardi, N. (2015). An Improved Syllabification for a BetterMalay Language Text-to-Speech Synthesis (TTS). 2015 IEEE International Symposium OnRobotics and Intelligent Sensors, 76 (Iris), 417424.http://doi.org/10.1016/j.procs.2015.12.280Rao, R. V., & Saroj, A. (2017). A self-adaptive multi-population based Jaya algorithm forengineering optimization. Swarm and Evolutionary Computation, (October 2016), 126.http://doi.org/10.1016/j.swevo.2017.04.008Ritter, A., Clark, S., Mausam, & Etzioni, O. (2011). Named Entity Recognition in Tweets: AnExperimental Study. Proceedings of the 2011 Conference on Empirical Methods in NaturalLanguage Processing, 15241534. Retrieved from http://dl.acm.org/citation.cfm?id=2145595Rosso, P., Benajiba, Y., & Lyhyaoui, A. (2006, December). Towards an Arabic questionanswering system. In Proc. 4th Conf. on Scientific Research Outlook & TechnologyDevelopment in the Arab world, SROIV, Damascus, Syria (pp. 11-14).Rozenfeld, B., & Feldman, R. (2008). Self-supervised relation extraction from the Web.Knowledge and Information Systems, 17(1), 1733. http://doi.org/10.1007/s10115-007-0110-6Sam, R. C., Le, H. T., Nguyen, T. T., & Nguyen, T. H. (2011). Combining proper namecoreferencewith conditional random fields for semi-supervised named entity recognition inVietnamese text. Lecture Notes in Computer Science (Including Subseries Lecture Notes inArtificial Intelligence and Lecture Notes in Bioinformatics), 6634 LNAI (PART 1), 512524.http://doi.org/10.1007/978-3-642-20841-6-42Samat, N. A., Murad, M. A. A., Abdullah, M. T., & Atan, R. (2005). Malay DocumentsClustering Algorithm Based on Singular Value Decomposition. Journal of Theoretical andApplied Information Technology, 180186.Sari, Y., Hassan, M. F., & Zamin, N. (2009). A Hybrid Approach to Semi-supervised NamedEntity Recognition in Health, Safety and Environment Reports. 
2009 InternationalConference on Future Computer and Communication, 599602.http://doi.org/10.1109/ICFCC.2009.52Sari, Y., Hassan, M. F., & Zamin, N. (2010). Rule-based pattern extractor and Named EntityRecognition: A hybrid approach. In Proceedings 2010 International Symposium onInformation Technology - Engineering Technology, ITSim10 (Vol. 2, pp. 563568).http://doi.org/10.1109/ITSIM.2010.5561392Satoshi Sekine, K. S., & Nobata, C. (2002). Extended named entity hierarchy. ThirdInternational Conference on Language Resources and Evaluation (LREC 2002), 18181824.Sazali, S. S., Rahman, N. A., & Bakar, Z. A. (2017). Information extraction: Evaluatingnamed entity recognition from classical Malay documents. In 2016 3rd InternationalConference on Information Retrieval and Knowledge Management, CAMP 2016 - ConferenceProceedings (pp. 4853). http://doi.org/10.1109/INFRKM.2016.7806333Seeger, M., & King, I. (2002). Learning from labeled and unlabeled data. Learning, (January),162. http://doi.org/10.1109/IJCNN.2002.1007592Sekine, S., Sudo, K., & Nobata, C. (2002, May). Extended Named Entity Hierarchy. In LREC.Selvaperumal, P., & Suruliandi, A. (2016). Semi-Supervised Personal Name DisambiguationTechnique for the Web. International Journal of Modern Education and Computer Science,8(3), 2836. http://doi.org/10.5815/ijmecs.2016.03.04Servan, C., Berard, A., Elloumi, Z., Blanchon, H., & Besacier, L. (2016). Word2Vec vsDBnary: Augmenting METEOR using Vector Representations or Lexical Resources?Retrieved from http://arxiv.org/abs/1610.01291Shaalan, K., & Oudah, M. (2013). A hybrid approach to Arabic named entity recognition.Journal of Information Science, 40(1), 6787. http://doi.org/10.1177/0165551513502417Shaalan, K., & Raza, H. (2007). Person Name Entity Recognition for Arabic. ComputationalLinguistics, (June), 1724. http://doi.org/10.3115/1654576.1654581Shabat, H. (2015). Named Entity Recognition in Crime News Documents Using ClassifiersCombination, 23(6), 12151222. 
http://doi.org/10.5829/idosi.mejsr.2015.23.06.22271
Sharma, D., Devale, P. R., & Khare, A. K. (2011). Approach for Multiword Expression Identification in Natural Language Processing, 2 (August 2011), 663–666.
Sidi. (2011). Malay Interrogative Knowledge Corpus. American Journal of Economics and Business Administration, 3, 171–176. http://doi.org/10.3844/ajebasp.2011.171.176
Sinoara, R. A., Sundermann, C. V., Marcacini, R. M., Domingues, M. A., & Rezende, S. O. (2014). Named entities as privileged information for hierarchical text clustering. Proceedings of the 18th International Database Engineering & Applications Symposium on - IDEAS 14, 57–66. http://doi.org/10.1145/2628194.2628225
Srivastava, A. N., & Sahami, M. (2009). Text Mining: Classification, Clustering, and Applications. Boca Raton: Chapman and Hall/CRC.
Suakkaphong, N., Zhang, Z., & Chen, H. (2013). Disease Named Entity Recognition Using Semisupervised Learning and Conditional Random Fields. Journal of the American Society for Information Science and Technology, 14(4), 90–103. http://doi.org/10.1002/asi
Sun, A., Grishman, R., & Sekine, S. (2011). Semi-supervised relation extraction with large-scale word clustering. Proceedings of the 49th Annual Meeting, 521–529. Retrieved from http://www.aaai.org/Papers/AAAI/2007/AAAI07-224.pdf http://dl.acm.org/citation.cfm?id=2002539
Suwarningsih, W., Supriana, I., & Purwarianti, A. (2015). ImNER Indonesian medical named entity recognition. In Proceedings of 2014 2nd International Conference on Technology, Informatics, Management, Engineering and Environment, TIME-E 2014 (pp. 184–188). http://doi.org/10.1109/TIME-E.2014.7011615
Tabuchi, N., Sumii, E., & Yonezawa, A. (2003). Regular expression types for strings in a text processing language. Electronic Notes in Theoretical Computer Science, 75, 97–115. http://doi.org/10.1016/S1571-0661(04)80781-3
Tan, T. P., Xiao, X., Tang, E. K., Chng, E. S., & Li, H. (2009). MASS: A Malay language LVCSR corpus resource.
2009 Oriental COCOSDA International Conference on Speech Database and Assessments, ICSDA 2009, 25–30. http://doi.org/10.1109/ICSDA.2009.5278382
Tran, V. C., Hwang, D., & Jung, J. J. (2015). Semi-supervised Approach Based on Co-occurrence Coefficient for Named Entity Recognition on Twitter, 141–146.
Triguero, I., García, S., & Herrera, F. (2013). Self-labeled techniques for semi-supervised learning: taxonomy, software and empirical study. Knowledge and Information Systems, pp. 1–40. http://doi.org/10.1007/s10115-013-0706-y
Triguero, I., Sáez, J. A., Luengo, J., García, S., & Herrera, F. (2014). On the characterization of noise filters for self-training semi-supervised in nearest neighbor classification. Neurocomputing, 132, 30–41. http://doi.org/10.1016/j.neucom.2013.05.055
Trstenjak, B., Mikac, S., & Donko, D. (2014). KNN with TF-IDF based framework for text categorization. In Procedia Engineering (Vol. 69, pp. 1356–1364). Elsevier B.V. http://doi.org/10.1016/j.proeng.2014.03.129
Tuffery, S. (2011). Data Mining and Statistics for Decision Making. Wiley.
Turian, J., Ratinov, L., Bengio, Y., & Turian, J. (2010). Word Representations: A Simple and General Method for Semi-supervised Learning. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, (July), 384–394. http://doi.org/10.1.1.301.5840
Wibawa, A. S., & Purwarianti, A. (2016). Indonesian Named-entity Recognition for 15 Classes Using Ensemble Supervised Learning. Procedia Computer Science, 81(May), 221–228. http://doi.org/10.1016/j.procs.2016.04.053
Witten, I. H., Frank, E., & Hall, M. (2011). Data Mining: Practical Machine Learning Tools and Techniques (2nd ed.). http://doi.org/citeulike-article-id:8827086
Worden, K., Staszewski, W. J., & Hensman, J. J. (2011). Natural computing for mechanical systems research: A tutorial overview. Mechanical Systems and Signal Processing. Elsevier. http://doi.org/10.1016/j.ymssp.2010.07.013
Wu, X., Kumar, V., Ross, Q. J., Ghosh, J., Yang, Q., Motoda, H., Steinberg, D. (2008).
Top 10 algorithms in data mining. Knowledge and Information Systems (Vol. 14). http://doi.org/10.1007/s10115-007-0114-2
Xian, B. C. M., Lubani, M., Ping, L. K., Bouzekri, K., Mahmud, R., & Lukose, D. (2016). Benchmarking Mi-POS: Malay Part-of-Speech Tagger. International Journal of Knowledge Engineering, 2(3), 115–121. http://doi.org/10.18178/ijke.2016.2.3.064
Yang, F., & Vozila, P. (2014). Semi-Supervised Chinese Word Segmentation Using Partial-Label Learning With Conditional Random Fields. Emnlp, 90–98. Retrieved from http://emnlp2014.org/papers/pdf/EMNLP2014010.pdf
Yesilbudak, M., Sagiroglu, S., & Colak, I. (2017). A novel implementation of kNN classifier based on multi-tupled meteorological input data for wind power prediction. Energy Conversion and Management, 135, 434–444. http://doi.org/10.1016/j.enconman.2016.12.094
Yong, S.-F., Ranaivo-Malançon, B., & Wee, A. Y. (2011). NERSIL: the named-entity recognition system for Iban language. 25th Pacific Asia Conference on Language, Information and Computation, 549–558.
Yong, Z., Youwen, L., & Shixiong, X. (2009). An Improved KNN Text Classification Algorithm Based on Clustering. Journal of Computers, 4(3), 230–237. http://doi.org/10.4304/jcp.4.3.230-237
Zamin, N., & Oxley, A. (2011). Building a Corpus-Derived Gazetteer for Named Entity Recognition, 73–80.
Zamin, N., Oxley, A., Abu Bakar, Z., & Farhan, S. A. (2012). A statistical dictionary-based word alignment algorithm: An unsupervised approach. In 2012 International Conference on Computer and Information Science, ICCIS 2012 - A Conference of World Engineering, Science and Technology Congress, ESTCON 2012 - Conference Proceedings (Vol. 1, pp. 396–402). http://doi.org/10.1109/ICCISci.2012.6297278
Zatarain Salazar, J., Reed, P. M., Herman, J. D., Giuliani, M., & Castelletti, A. (2016). A diagnostic assessment of evolutionary algorithms for multi-objective surface water reservoir control.
Advances in Water Resources, 92, 172–185. http://doi.org/10.1016/j.advwatres.2016.04.006
Zeng, H., Song, A., & Cheung, Y. M. (2013). Improving clustering with pairwise constraints: A discriminative approach. Knowledge and Information Systems, 36(2), 489–515. http://doi.org/10.1007/s10115-012-0592-8
Zhan, Q. (2017). An Improved K-means Algorithm Based on Structure Features, 12(1), 62–80. http://doi.org/10.17706/jsw.12.1.62-81
Zhang, C., Hong, X., & Peng, Z. (2012). An automatic approach to harvesting temporal knowledge of entity relationships. In Procedia Engineering (Vol. 29, pp. 1399–1409). http://doi.org/10.1016/j.proeng.2012.01.147
Zhang, S., & Elhadad, N. (2013). Unsupervised biomedical named entity recognition: Experiments with clinical and biological texts. Journal of Biomedical Informatics, 46(6), 1088–1098. http://doi.org/10.1016/j.jbi.2013.08.004
Zhou, D., & Zhong, D. (2015). A semi-supervised learning framework for biomedical event extraction based on hidden topics. Artificial Intelligence in Medicine, 64(1), 51–58. http://doi.org/10.1016/j.artmed.2015.03.004
Zirikly, A., & Diab, M. (2015). Named Entity Recognition for Arabic Social Media. Proceedings of NAACL-HLT 2015, 176–185. Retrieved from http://www.aclweb.org/anthology/W15-1524.pdf |