UPSI Digital Repository (UDRep)
Abstract : Universiti Pendidikan Sultan Idris
This study aimed to develop a Malay proper noun detection method to cluster and
classify named entity categories, particularly the major classes of person,
location, organization, and miscellaneous, in a Malay newspaper corpus. A regular
expression (regex) pattern identification algorithm and rules were introduced to
overcome the limitations of dictionaries and gazetteers. Two visualization techniques,
namely Decision Tree and Term Document Matrix, were used to evaluate the efficiency of the
method. The method obtained 74% accuracy during the generation of the decision tree.
The term document matrix visualization reached maximum values of 9.8007403, 9.8718517, and
9.9890683 for the Astro Awani, Berita Harian, and Bernama datasets respectively. In conclusion, the
regex algorithm could indicate the presence of Malay proper nouns, making it an appropriate
method for an extraction tool that clusters and classifies Malay proper nouns. The study implies that
the Malay proper noun detection method can increase the effectiveness of named
entity recognition and help improve document retrieval for the Malay
language.
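
The abstract does not give the actual regex rules, so the following is only a minimal Python sketch of the general idea, under stated assumptions. It treats any run of capitalised words as a proper noun candidate and counts the candidates per document in a small term document matrix. The pattern, the function names `find_candidates` and `term_document_matrix`, and the sample sentences are all hypothetical and are not taken from the study.

```python
import re
from collections import Counter

# Assumed pattern (not from the study): one or more consecutive capitalised
# words, e.g. "Kuala Lumpur". It misses acronyms such as "UMNO" and will also
# pick up ordinary words that are capitalised at the start of a sentence.
CAPITALISED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def find_candidates(text):
    """Return proper noun candidates in the order they appear in the text."""
    return CAPITALISED_RUN.findall(text)


def term_document_matrix(docs):
    """Map each candidate term to its raw count in every document."""
    counts = [Counter(find_candidates(doc)) for doc in docs]
    terms = sorted(set().union(*counts))
    # A Counter returns 0 for terms that a document does not contain.
    return {term: [c[term] for c in counts] for term in terms}


# Hypothetical sample sentences, lower-cased at the start to avoid
# sentence-initial false positives.
docs = [
    "semalam Najib Razak tiba di Kuala Lumpur.",
    "hujan lebat di Kuala Lumpur dan Kuala Lumpur banjir.",
]
matrix = term_document_matrix(docs)
for term, row in matrix.items():
    print(term, row)
```

A sketch like this only proposes candidates. Assigning them to the person, location, organization, or miscellaneous class would need further rules or a classifier, which the abstract mentions but does not specify.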
|
References |
Abdallah, S., Shaalan, K., & Shoaib, M. (2012). Integrating rule-based system with classification for arabic named entity recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7181 LNCS, pp. 311–322). http://doi.org/10.1007/978-3-642-28604- 9_26
AbdelRahman, S., Elarnaoty, M., & Magdy, M. (2010). Integrated Machine Learning Techniques for Arabic Named Entity Recognition. International Journal of Computer Science, 7(4), 27–36. Retrieved from http://ijcsi.org/papers/IJCSI-Vol-7-Issue-4-No-3.pdf#page=41
Abdul-hamid, A., & Darwish, K. (2010). Simplified Feature Set for Arabic Named Entity Recognition. Proceedings of the 2010 Named Entities Workshop, (July), 110–115. Retrieved from http://www.aclweb.org/anthology/W10-2417
Abdullah, M., & Ahmad, F. (2009). Rules frequency order stemmer for malay language. … International Journal of …, 9(2), 433–438. Retrieved from http://paper.ijcsns.org/07_book/200902/20090258.pdf
Abedinpourshotorban, H., Hasan, S., Shamsuddin, S. M., & As’Sahra, N. F. (2016). A differential-based harmony search algorithm for the optimization of continuous problems. Expert Systems with Applications, 62, 317–332. http://doi.org/10.1016/j.eswa.2016.05.013
Aboaoga, M., & Aziz, M. J. A. (2013). Arabic person names recognition by using a rule based approach. Journal of Computer Science, 9(7), 922–927. http://doi.org/10.3844/jcssp.2013.922.927
Abu Bakar, J., Omar, K., Nasrudin, M. F., & Murah, M. Z. (2013). Part-of-Speech for Old Malay Manuscript Corpus: A Review. In Communications in Computer and Information Science (Vol. 378 CCIS, pp. 53–66). http://doi.org/10.1007/978-3-642-40567-9_5
Abu Bakar, J., Omar, K., Nasrudin, M. F., Murah, M. Z., Al-shoukry, S., Omar, N., … Klose, A. (2013). Processing natural malay texts: A data-driven approach. Neurocomputing, 79(3), 2670–2676. http://doi.org/10.3176/tr.2010.1.06
Agarwal, S. K., Shah, S., & Kumar, R. (2015). Classification of mental tasks from EEG data using backtracking search optimization based neural classifier. Neurocomputing, 166, 397– 403. http://doi.org/10.1016/j.neucom.2015.03.041
Aggarwal, C., & Zhao, P. (2013). Towards graphical models for text processing. Knowledge and Information Systems, 36(1), 1–21. http://doi.org/10.1007/s10115-012-0552-3
Ahmad, Z. H., & Khalifa, O. (2008). Towards designing a high intelligibility rule based standard Malay text-to-speech synthesis system. Proceedings of the International Conference on Computer and Communication Engineering 2008, ICCCE08: Global Links for Human Development, 89–94. http://doi.org/10.1109/ICCCE.2008.4580574
Ahmed, Z. (2013). Named Entity Recognition and Question Answering Using Word Vectors and Clustering.
Akbari, R., Hedayatzadeh, R., Ziarati, K., & Hassanizadeh, B. (2012). A multi-objective artificial bee colony algorithm. Swarm and Evolutionary Computation, 2, 39–52. http://doi.org/10.1016/j.swevo.2011.08.001
Alfred, R. (2016). Intelligent Information and Database Systems. In ACIIDS 2016, Part II (pp. 447–457). http://doi.org/10.1007/978-3-642-12145-6
Alfred, R., Leong, L. C., On, C. K., & Anthony, P. (2014). Malay Named Entity Recognition Based on Rule-Based Approach. International Journal of Machine Learning and Computing, 4(3), 300–306. http://doi.org/10.7763/IJMLC.2014.V4.428
Aljoumaa, H. (2012). Development of a Self-Learning Approach Applied to Pattern Recognition and Fuzzy Control, (September 2012), 127.
Al-Moslmi, T., Gaber, S., Al-Shabi, A., Albared, M., & Omar, N. (2015). Feature Selection Methods Effects on Machine Learning Approaches in Malay Sentiment Analysis, (October), 2–5.
Alshalabi, H., Tiun, S., Omar, N., & Albared, M. (2013). Experiments on the Use of Feature Selection and Machine Learning Methods in Automatic Malay Text Categorization. International Conference on Electrical Engineering and Informatics (ICEEI 2013), 11(Iceei), 748–754. http://doi.org/10.1016/j.protcy.2013.12.254
Al-shammaa, M., & Abbod, M. F. (2015). Automatic Generation of Fuzzy Classification Rules from Data.
Al-shoukry, S., & Omar, N. (2015). Proper Nouns Recognition in Arabic Crime Text Using Machine Learning Approach, 79(3), 506–513.
Althobaiti, M., Kruschwitz, U., & Poesio, M. (2015). Combining Minimally-supervised Methods for Arabic Named Entity Recognition. Transactions of the Association for Computational Linguistics, 3, 243–255. Retrieved from https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/564
Althobaiti, M., Kruschwitz, U., & Poesio, M. (2013). A Semi-supervised Learning Approach to Arabic Named Entity Recognition, (September), 32–40. http://doi.org/10.1177/0165551513502417
Althobaiti, M., Kruschwitz, U., & Poesio, M. (2014). Automatic Creation of Arabic Named Entity Annotated Corpus Using Wikipedia. Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics, 106–115. Retrieved from http://www.aclweb.org/anthology/E14-3012
Ananiadou, S., & McNaught, J. (2006). Text Mining for Biology and Biomedicine. Boston: Artech House.
Ananiadou, S., Pyysalo, S., Tsujii, J., & Kell, D. B. (2010). Event extraction for systems biology by text mining the literature. Trends in Biotechnology. http://doi.org/10.1016/j.tibtech.2010.04.005
Ando, R. R. K., & Zhang, T. (2005). A high-performance semi-supervised learning method for text chunking. Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, (June), 1–9. http://doi.org/10.3115/1219840.1219841
Baharudin, B., Lee, L. H., & Khan, K. (2010). A Review of Machine Learning Algorithms for Text-Documents Classification. Journal of Advances in Information Technology, 1(1), 4–20. http://doi.org/10.4304/jait.1.1.4-20
Bali, R.-M., Chua, C. C., & Ng, P. K. (2007). Identifying and Classifying Unknown Words In Malay Texts. The Seventh International Symposium on Natural Language Processing
(SNLP2007), 493–498. Retrieved from http://eprints.usm.my/9442/1/Identifying_and_classifying_unknown_words_in_Malay_texts.p df%5Cnhttp://eprints.usm.my/9442/
Banko, M., Cafarella, M. J., Soderland, S., Broadhead, M., & Etzioni, O. (2007). Open Information Extraction from the Web. Proceedings of IJCAI-07, the International Joint Conference on Artificial Intelligence, 2670–2676. http://doi.org/10.1145/1409360.1409378
Bawane, M. S., & Gadicha, P. V. B. (n.d.). Analysing the result of GRIAS framework by using Precision , Recall and F-measure, 24–30.
Benajiba, Y., Diab, M., & Rosso, P. (2008). Arabic named entity recognition using optimized feature sets. EMNLP ’08 Proceedings of the Conference on Empirical Methods in Natural Language Processing, (October), 284–293. Retrieved from http://dl.acm.org/citation.cfm?id=1613715.1613755
Benajiba, Y., & Rosso, P. (2008). Arabic Named Entity Recognition using Conditional Random Fields. Proc. of Workshop on HLT & NLP within the Arabic World, LREC. Vol. 8., 143–153. Retrieved from http://www.dsic.upv.es/~prosso/resources/BenajibaRosso_LREC08.pdf
Benajiba, Y., Rosso, P., & BenedíRuiz, J. (2007). ANERsys: an Arabic named entity recognition system based on maximum entropy. Gelbukh, A. (Ed.) CICLing 2007. LNCS, 143–153. Retrieved from http://www.springerlink.com/index/5g6n298843878701.pdf
Bezdek, J. C. (1993). A Physical Interpretation of Fuzzy ISODATA. Readings in Fuzzy Sets for Intelligent Systems, (November), 615–616. http://doi.org/10.1109/TSMC.1976.4309506
Bontcheva, K., Derczynski, L., Funk, A., Greenwood, M. a, Maynard, D., & Aswani, N. (2013). TwitIE : An Open-Source Information Extraction Pipeline for Microblog Text. In Proceedings of Recent Advances in Natural Language Processing (pp. 83–90). Retrieved from https://www.aclweb.org/anthology/R/R13/R13-1011.pdf
Brief, T. (2005). Agreement , the F-Measure , and Reliability in Information Retrieval, 296– 298. http://doi.org/10.1197/jamia.M1733.Informatics
Brill, E. (2000). Pattern-based disambiguation for natural language processing. Annual Meeting of the ACL, 1. Retrieved from http://portal.acm.org/citation.cfm?id=1117795
Bsoul, Q., Salim, J., & Zakaria, L. Q. (2013). An Intelligent Document Clustering Approach to Detect Crime Patterns. Procedia Technology, 11(Iceei), 1181–1187. http://doi.org/10.1016/j.protcy.2013.12.311
Cao, T. H., Tang, T. M., & Chau, C. K. (2012). Text Clustering with Named Entities: A Model, Experimentation and Realization. Intelligent Systems Reference Library, 23, 267–287. http://doi.org/10.1007/978-3-642-23166-7_10
Carlson, A., & Betteridge, J. (2010). Coupled semi-supervised learning for information extraction. Proceedings of the Third ACM International Conference on Web Search and Data Mining (2010), 101–110. http://doi.org/10.1145/1718487.1718501
Chapman, C. A. (2016). Usage and refactoring studies of python regular expressions by. Graduate Theses and Dissertations. This, Paper 1513.
Chapman, C., & Stolee, K. T. (2016). Exploring regular expression usage and context in Python. In Proceedings of the 25th International Symposium on Software Testing and Analysis - ISSTA 2016 (pp. 282–293). http://doi.org/10.1145/2931037.2931073
Chart, G., Algorithm, G., Tun, U., & Onn, H. (2012). Single Disciplinary Project Application Form Fundamental Research Grant Scheme (FRGS), (i), 1–16. http://doi.org/10.1155/2013/782519.(ISI-Q2).
Che, W., Wang, M., Manning, C. D., & Liu, T. (2013). Named Entity Recognition with Bilingual Constraints. Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, (June), 52– 62. Retrieved from http://www.aclweb.org/anthology/N13-1006
Chen, K., Dong, X., Zhu, J., & Shen, B. (2016). Building a domain knowledge base from wikipedia: A semi-supervised approach. Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE, 2016–Janua. http://doi.org/10.18293/SEKE2016-051
Chiticariu, L., Krishnamurthy, R., Li, Y., Reiss, F., & Vaithyanathan, S. (2010). Domain Adaptation of Rule-Based Annotators for Named-Entity Recognition Tasks. Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, (October), 1002–1012. Retrieved from http://portal.acm.org/citation.cfm?id=1870756
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural Language Processing (almost) from Scratch. Journal of Machine Learning Research, 12(Aug), 2493–2537. http://doi.org/10.1145/2347736.2347755
Derczynski, L., Maynard, D., Rizzo, G., & Erp, M. Van. (n.d.). Analysis of Named Entity Recognition and Linking for Tweets, 1–35.
Diab, M. (2009). Second Generation AMIRA Tools for Arabic Processing?: Fast and Robust Tokenization, POS tagging, and Base Phrase Chunking. Proceedings of the Second International Conference on Arabic Language Resources and Tools, 285–288. Retrieved from http://www.elda.org/medar-conference/pdf/56.pdf
Duan, H., Zheng, Y., & Random, C. (2011). A Study on Features of the CRFs-based Chinese. International Journal of Advanced Intelligence, 3(2), 287–294.
Dumais, S., & Chen, H. (2000). Hierarchical classification of Web content. SIGIR ’00: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 256–263. http://doi.org/10.1145/345508.345593
Ek, T., Kirkegaard, C., Jonsson, H., & Nugues, P. (2011). Named entity recognition for short text messages. Procedia - Social and Behavioral Sciences, 27(September), 178–187. http://doi.org/10.1016/j.sbspro.2011.10.596
Ekbal, A., & Saha, S. (2011). A multiobjective simulated annealing approach for classifier ensemble: Named entity recognition in Indian languages as case studies. Expert Systems with Applications, 38(12), 14760–14772. http://doi.org/10.1016/j.eswa.2011.05.004
Ekbal, A., Saha, S., & Sikdar, U. K. (2012). Multiobjective Optimization for Biomedical Named Entity Recognition and Classification. Procedia Technology, 6(0), 206–213. http://doi.org/http://dx.doi.org/10.1016/j.protcy.2012.10.025
Elsayed, H., & Elghazaly, T. (2015). A Named Entities Recognition System for Modern Standard Arabic using Rule-Based Approach. 2015 First International Conference on Arabic Computational Linguistics (ACLing), 12(1), 51–54. http://doi.org/10.1109/ACLing.2015.14
Elsebai, a, Meziane, F., & Belkredim, F. (2009). A Rule Based Persons Names Arabic Extraction System. Communications of the IBIMA, 11(August), 53–59. Retrieved from http://usir.salford.ac.uk/2206/
Elyasir, A. M. H., Sonai, K., & Anbananthen, M. (2013). Comparison between Bag of Words and Word Sense Disambiguation, (Icacsei), 413–417.
Etzioni, O., Cafarella, M., Downey, D., Popescu, A. M., Shaked, T., Soderland, S.,… Yates, A. (2005). Unsupervised named-entity extraction from the Web: An experimental study. Artificial Intelligence, 165(1), 91–134. http://doi.org/10.1016/j.artint.2005.03.001
Fadzli, S. A., Norsalehen, A. K., Syarilla, I. A., Hasni, H., & Dhalila, M. S. S. (2012). Simple rules malay stemmer. The International Conference on Informatics and Applications (ICIA2012), 28–35. Retrieved from http://sdiwc.net/digitallibrary/ download.php?id=00000187.pdf
Fuchs, G., Stange, H., Samiei, A., Andrienko, G., & Andrienko, N. (2015). A semi-supervised method for topic extraction from micro postings. Information Technology, 57(1), 49–56. http://doi.org/10.1515/itit-2014-1078
Fung, P., Fung, P., Cheung, P., & Cheung, P. (2004). Mining Very-Non-Parallel Corpora: Parallel Sentence and Lexicon Extraction via Bootstrapping and EM. EMNLP 2004 - Conference on Empirical Methods in Natural Language Processing, 57–63. Retrieved from http://www.aclweb.org/anthology-new/W/W04/W04-3208.pdf
Gosselin, L., Tye-Gingras, M., & Mathieu-Potvin, F. (2009). Review of utilization of genetic algorithms in heat transfer problems. International Journal of Heat and Mass Transfer. Elsevier Ltd. http://doi.org/10.1016/j.ijheatmasstransfer.2008.11.015
Goyvaerts, J., & Levithan, S. (2012). Regular Expressions Cookbook, 612. http://doi.org/9780596802837
Gunawan, Purnama, I. K. E., & Hariadi, M. (2015). Supervised learning Indonesian gloss acquisition. IAENG International Journal of Computer Science, 42(4), 337–346.
Hassan, M., Nazlia, O., & Mohd Juzaiddin, A. A. (2015). Malay Part of Speech Tagger : A Comparative Study on Tagging Tools. Asia-Pacific Journal of Information Technology and Multimedia, 4(1), 11–23. http://doi.org/10.17576/apjitm-2015-0401-02
Hemmati, M., Amjady, N., & Ehsan, M. (2014). System modeling and optimization for islanded micro-grid using multi-cross learning-based chaotic differential evolution algorithm. International Journal of Electrical Power and Energy Systems, 56, 349–360. http://doi.org/10.1016/j.ijepes.2013.11.015
Heydt, M. (2015). Learning pandas: Get to grips with pandas - a versatile and highperformance Python library for data manipulation, analysis, and discovery. Retrieved from http://gen.lib.rus.ec/book/index.php?md5=75566423DC8A5A9411165F24EF9DD886
Hu, B., Tang, B., Chen, Q., & Kang, L. (2016). A novel word embedding learning model using the dissociation between nouns and verbs. Neurocomputing, 171, 1108–1117. http://doi.org/10.1016/j.neucom.2015.07.046
Isa, N., Puteh, M., & Kamarudin, R. M. H. R. (2013). Sentiment classification of malay newspaper using immune network (SCIN). Lecture Notes in Engineering and Computer Science, 3 LNECS, 1543–1548. Retrieved from http://www.scopus.com/inward/record.url?eid=2-s2.0- 84887882006&partnerID=40&md5=652fdc713458c4dfedcbc4e3a0b736b6
J.M., M. M. U. J. S.-C. S. M. J. G.-B. (2013). Named Entity Recognition: Fallacies challenges and opportunities. Computer Standards and Interfaces, 3554824891(http://www.scopus.com/inward/record.url?eid=2-s2.0- 84878302542&partnerID=40&md5=fa0cc4fcfad6db514533c129e08333d6).
Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31(8), 651–666. http://doi.org/10.1016/j.patrec.2009.09.011
Kanagavalli, R. V, & K, R. (2013). Detecting and resolving spatial ambiguity in text using named entity extraction and Self-Learning fuzzy logic techniques. Retrieved from http://arxiv.org/abs/1303.0445
Kantardzic, M. (2011). Data Mining: Concepts, Models, Method, and Algorithms (2nd Edition) (2nd ed.). New Jersey: John Wiley & Sons, Inc.
Khalaf, Z. (2015). MAHIR System: Unsupervised Segmentation for Malay Spoken Broadcast News Stories. International Journal of Information and Electronics Engineering, 5(3). http://doi.org/10.7763/IJIEE.2015.V5.532
Kondrak, S. B. and G. (2007). Alignment-Based Discriminative String Similarity. Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, 656–663.
Kraft, D. H., Martin-Bautista, M. J., Chen, J., & Sanchez, D. (2003). Rules and fuzzy rules in text: Concept, extraction and usage. International Journal of Approximate Reasoning, 34(2– 3), 145–161. http://doi.org/10.1016/j.ijar.2003.07.005
Král, P. (2014). Named entities as new features for Czech document classification. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8404 LNCS (PART 2), 417–427. http://doi.org/10.1007/978-3-642-54903-8_35
Kummerfeld, J., & Curran, J. (2008). Classification of Verb-Particle Constructions with the Google Web1T Corpus. Australasian Language Technology Association Workshop 2008, 6 (December), 55–63. Retrieved from http://aclweb.org/anthology-new/U/U08/U08- 1.pdf#page=114
Lafferty, J., McCallum, A., & Pereira, F. C. N. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. ICML ’01 Proceedings of the Eighteenth International Conference on Machine Learning, 8(June), 282–289. http://doi.org/10.1038/nprot.2006.61
Larasati, S. (2012). Towards an Indonesian-English {SMT} System: A Case Study of an Under-Studied and Under-Resourced Language, Indonesian. {WDS}’12 Proceedings of Contributed Papers, 123–129.
Le Nguyen, M., & Shimazu, A. (2014). A semi supervised learning model for mapping sentences to logical forms with ambiguous supervision. In Data and Knowledge Engineering (Vol. 90, pp. 1–12). Elsevier B.V. http://doi.org/10.1016/j.datak.2013.12.001
Le, T., Nguyen, K., Nguyen, V., Nguyen, V., & Phung, D. (2016). Scalable Support Vector Machine for Semi-supervised Learning, 1–18. Retrieved from http://arxiv.org/abs/1606.06793
Li, Y., Krishnamurthy, R., Raghavan, S., Vaithyanathan, S., Arbor, A., & Jagadish, H. V. (2008). Regular Expression Learning for Information Extraction. Conference on Empirical Methods in Natural Language Processing, (October), 21–30. Retrieved from http://portal.acm.org/citation.cfm?id=1613719
Liao, W., & Veeramachaneni, S. (2009). A simple semi-supervised algorithm for named entity recognition. Workshop on Semi-Supervised Learning for Natural Language Processing, (June), 58–65. http://doi.org/10.3115/1621829.1621837
Liu, X., Zhang, S., Wei, F., & Zhou, M. (2011). Recognizing Named Entities in Tweets. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), 1(2008), 359–367. Retrieved from http://acl.eldoc.ub.rug.nl/mirror/P/P11/P11- 1037.pdf
Lu, Y., Ji, D., Yao, X., Wei, X., & Liang, X. (2015). CHEMDNER system with mixed conditional random fields and multi-scale word clustering. Journal of Cheminformatics, 7(Suppl 1), S4. http://doi.org/10.1186/1758-2946-7-S1-S4
Luis Eduardo, P., Iacobelli, F., & Su, S. (2015). Semi-Supervised Approach to Named Entity Recognition in Spanish Applied to a Real-World Conversational System, 224–235. http://doi.org/10.1007/978-3-319-19264-2
Luo, W., & Yang, F. (2016). An Empirical Study of Automatic Chinese Word Segmentation for Spoken Language Understanding and Named Entity Recognition, 238–248. Malanyon, D. (2009). Malay Lexical Analysis through Corpus-Based Approach. Eprints.Usm.My. Retrieved from http://eprints.usm.my/10608/
Mangasi, T., Erwin, A., & Ipung, H. P. (2014). Defined entity extraction based on Indonesian text document. In Proceedings - 2014 International Conference on ICT for Smart Society: “Smart System Platform Development for City and Society, GoeSmart 2014”, ICISS 2014 (pp. 61–65). http://doi.org/10.1109/ICTSS.2014.7013152
Manning, C. D., & Raghavan, P. (2009). An Introduction to Information Retrieval. Online, 1, 1. http://doi.org/10.1109/LPT.2009.2020494
Markov, Z., & Larose, D. T. (2007). Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage. John Wiley & Sons, Inc.
Mikolov, T., Le, Q. V, & Sutskever, I. (2013). Exploiting Similarities among Languages for Machine Translation. arXiv Preprint arXiv:1309.4168v1, 1–10. Retrieved from http://arxiv.org/abs/1309.4168v1%5Cnhttp://arxiv.org/abs/1309.4168
Miner, G., Elder, J., Fast, A., Hill, T., Nisbet, R., & Delen, D. (2012). Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications, 1st ed. Elsevier. Oklahoma: Academic Press. http://doi.org/10.1016/B978-0-12-386979-1.00009-8
Mohamed, H., Omar, N., & Ab. Aziz, M. J. (2015). Malay Part of Speech Tagger: A Comparative Study on Tagging Tools. Asia-Pacific Journal of Information Technology and Multimedia, 4(1), 11–23. http://doi.org/10.17576/apjitm-2015-0401-02
Mohd Don, Z. (2010). Processing natural malay texts: A data-driven approach. Trames, 14(1), 90–103. http://doi.org/10.3176/tr.2010.1.06
Mohit, B., Schneider, N., Bhowmick, R., Oflazer, K., & Smith, N. a. (2012). Recall-oriented learning of named entities in Arabic Wikipedia. Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, 162–173. Retrieved from http://dl.acm.org/citation.cfm?id=2380816.2380839
Nadeau, D. (2007). A survey of named entity recognition and classification. Linguisticae Investigationes, 8(30), 3–26. http://doi.org/10.1075/li.30.1.03nad
Nogueira, T. M., Rezende, S. O., & Camargo, H. a. (2010). On the use of fuzzy rules to text document classification. Hybrid Intelligent Systems (HIS), 2010 10th International Conference on, 19–24. http://doi.org/10.1109/HIS.2010.5600076
Noh, N., Rusydi, M., Talib, A., Ahmad, A., Halim, S. A., & Mohamed, A. (2009). Malay Language Document Identification Using BPNN. In Proceedings of the 10th WSEAS international conference on Neural networks (pp. 163–168).
Nothman, J., Ringland, N., Radford, W., Murphy, T., & Curran, J. R. (2013). Learning multilingual named entity recognition from Wikipedia. Sydney: Elsevier Science. http://doi.org/10.1016/j.artint.2012.03.006
Ojo, A., & Adeyemo, A. B. (2012). Framework for Knowledge Discovery from Journal Articles Using Text Mining Techniques. African Journal of Computing & ICT, 5(2), 35–44. Retrieved from http://www.ajocict.net/uploads/Pre-print_-
_O__Ojo___A_B__Adeyemo__2012___Framework_for_Knowledge_Discovery_from_Journ al_Articles_Using_Text_Mining_Techniques.pdf
Oudah, M., & Shaalan, K. (2012). A Pipeline Arabic Named Entity Recognition using a Hybrid Approach. COLING (December 2012), 2159–2176. Retrieved from http://www.newdesign.aclweb.org/anthology/C/C12/C12-1132.pdf
Oudah, M., & Shaalan, K. (2016). Studying the impact of language-independent and language-specific features on hybrid Arabic Person name recognition. Language Resources and Evaluation, 1–28. http://doi.org/10.1007/s10579-016-9376-1 Petrov, S., Das, D., & McDonald, R. (2011). A Universal Part-of-Speech Tagset. Retrieved from http://arxiv.org/abs/1104.2086
Pham, Q. H., Nguyen, M.-L., Nguyen, B. T., & Cuong, N. V. (2015). Semi-supervised Learning for Vietnamese Named Entity Recognition using Online Conditional Random Fields. In Proceedings of the Fifth Named Entity Workshop (pp. 50–55). Retrieved from http://www.aclweb.org/anthology/W15-3907
POWERS, D.M.W. (AILab, School of Computer Science, Engineering and Mathematics, Flinders University, South Australia, A. (2011). Evaluation: From Precision, Recall and FMeasure To Roc, Informedness, Markedness & Correlation. Journal of Machine Learning Technologies, 2(1), 37–63. http://doi.org/10.1.1.214.9232
Powers, D. M. W. (2015). What the F-measure doesn’t measure: Features, Flaws, Fallacies and Fixes, 19. http://doi.org/KIT-14-001
Prasad, G., Fousiya, K. K., Kumar, M. A., & Soman, K. P. (2015). Named Entity Recognition for Malayalam Language : A CRF based Approach, (May), 16–19.
Ramli, I., Jamil, N., Seman, N., & Ardi, N. (2015). An Improved Syllabification for a Better Malay Language Text-to-Speech Synthesis (TTS). 2015 IEEE International Symposium On
Robotics and Intelligent Sensors, 76 (Iris), 417–424. http://doi.org/10.1016/j.procs.2015.12.280
Rao, R. V., & Saroj, A. (2017). A self-adaptive multi-population based Jaya algorithm for engineering optimization. Swarm and Evolutionary Computation, (October 2016), 1–26. http://doi.org/10.1016/j.swevo.2017.04.008
Ritter, A., Clark, S., Mausam, & Etzioni, O. (2011). Named Entity Recognition in Tweets: An Experimental Study. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 1524–1534. Retrieved from http://dl.acm.org/citation.cfm?id=2145595
Rosso, P., Benajiba, Y., & Lyhyaoui, A. (2006, December). Towards an Arabic question answering system. In Proc. 4th Conf. on Scientific Research Outlook & Technology Development in the Arab world, SROIV, Damascus, Syria (pp. 11-14).
Rozenfeld, B., & Feldman, R. (2008). Self-supervised relation extraction from the Web. Knowledge and Information Systems, 17(1), 17–33. http://doi.org/10.1007/s10115-007-0110- 6
Sam, R. C., Le, H. T., Nguyen, T. T., & Nguyen, T. H. (2011). Combining proper namecoreference with conditional random fields for semi-supervised named entity recognition in Vietnamese text. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 6634 LNAI (PART 1), 512–524. http://doi.org/10.1007/978-3-642-20841-6-42
Samat, N. A., Murad, M. A. A., Abdullah, M. T., & Atan, R. (2005). Malay Documents Clustering Algorithm Based on Singular Value Decomposition. Journal of Theoretical and Applied Information Technology, 180–186.
Sari, Y., Hassan, M. F., & Zamin, N. (2009). A Hybrid Approach to Semi-supervised Named Entity Recognition in Health, Safety and Environment Reports. 2009 International Conference on Future Computer and Communication, 599–602. http://doi.org/10.1109/ICFCC.2009.52
Sari, Y., Hassan, M. F., & Zamin, N. (2010). Rule-based pattern extractor and Named Entity Recognition: A hybrid approach. In Proceedings 2010 International Symposium on Information Technology - Engineering Technology, ITSim’10 (Vol. 2, pp. 563–568). http://doi.org/10.1109/ITSIM.2010.5561392
Satoshi Sekine, K. S., & Nobata, C. (2002). Extended named entity hierarchy. Third International Conference on Language Resources and Evaluation (LREC 2002), 1818–1824.
Sazali, S. S., Rahman, N. A., & Bakar, Z. A. (2017). Information extraction: Evaluating named entity recognition from classical Malay documents. In 2016 3rd International Conference on Information Retrieval and Knowledge Management, CAMP 2016 - Conference Proceedings (pp. 48–53). http://doi.org/10.1109/INFRKM.2016.7806333
Seeger, M., & King, I. (2002). Learning from labeled and unlabeled data. Learning, (January), 1–62. http://doi.org/10.1109/IJCNN.2002.1007592
Sekine, S., Sudo, K., & Nobata, C. (2002, May). Extended Named Entity Hierarchy. In LREC.
Selvaperumal, P., & Suruliandi, A. (2016). Semi-Supervised Personal Name Disambiguation Technique for the Web. International Journal of Modern Education and Computer Science, 8(3), 28–36. http://doi.org/10.5815/ijmecs.2016.03.04
Servan, C., Berard, A., Elloumi, Z., Blanchon, H., & Besacier, L. (2016). Word2Vec vs DBnary: Augmenting METEOR using Vector Representations or Lexical Resources? Retrieved from http://arxiv.org/abs/1610.01291
Shaalan, K., & Oudah, M. (2013). A hybrid approach to Arabic named entity recognition. Journal of Information Science, 40(1), 67–87. http://doi.org/10.1177/0165551513502417
Shaalan, K., & Raza, H. (2007). Person Name Entity Recognition for Arabic. Computational Linguistics, (June), 17–24. http://doi.org/10.3115/1654576.1654581
Shabat, H. (2015). Named Entity Recognition in Crime News Documents Using Classifiers Combination, 23(6), 1215–1222. http://doi.org/10.5829/idosi.mejsr.2015.23.06.22271
Sharma, D., Devale, P. R., & Khare, A. K. (2011). Approach for Multiword Expression Identification in Natural Language Processing, 2 (August 2011), 663–666.
Sidi. (2011). Malay Interrogative Knowledge Corpus. American Journal of Economics and Business Administration, 3, 171–176. http://doi.org/10.3844/ajebasp.2011.171.176
Sinoara, R. A., Sundermann, C. V., Marcacini, R. M., Domingues, M. A., & Rezende, S. O. (2014). Named entities as privileged information for hierarchical text clustering. Proceedings of the 18th International Database Engineering & Applications Symposium on - IDEAS ’14, 57–66. http://doi.org/10.1145/2628194.2628225
Srivastava, A. N., & Sahami, M. (2009). Text Mining: Classification, Clustering, and Applications. Boca Raton: Chapman and Hall/CRC.
Suakkaphong, N., Zhang, Z., & Chen, H. (2013). Disease Named Entity Recognition Using Semisupervised Learning and Conditional Random Fields. Journal of the American Society for Information Science and Technology, 14(4), 90–103. http://doi.org/10.1002/asi
Sun, a, Grishman, R., & Sekine, S. (2011). Semi-supervised relation extraction with largescale word clustering. Proceedings of the 49th Annual Meeting …, 521–529. Retrieved from http://www.aaai.org/Papers/AAAI/2007/AAAI07- 224.pdf%5Cnhttp://dl.acm.org/citation.cfm?id=2002539
Suwarningsih, W., Supriana, I., & Purwarianti, A. (2015). ImNER Indonesian medical named entity recognition. In Proceedings of 2014 2nd International Conference on Technology, Informatics, Management, Engineering and Environment, TIME-E 2014 (pp. 184–188). http://doi.org/10.1109/TIME-E.2014.7011615
Tabuchi, N., Sumii, E., & Yonezawa, A. (2003). Regular expression types for strings in a text processing language. Electronic Notes in Theoretical Computer Science, 75, 97–115. http://doi.org/10.1016/S1571-0661 (04)80781-3
Tan, T. P., Xiao, X., Tang, E. K., Chng, E. S., & Li, H. (2009). MASS: A Malay language LVCSR corpus resource. 2009 Oriental COCOSDA International Conference on Speech Database and Assessments, ICSDA 2009, 25–30. http://doi.org/10.1109/ICSDA.2009.5278382
Tran, V. C., Hwang, D., & Jung, J. J. (2015). Semi-supervised Approach Based on Cooccurrence Coefficient for Named Entity Recognition on Twitter, 141–146.
Triguero, I., García, S., & Herrera, F. (2013). Self-labeled techniques for semi-supervised learning: taxonomy, software and empirical study. Knowledge and Information Systems, pp. 1–40. http://doi.org/10.1007/s10115-013-0706-y
Triguero, I., Sáez, J. A., Luengo, J., García, S., & Herrera, F. (2014). On the characterization of noise filters for self-training semi-supervised in nearest neighbor classification. Neurocomputing, 132, 30–41. http://doi.org/10.1016/j.neucom.2013.05.055
Trstenjak, B., Mikac, S., & Donko, D. (2014). KNN with TF-IDF based framework for text categorization. In Procedia Engineering (Vol. 69, pp. 1356–1364). Elsevier B.V. http://doi.org/10.1016/j.proeng.2014.03.129
Tuffery, S. (2011). Data Mining and Statistics for Decision Making. Wiley.
Turian, J., Ratinov, L., & Bengio, Y. (2010). Word Representations: A Simple and General Method for Semi-supervised Learning. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, (July), 384–394. http://doi.org/10.1.1.301.5840
Wibawa, A. S., & Purwarianti, A. (2016). Indonesian Named-entity Recognition for 15 Classes Using Ensemble Supervised Learning. Procedia Computer Science, 81(May), 221–228. http://doi.org/10.1016/j.procs.2016.04.053
Witten, I. H., Frank, E., & Hall, M. (2011). Data Mining: Practical Machine Learning Tools and Techniques (2nd ed.).
Worden, K., Staszewski, W. J., & Hensman, J. J. (2011). Natural computing for mechanical systems research: A tutorial overview. Mechanical Systems and Signal Processing. Elsevier. http://doi.org/10.1016/j.ymssp.2010.07.013
Wu, X., Kumar, V., Ross, Q. J., Ghosh, J., Yang, Q., Motoda, H., … Steinberg, D. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems (Vol. 14). http://doi.org/10.1007/s10115-007-0114-2
Xian, B. C. M., Lubani, M., Ping, L. K., Bouzekri, K., Mahmud, R., & Lukose, D. (2016). Benchmarking Mi-POS: Malay Part-of-Speech Tagger. International Journal of Knowledge Engineering, 2(3), 115–121. http://doi.org/10.18178/ijke.2016.2.3.064
Yang, F., & Vozila, P. (2014). Semi-Supervised Chinese Word Segmentation Using Partial-Label Learning With Conditional Random Fields. EMNLP, 90–98. Retrieved from http://emnlp2014.org/papers/pdf/EMNLP2014010.pdf
Yesilbudak, M., Sagiroglu, S., & Colak, I. (2017). A novel implementation of kNN classifier based on multi-tupled meteorological input data for wind power prediction. Energy Conversion and Management, 135, 434–444. http://doi.org/10.1016/j.enconman.2016.12.094
Yong, S.-F., Ranaivo-Malançon, B., & Wee, A. Y. (2011). NERSIL: The named-entity recognition system for Iban language. 25th Pacific Asia Conference on Language, Information and Computation, 549–558.
Yong, Z., Youwen, L., & Shixiong, X. (2009). An Improved KNN Text Classification Algorithm Based on Clustering. Journal of Computers, 4(3), 230–237. http://doi.org/10.4304/jcp.4.3.230-237
Zamin, N., & Oxley, A. (2011). Building a Corpus-Derived Gazetteer for Named Entity Recognition, 73–80.
Zamin, N., Oxley, A., Abu Bakar, Z., & Farhan, S. A. (2012). A statistical dictionary-based word alignment algorithm: An unsupervised approach. In 2012 International Conference on Computer and Information Science, ICCIS 2012 - A Conference of World Engineering, Science and Technology Congress, ESTCON 2012 - Conference Proceedings (Vol. 1, pp. 396–402). http://doi.org/10.1109/ICCISci.2012.6297278
Zatarain Salazar, J., Reed, P. M., Herman, J. D., Giuliani, M., & Castelletti, A. (2016). A diagnostic assessment of evolutionary algorithms for multi-objective surface water reservoir control. Advances in Water Resources, 92, 172–185. http://doi.org/10.1016/j.advwatres.2016.04.006
Zeng, H., Song, A., & Cheung, Y. M. (2013). Improving clustering with pairwise constraints: A discriminative approach. Knowledge and Information Systems, 36(2), 489–515. http://doi.org/10.1007/s10115-012-0592-8
Zhan, Q. (2017). An Improved K-means Algorithm Based on Structure Features, 12(1), 62–80. http://doi.org/10.17706/jsw.12.1.62-81
Zhang, C., Hong, X., & Peng, Z. (2012). An automatic approach to harvesting temporal knowledge of entity relationships. In Procedia Engineering (Vol. 29, pp. 1399–1409). http://doi.org/10.1016/j.proeng.2012.01.147
Zhang, S., & Elhadad, N. (2013). Unsupervised biomedical named entity recognition: Experiments with clinical and biological texts. Journal of Biomedical Informatics, 46(6), 1088–1098. http://doi.org/10.1016/j.jbi.2013.08.004
Zhou, D., & Zhong, D. (2015). A semi-supervised learning framework for biomedical event extraction based on hidden topics. Artificial Intelligence in Medicine, 64(1), 51–58. http://doi.org/10.1016/j.artmed.2015.03.004
Zirikly, A., & Diab, M. (2015). Named Entity Recognition for Arabic Social Media. Proceedings of NAACL-HLT 2015, 176–185. Retrieved from http://www.aclweb.org/anthology/W15-1524.pdf