UPSI Digital Repository (UDRep)
Abstract : Perpustakaan Tuanku Bainun
Named Entity Recognition (NER) has been a thriving field for more than 15 years. NER can be defined as the process of recognizing named entities, such as the names of persons, organizations, and locations, as well as times and quantities. Research in NER generally emphasizes the extraction and classification of mentions of rigid designators in text, such as proper names, biological species, and temporal expressions. NER has been used in many areas, ranging from inquiries to morphological syntax, as well as in information extraction. However, most of this work has been devoted to limited domains and textual genres, such as news articles and web pages. Techniques used to process English text cannot be applied directly to Malay terminology, because each language has its own morphological usage. Finding co-references and aliases in a text can be reduced to the same problem: finding all occurrences of an entity in a document. This paper proposes approaches that have been applied in NER for Malay, or in work partially related to it, in order to detect proper nouns in Malay documents. It also discusses the research done to produce high-quality training data for Malay corpora using appropriate NER algorithms and methods, and highlights the key points needed to improve current NER studies.
Keywords: Named entity recognition, natural language processing, Malay, fuzzy rule-based, information retrieval (IR), information extraction (IE), artificial intelligence, fuzzy relational calculus