Malay named entity recognition: a review

Farid Morsidi

Type :	Article
Subject :	QA Mathematics
ISSN :	2289-7844
Main Author :	Farid Morsidi
Additional Authors :	Sulaiman Sarkawi, Suliana Sulaiman, Siti Asma Mohammad, Rohaizah Abdul Wahid
Title :	Malay named entity recognition: a review

Place of Production :	Tanjong Malim
Publisher :	Fakulti Komputeran dan META-Teknologi
Year of Publication :	2015
Notes :	Vol. 2 (2015): Journal of ICT in Education (JICTIE)
Corporate Name :	Perpustakaan Tuanku Bainun

Abstract :

The field of Named Entity Recognition (NER) has been thriving for more than 15 years. NER can be defined as the process of recognizing named entities, such as the names of persons, organizations, and locations, as well as times and quantities. Research in NER generally focuses on extracting and classifying mentions of rigid designators in text, such as proper names, biological species, and temporal expressions. NER has been applied in many areas, ranging from inquiries to morphological syntax, in addition to information extraction. However, most of the work has been confined to limited domains and textual genres, such as news articles and web pages. Techniques used to process English text cannot be used to process Malay text, because the two languages differ in morphology. Finding co-references and aliases in a text can be reduced to the problem of finding all occurrences of an entity in a document. This paper reviews approaches that have been applied to NER in Malay, or in partially related settings, to detect proper nouns within Malay documents. It also discusses research efforts to produce high-quality training data for Malay corpora using appropriate NER algorithms and methods, and highlights the key points needed to improve current NER studies.

Keywords: Named entity recognition, natural language processing, Malay, fuzzy rule-based, information retrieval (IR), information extraction (IE), artificial intelligence, fuzzy relational calculus
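
To make the task described in the abstract concrete, the following is a minimal sketch of rule-based proper-noun detection in Malay text. It is not the method of any work covered by the review. The trigger-word lists (`PERSON_TITLES`, `ORG_PREFIXES`, `LOC_PREPOSITIONS`) and the function `tag_entities` are hypothetical and chosen only for illustration, on the assumption that a title, an organization prefix, or a locative preposition followed by capitalized words signals a named entity.

```python
import re

# Hypothetical trigger words, for illustration only (not taken from the paper).
PERSON_TITLES = {"Encik", "Puan", "Datuk", "Tun", "Dr"}
ORG_PREFIXES = {"Universiti", "Syarikat", "Kementerian", "Bank"}
LOC_PREPOSITIONS = {"di", "ke", "dari"}

# Split text into alphabetic words and single punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z]+|[^\sA-Za-z]")


def is_capitalized(token):
    """True if the token starts with an uppercase letter."""
    return token[:1].isupper()


def collect_span(tokens, start):
    """Return the index one past the run of capitalized tokens beginning at start."""
    end = start
    while end < len(tokens) and is_capitalized(tokens[end]):
        end += 1
    return end


def tag_entities(text):
    """Return (entity, label) pairs found by three simple contextual rules."""
    tokens = TOKEN_RE.findall(text)
    entities = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in PERSON_TITLES:
            # Rule 1: a title followed by capitalized words names a person.
            j = i + 1
            if j < len(tokens) and tokens[j] == ".":  # allow "Dr."
                j += 1
            end = collect_span(tokens, j)
            if end > j:
                entities.append((" ".join(tokens[j:end]), "PERSON"))
                i = end
                continue
        elif tok in ORG_PREFIXES:
            # Rule 2: an organization prefix plus capitalized words.
            end = collect_span(tokens, i)
            if end > i + 1:
                entities.append((" ".join(tokens[i:end]), "ORGANIZATION"))
                i = end
                continue
        elif tok.lower() in LOC_PREPOSITIONS:
            # Rule 3: a locative preposition followed by capitalized words,
            # unless the next word triggers rule 1 or rule 2 instead.
            nxt = i + 1
            if (nxt < len(tokens)
                    and tokens[nxt] not in ORG_PREFIXES
                    and tokens[nxt] not in PERSON_TITLES):
                end = collect_span(tokens, nxt)
                if end > nxt:
                    entities.append((" ".join(tokens[nxt:end]), "LOCATION"))
                    i = end
                    continue
        i += 1
    return entities


if __name__ == "__main__":
    sample = "Encik Ahmad Razak bekerja di Universiti Malaya di Kuala Lumpur."
    for entity, label in tag_entities(sample):
        print(label, entity)
```

Rules of this kind rely on capitalization and a fixed word list, so they would miss entities in lowercase or informal text and would need a much larger gazetteer in practice.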

