UPSI Digital Repository (UDRep)

Type: thesis
Subject: QA Mathematics
Main Author: Farid Morsidi
Title: Proper noun detection using regex algorithm and rules for Malay named entity recognition
Place of Production: Tanjong Malim
Publisher: Fakulti Seni, Komputeran dan Industri Kreatif
Year of Publication: 2018
Corporate Name: Universiti Pendidikan Sultan Idris

Abstract:
This study aimed to develop a Malay proper noun detection method to cluster and classify named entity categories, particularly the major classes of person, location, organization, and miscellaneous, in a Malay newspaper corpus. A regular expression (regex) pattern identification algorithm and rules were introduced to overcome the limitations of dictionaries and gazetteers. Two visualization techniques, Decision Tree and Term Document Matrix, were used to evaluate the efficiency of the method. The method obtained 74% accuracy during generation of the decision tree. The term document matrix visualization reached maximum values of 9.8007403, 9.8718517, and 9.9890683 for the Astro Awani, Berita Harian, and Bernama datasets, respectively. In conclusion, the regex algorithm could indicate the presence of Malay proper nouns, making it an appropriate extraction tool for clustering and classifying them. The study implies that a Malay proper noun detection method can increase the effectiveness of named entity recognition and help improve document retrieval for the Malay language.
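The abstract does not give the thesis's actual patterns or rules, so the following is only a minimal sketch of how regex-plus-rules proper noun detection might look in Python. The pattern, the cue-word sets, and the function names `classify` and `detect` are all illustrative assumptions, not the author's implementation.

```python
import re

# Assumed pattern: a run of capitalized tokens, optionally linked by the
# Malay name particles "bin" / "binti" (e.g. "Ahmad bin Ali").
CANDIDATE = re.compile(
    r"\b[A-Z][A-Za-z'-]*(?:\s+(?:(?:bin|binti)\s+)?[A-Z][A-Za-z'-]*)*"
)

# Hypothetical cue words, chosen for illustration only.
PERSON_TITLES = {"Encik", "Puan", "Datuk", "Tun"}
ORG_HEADS = {"Universiti", "Syarikat", "Kementerian", "Bank"}
LOCATION_PREPOSITIONS = {"di", "ke", "dari"}


def classify(candidate, previous_word):
    """Assign one of the four classes using simple hand-written rules."""
    first = candidate.split()[0]
    if first in PERSON_TITLES or " bin " in candidate or " binti " in candidate:
        return "person"
    if first in ORG_HEADS:
        return "organization"
    if previous_word in LOCATION_PREPOSITIONS:
        return "location"
    return "miscellaneous"


def detect(sentence):
    """Return (proper noun candidate, class) pairs found in one sentence."""
    results = []
    for match in CANDIDATE.finditer(sentence):
        text = match.group()
        preceding = sentence[:match.start()].split()
        # Rule: a lone capitalized word that opens the sentence is skipped,
        # since its capital letter may only mark the sentence start.
        if not preceding and " " not in text:
            continue
        previous_word = preceding[-1] if preceding else ""
        results.append((text, classify(text, previous_word)))
    return results


sentence = ("Menurut laporan Bernama, Datuk Ahmad bin Ali melawat "
            "Universiti Pendidikan Sultan Idris di Tanjong Malim.")
for name, label in detect(sentence):
    print(f"{label:14} {name}")
```

Without a gazetteer, a name with no contextual cue (here "Bernama") falls into the miscellaneous class, which illustrates why the rule set matters as much as the pattern itself.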


This material may be protected under the Copyright Act, which governs the making of photocopies or reproductions of copyrighted materials.
You may use the digitized material for private study, scholarship, or research.

