UPSI Digital Repository (UDRep)
Type :thesis
Subject :QA Mathematics
Main Author :Farid Morsidi
Title :Proper noun detection using regex algorithm and rules for Malay named entity recognition
Place of Production :Tanjong Malim
Publisher :Fakulti Seni, Komputeran dan Industri Kreatif
Year of Publication :2018
Corporate Name :Universiti Pendidikan Sultan Idris

Abstract :
This study aimed to develop a Malay proper noun detection method to cluster and classify named entity categories, particularly the major classes of person, location, organization, and miscellaneous, in a Malay newspaper corpus. A regular expression (regex) pattern identification algorithm and rules were introduced to overcome the limitations of dictionaries and gazetteers. Two visualization techniques, Decision Tree and Term Document Matrix, were used to evaluate the efficiency of the method. The method obtained 74% accuracy during generation of the decision tree. The term document matrix visualization reached maximum values of 9.8007403, 9.8718517, and 9.9890683 for the Astro Awani, Berita Harian, and Bernama datasets respectively. In conclusion, the regex algorithm could indicate the presence of Malay proper nouns, making it an appropriate method for an extraction tool that clusters and classifies Malay proper nouns. The study implies that the Malay proper noun detection method can increase the effectiveness of named entity recognition and help improve document retrieval for the Malay language.
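The general idea of regex-plus-rules proper noun detection can be illustrated with a minimal Python sketch. This record does not give the thesis's actual patterns or rules, so everything below is an assumption made for illustration: the pattern `CAPITALIZED_RUN`, the cue-word sets, and the `classify` function are hypothetical names, and the cue words are a handful of common Malay titles, organization heads, and prepositions chosen only as examples. The sketch finds runs of capitalized tokens and then assigns one of the four classes named in the abstract using simple context rules.

```python
import re

# Runs of capitalized tokens, e.g. "Tanjong Malim".
# Illustrative pattern only; not the pattern set used in the thesis.
CAPITALIZED_RUN = re.compile(r"\b[A-Z][A-Za-z]*(?:\s+[A-Z][A-Za-z]*)*")

# Hypothetical cue words, assumed here for the example.
PERSON_TITLES = {"Encik", "Puan", "Datuk"}
ORG_HEADS = {"Universiti", "Bank", "Kementerian"}
LOCATION_PREPOSITIONS = {"di", "ke", "dari"}


def classify(text):
    """Return (span, label) pairs for each capitalized run in text."""
    results = []
    for match in CAPITALIZED_RUN.finditer(text):
        span = match.group()
        first = span.split()[0]
        # Word immediately before the run, used as a context cue.
        preceding = text[:match.start()].split()
        prev_word = preceding[-1] if preceding else ""
        if first in PERSON_TITLES:
            label = "person"
        elif first in ORG_HEADS:
            label = "organization"
        elif prev_word in LOCATION_PREPOSITIONS:
            label = "location"
        else:
            label = "miscellaneous"
        results.append((span, label))
    return results


if __name__ == "__main__":
    print(classify("Encik Ahmad belajar di Tanjong Malim"))
```

A sketch like this avoids a dictionary or gazetteer lookup, but it also labels any capitalized sentence-initial word as a candidate, so a practical rule set would need further filtering.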

References

Abdallah,   S.,   Shaalan,   K.,   &   Shoaib,   M.   (2012).   Integrating   rule-based   system   

with classification  for  arabic  named  entity  recognition.  In  Lecture  Notes  in  Computer  

Science (including   subseries   Lecture   Notes   in   Artificial   Intelligence   and   Lecture   

Notes   in Bioinformatics)  (Vol.  7181  LNCS,  pp.  311–322).  

http://doi.org/10.1007/978-3-642-28604- 9_26

 

AbdelRahman,  S.,  Elarnaoty,  M.,  &  Magdy,  M.  (2010).  Integrated  Machine  Learning 

Techniques for Arabic Named Entity Recognition. International Journal of Computer Science, 7(4), 

27–36. Retrieved from http://ijcsi.org/papers/IJCSI-Vol-7-Issue-4-No-3.pdf#page=41

 

Abdul-hamid,  A.,  &  Darwish,  K.  (2010).  Simplified  Feature  Set  for  Arabic  Named  Entity 

Recognition. Proceedings of the 2010 Named Entities Workshop, (July), 110–115. Retrieved from 

http://www.aclweb.org/anthology/W10-2417

 

Abdullah, M., & Ahmad, F. (2009). Rules frequency order stemmer for malay language. … International 

        Journal         of         …,         9(2),         433–438.         Retrieved         from 

http://paper.ijcsns.org/07_book/200902/20090258.pdf

 

Abedinpourshotorban,  H.,  Hasan,  S.,  Shamsuddin,  S.  M.,  &  As’Sahra,  N.  F.  (2016).  A 

differential-based  harmony  search  algorithm  for  the  optimization  of  continuous  problems. 

Expert Systems with Applications, 62, 317–332. http://doi.org/10.1016/j.eswa.2016.05.013

 

Aboaoga,  M.,  &  Aziz,  M.  J.  A.  (2013).  Arabic  person  names  recognition  by  using  a  

rule based        approach.        Journal        of        Computer        Science,        9(7),   

     922–927. http://doi.org/10.3844/jcssp.2013.922.927

 

Abu Bakar, J., Omar, K., Nasrudin, M. F., & Murah, M. Z. (2013). Part-of-Speech for Old Malay  

Manuscript  Corpus:  A  Review.  In  Communications  in  Computer  and  Information Science (Vol. 

378 CCIS, pp. 53–66). http://doi.org/10.1007/978-3-642-40567-9_5

 

Abu Bakar, J., Omar, K., Nasrudin, M. F., Murah, M. Z., Al-shoukry, S., Omar, N., … Klose,

A. (2013). Processing natural malay texts: A data-driven approach. Neurocomputing, 79(3), 

2670–2676. http://doi.org/10.3176/tr.2010.1.06

 

Agarwal, S. K., Shah, S., & Kumar, R. (2015). Classification of mental tasks from EEG data using 

backtracking search optimization  based  neural  classifier.  Neurocomputing,  166,  397– 403. 

http://doi.org/10.1016/j.neucom.2015.03.041

 

Aggarwal, C., & Zhao, P. (2013). Towards graphical models for text processing. Knowledge and 

Information Systems, 36(1), 1–21. http://doi.org/10.1007/s10115-012-0552-3

 

Ahmad,  Z.  H.,  &  Khalifa,  O.  (2008).  Towards  designing  a  high  intelligibility  rule  

based standard Malay text-to-speech synthesis system. Proceedings of the International Conference 

on  Computer  and  Communication  Engineering  2008,  ICCCE08:  Global  Links  for  Human 

Development, 89–94. http://doi.org/10.1109/ICCCE.2008.4580574

 

Ahmed, Z. (2013). Named Entity Recognition and Question Answering Using Word Vectors and 

Clustering.

 

Akbari,  R.,  Hedayatzadeh,  R.,  Ziarati,  K.,  &  Hassanizadeh,  B.  (2012).  A  multi-objective 

artificial   bee   colony   algorithm.   Swarm   and   Evolutionary   Computation,   2,   39–52.

http://doi.org/10.1016/j.swevo.2011.08.001

 

Alfred, R. (2016). Intelligent Information and Database Systems. In ACIIDS 2016, Part II (pp.

447–457). http://doi.org/10.1007/978-3-642-12145-6

 

Alfred, R., Leong, L. C., On, C. K., & Anthony, P. (2014). Malay Named Entity Recognition

Based on Rule-Based Approach. International Journal of Machine Learning and Computing,

4(3), 300–306. http://doi.org/10.7763/IJMLC.2014.V4.428

 

Aljoumaa, H. (2012). Development of a Self-Learning Approach Applied to Pattern

Recognition and Fuzzy Control, (September 2012), 127.

 

Al-Moslmi, T., Gaber, S., Al-Shabi, A., Albared, M., & Omar, N. (2015). Feature Selection

Methods Effects on Machine Learning Approaches in Malay Sentiment Analysis, (October),

2–5.

 

Alshalabi, H., Tiun, S., Omar, N., & Albared, M. (2013). Experiments on the Use of Feature

Selection and Machine Learning Methods in Automatic Malay Text Categorization.

International Conference on Electrical Engineering and Informatics (ICEEI 2013), 11(Iceei),

748–754. http://doi.org/10.1016/j.protcy.2013.12.254

 

Al-shammaa, M., & Abbod, M. F. (2015). Automatic Generation of Fuzzy Classification

Rules from Data.

 

Al-shoukry, S., & Omar, N. (2015). Proper Nouns Recognition in Arabic Crime Text Using

Machine Learning Approach, 79(3), 506–513.

 

Althobaiti, M., Kruschwitz, U., & Poesio, M. (2015). Combining Minimally-supervised

Methods for Arabic Named Entity Recognition. Transactions of the Association for

Computational Linguistics, 3, 243–255. Retrieved from

https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/564

 

Althobaiti, M., Kruschwitz, U., & Poesio, M. (2013). A Semi-supervised Learning Approach

to Arabic Named Entity Recognition, (September), 32–40.

http://doi.org/10.1177/0165551513502417

 

Althobaiti, M., Kruschwitz, U., & Poesio, M. (2014). Automatic Creation of Arabic Named

Entity Annotated Corpus Using Wikipedia. Proceedings of the Student Research Workshop at

the 14th Conference of the European Chapter of the Association for Computational

Linguistics, 106–115. Retrieved from http://www.aclweb.org/anthology/E14-3012

 

Ananiadou, S., & McNaught, J. (2006). Text Mining for Biology and Biomedicine. Boston:

Artech House.

 

Ananiadou, S., Pyysalo, S., Tsujii, J., & Kell, D. B. (2010). Event extraction for systems

biology by text mining the literature. Trends in Biotechnology.

http://doi.org/10.1016/j.tibtech.2010.04.005

 

Ando, R. R. K., & Zhang, T. (2005). A high-performance semi-supervised learning method

for text chunking. Proceedings of the 43rd Annual Meeting on Association for Computational

Linguistics, (June), 1–9. http://doi.org/10.3115/1219840.1219841

 

Baharudin, B., Lee, L. H., & Khan, K. (2010). A Review of Machine Learning Algorithms for

Text-Documents Classification. Journal of Advances in Information Technology, 1(1), 4–20.

http://doi.org/10.4304/jait.1.1.4-20

 

Bali, R.-M., Chua, C. C., & Ng, P. K. (2007). Identifying and Classifying Unknown Words In

Malay Texts. The Seventh International Symposium on Natural Language Processing

 

(SNLP2007), 493–498. Retrieved from

http://eprints.usm.my/9442/1/Identifying_and_classifying_unknown_words_in_Malay_texts.p

df%5Cnhttp://eprints.usm.my/9442/

 

Banko, M., Cafarella, M. J., Soderland, S., Broadhead, M., & Etzioni, O. (2007). Open

Information Extraction from the Web. Proceedings of IJCAI-07, the International Joint

Conference on Artificial Intelligence, 2670–2676. http://doi.org/10.1145/1409360.1409378

 

Bawane, M. S., & Gadicha, P. V. B. (n.d.). Analysing the result of GRIAS framework by

using Precision , Recall and F-measure, 24–30.

 

Benajiba, Y., Diab, M., & Rosso, P. (2008). Arabic named entity recognition using optimized

feature sets. EMNLP ’08 Proceedings of the Conference on Empirical Methods in Natural

Language Processing, (October), 284–293. Retrieved from

http://dl.acm.org/citation.cfm?id=1613715.1613755

 

Benajiba, Y., & Rosso, P. (2008). Arabic Named Entity Recognition using Conditional

Random Fields. Proc. of Workshop on HLT & NLP within the Arabic World, LREC. Vol. 8.,

143–153. Retrieved from

http://www.dsic.upv.es/~prosso/resources/BenajibaRosso_LREC08.pdf

 

Benajiba, Y., Rosso, P., & BenedíRuiz, J. (2007). ANERsys: an Arabic named entity

recognition system based on maximum entropy. Gelbukh, A. (Ed.) CICLing 2007. LNCS,

143–153. Retrieved from http://www.springerlink.com/index/5g6n298843878701.pdf

 

Bezdek, J. C. (1993). A Physical Interpretation of Fuzzy ISODATA. Readings in Fuzzy Sets

for Intelligent Systems, (November), 615–616. http://doi.org/10.1109/TSMC.1976.4309506

 

Bontcheva, K., Derczynski, L., Funk, A., Greenwood, M. a, Maynard, D., & Aswani, N.

(2013). TwitIE : An Open-Source Information Extraction Pipeline for Microblog Text. In

Proceedings of Recent Advances in Natural Language Processing (pp. 83–90). Retrieved

from https://www.aclweb.org/anthology/R/R13/R13-1011.pdf

 

Brief, T. (2005). Agreement , the F-Measure , and Reliability in Information Retrieval, 296–

298. http://doi.org/10.1197/jamia.M1733.Informatics

 

Brill, E. (2000). Pattern-based disambiguation for natural language processing. Annual

Meeting of the ACL, 1. Retrieved from http://portal.acm.org/citation.cfm?id=1117795

 

Bsoul, Q., Salim, J., & Zakaria, L. Q. (2013). An Intelligent Document Clustering Approach

to Detect Crime Patterns. Procedia Technology, 11(Iceei), 1181–1187.

http://doi.org/10.1016/j.protcy.2013.12.311

 

Cao, T. H., Tang, T. M., & Chau, C. K. (2012). Text Clustering with Named Entities: A

Model, Experimentation and Realization. Intelligent Systems Reference Library, 23, 267–287.

http://doi.org/10.1007/978-3-642-23166-7_10

 

Carlson, A., & Betteridge, J. (2010). Coupled semi-supervised learning for information

extraction. Proceedings of the Third ACM International Conference on Web Search and Data

Mining (2010), 101–110. http://doi.org/10.1145/1718487.1718501

 

Chapman, C. A. (2016). Usage and refactoring studies of python regular expressions by.

Graduate Theses and Dissertations. This, Paper 1513.

 

Chapman,  C.,  &  Stolee,  K.  T.  (2016).  Exploring  regular  expression  usage  and  context  in 

Python.  In  Proceedings  of  the  25th  International  Symposium  on  Software  Testing  and 

Analysis - ISSTA 2016 (pp. 282–293). http://doi.org/10.1145/2931037.2931073

 

Chart, G., Algorithm, G., Tun, U., & Onn, H. (2012). Single Disciplinary Project Application Form   

     Fundamental        Research        Grant        Scheme        (FRGS),        (i),        1–16. 

http://doi.org/10.1155/2013/782519.(ISI-Q2).

 

Che,  W.,  Wang,  M.,  Manning,  C.  D.,  &  Liu,  T.  (2013).  Named  Entity  Recognition  with 

Bilingual Constraints. Proceedings of the 2013 Conference of the North American Chapter of the 

Association for Computational Linguistics: Human Language Technologies, (June), 52–

62. Retrieved from http://www.aclweb.org/anthology/N13-1006

 

Chen,  K.,  Dong,  X.,  Zhu,  J.,  &  Shen,  B.  (2016).  Building  a  domain  knowledge  base  

from wikipedia:  A  semi-supervised  approach.  Proceedings  of  the  International  Conference  on 

Software       Engineering       and       Knowledge       Engineering,       SEKE,       

2016–Janua. http://doi.org/10.18293/SEKE2016-051

 

Chiticariu,  L.,  Krishnamurthy,  R.,  Li,  Y.,  Reiss,  F.,  &  Vaithyanathan,  S.  (2010).  

Domain Adaptation of Rule-Based Annotators for Named-Entity Recognition Tasks.  Proceedings of the  

2010  Conference  on  Empirical  Methods  in  Natural  Language  Processing,  (October), 1002–1012. 

Retrieved from http://portal.acm.org/citation.cfm?id=1870756

 

Collobert,  R.,  Weston,  J.,  Bottou,  L.,  Karlen,  M.,  Kavukcuoglu,  K.,  &  Kuksa,  P.  

(2011). Natural Language Processing (almost) from Scratch. Journal of Machine Learning Research, 

12(Aug), 2493–2537. http://doi.org/10.1145/2347736.2347755

 

Derczynski,  L.,  Maynard, D.,  Rizzo,  G.,  & Erp,  M.  Van. (n.d.).  Analysis  of  Named  Entity 

Recognition and Linking for Tweets, 1–35.

 

Diab, M. (2009). Second Generation AMIRA Tools for Arabic Processing?: Fast and Robust 

Tokenization,   POS   tagging,   and   Base   Phrase   Chunking.   Proceedings   of   the   Second 

International  Conference  on  Arabic  Language  Resources  and  Tools,  285–288.  Retrieved from 

http://www.elda.org/medar-conference/pdf/56.pdf

 

Duan, H., Zheng, Y., & Random, C. (2011). A Study on Features of the CRFs-based Chinese.

International Journal of Advanced Intelligence, 3(2), 287–294.

 

Dumais,  S.,  &  Chen,  H.  (2000).  Hierarchical  classification  of  Web  content.  SIGIR  ’00: 

Proceedings  of  the  23rd  Annual  International  ACM  SIGIR  Conference  on  Research  and 

Development in Information Retrieval, 256–263. http://doi.org/10.1145/345508.345593

 

Ek, T., Kirkegaard, C., Jonsson, H., & Nugues, P. (2011). Named entity recognition for short text   

messages.   Procedia   -   Social   and   Behavioral   Sciences,   27(September),   178–187. 

http://doi.org/10.1016/j.sbspro.2011.10.596

 

Ekbal,  A.,  & Saha,  S. (2011).  A  multiobjective  simulated  annealing approach for  classifier 

ensemble: Named entity recognition in Indian languages as case studies. Expert Systems with 

Applications, 38(12), 14760–14772. http://doi.org/10.1016/j.eswa.2011.05.004

 

Ekbal,  A.,  Saha,  S.,  &  Sikdar,  U.  K.  (2012).  Multiobjective  Optimization  for  Biomedical 

Named   Entity   Recognition   and   Classification.   Procedia   Technology,   6(0),   206–213.

http://doi.org/http://dx.doi.org/10.1016/j.protcy.2012.10.025

 

Elsayed, H., & Elghazaly, T. (2015). A Named Entities Recognition System for Modern

Standard Arabic using Rule-Based Approach. 2015 First International Conference on Arabic

Computational Linguistics (ACLing), 12(1), 51–54. http://doi.org/10.1109/ACLing.2015.14

 

Elsebai, a, Meziane, F., & Belkredim, F. (2009). A Rule Based Persons Names Arabic

Extraction System. Communications of the IBIMA, 11(August), 53–59. Retrieved from

http://usir.salford.ac.uk/2206/

 

Elyasir, A. M. H., Sonai, K., & Anbananthen, M. (2013). Comparison between Bag of Words

and Word Sense Disambiguation, (Icacsei), 413–417.

 

Etzioni, O., Cafarella, M., Downey, D., Popescu, A. M., Shaked, T., Soderland, S.,… Yates,

A. (2005). Unsupervised named-entity extraction from the Web: An experimental study.

Artificial Intelligence, 165(1), 91–134. http://doi.org/10.1016/j.artint.2005.03.001

 

Fadzli, S. A., Norsalehen, A. K., Syarilla, I. A., Hasni, H., & Dhalila, M. S. S. (2012). Simple

rules malay stemmer. The International Conference on Informatics and Applications

(ICIA2012), 28–35. Retrieved from http://sdiwc.net/digitallibrary/

download.php?id=00000187.pdf

 

Fuchs, G., Stange, H., Samiei, A., Andrienko, G., & Andrienko, N. (2015). A semi-supervised

method for topic extraction from micro postings. Information Technology, 57(1), 49–56.

http://doi.org/10.1515/itit-2014-1078

 

Fung, P., Fung, P., Cheung, P., & Cheung, P. (2004). Mining Very-Non-Parallel Corpora:

Parallel Sentence and Lexicon Extraction via Bootstrapping and EM. EMNLP 2004 -

Conference on Empirical Methods in Natural Language Processing, 57–63. Retrieved from

http://www.aclweb.org/anthology-new/W/W04/W04-3208.pdf

 

Gosselin, L., Tye-Gingras, M., & Mathieu-Potvin, F. (2009). Review of utilization of genetic

algorithms in heat transfer problems. International Journal of Heat and Mass Transfer.

Elsevier Ltd. http://doi.org/10.1016/j.ijheatmasstransfer.2008.11.015

 

Goyvaerts, J., & Levithan, S. (2012). Regular Expressions Cookbook, 612.

http://doi.org/9780596802837

 

Gunawan, Purnama, I. K. E., & Hariadi, M. (2015). Supervised learning Indonesian gloss

acquisition. IAENG International Journal of Computer Science, 42(4), 337–346.

 

Hassan, M., Nazlia, O., & Mohd Juzaiddin, A. A. (2015). Malay Part of Speech Tagger : A

Comparative Study on Tagging Tools. Asia-Pacific Journal of Information Technology and

Multimedia, 4(1), 11–23. http://doi.org/10.17576/apjitm-2015-0401-02

 

Hemmati, M., Amjady, N., & Ehsan, M. (2014). System modeling and optimization for

islanded micro-grid using multi-cross learning-based chaotic differential evolution algorithm.

International Journal of Electrical Power and Energy Systems, 56, 349–360.

http://doi.org/10.1016/j.ijepes.2013.11.015

 

Heydt, M. (2015). Learning pandas: Get to grips with pandas - a versatile and highperformance

Python library for data manipulation, analysis, and discovery. Retrieved from

http://gen.lib.rus.ec/book/index.php?md5=75566423DC8A5A9411165F24EF9DD886

 

Hu, B., Tang, B., Chen, Q., & Kang, L. (2016). A novel word embedding learning model

using the dissociation between nouns and verbs. Neurocomputing, 171, 1108–1117.

http://doi.org/10.1016/j.neucom.2015.07.046

 

Isa, N., Puteh, M., & Kamarudin, R. M. H. R. (2013). Sentiment classification of malay

newspaper using immune network (SCIN). Lecture Notes in Engineering and Computer

Science, 3 LNECS, 1543–1548. Retrieved from

http://www.scopus.com/inward/record.url?eid=2-s2.0-

84887882006&partnerID=40&md5=652fdc713458c4dfedcbc4e3a0b736b6

 

J.M., M. M. U. J. S.-C. S. M. J. G.-B. (2013). Named Entity Recognition: Fallacies challenges

and opportunities. Computer Standards and Interfaces,

3554824891(http://www.scopus.com/inward/record.url?eid=2-s2.0-

84878302542&partnerID=40&md5=fa0cc4fcfad6db514533c129e08333d6).

 

Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters,

31(8), 651–666. http://doi.org/10.1016/j.patrec.2009.09.011

 

Kanagavalli, R. V, & K, R. (2013). Detecting and resolving spatial ambiguity in text using

named entity extraction and Self-Learning fuzzy logic techniques. Retrieved from

http://arxiv.org/abs/1303.0445

 

Kantardzic, M. (2011). Data Mining: Concepts, Models, Method, and Algorithms (2nd

Edition) (2nd ed.). New Jersey: John Wiley & Sons, Inc.

 

Khalaf, Z. (2015). MAHIR System: Unsupervised Segmentation for Malay Spoken Broadcast

News Stories. International Journal of Information and Electronics Engineering, 5(3).

http://doi.org/10.7763/IJIEE.2015.V5.532

 

Kondrak, S. B. and G. (2007). Alignment-Based Discriminative String Similarity.

Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics,

656–663.

 

Kraft, D. H., Martin-Bautista, M. J., Chen, J., & Sanchez, D. (2003). Rules and fuzzy rules in

text: Concept, extraction and usage. International Journal of Approximate Reasoning, 34(2–

3), 145–161. http://doi.org/10.1016/j.ijar.2003.07.005

 

Král, P. (2014). Named entities as new features for Czech document classification. Lecture

Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and

Lecture Notes in Bioinformatics), 8404 LNCS (PART 2), 417–427.

http://doi.org/10.1007/978-3-642-54903-8_35

 

Kummerfeld, J., & Curran, J. (2008). Classification of Verb-Particle Constructions with the

Google Web1T Corpus. Australasian Language Technology Association Workshop 2008, 6

(December), 55–63. Retrieved from http://aclweb.org/anthology-new/U/U08/U08-

1.pdf#page=114

 

Lafferty, J., McCallum, A., & Pereira, F. C. N. (2001). Conditional random fields:

Probabilistic models for segmenting and labeling sequence data. ICML ’01 Proceedings of the

Eighteenth International Conference on Machine Learning, 8(June), 282–289.

http://doi.org/10.1038/nprot.2006.61

 

Larasati, S. (2012). Towards an Indonesian-English {SMT} System: A Case Study of an

Under-Studied and Under-Resourced Language, Indonesian. {WDS}’12 Proceedings of

Contributed Papers, 123–129.

 

Le Nguyen, M., & Shimazu, A. (2014). A semi supervised learning model for mapping

sentences to logical forms with ambiguous supervision. In Data and Knowledge Engineering

(Vol. 90, pp. 1–12). Elsevier B.V. http://doi.org/10.1016/j.datak.2013.12.001

 

Le, T., Nguyen, K., Nguyen, V., Nguyen, V., & Phung, D. (2016). Scalable Support Vector

Machine for Semi-supervised Learning, 1–18. Retrieved from http://arxiv.org/abs/1606.06793

 

Li, Y., Krishnamurthy, R., Raghavan, S., Vaithyanathan, S., Arbor, A., & Jagadish, H. V.

(2008). Regular Expression Learning for Information Extraction. Conference on Empirical

Methods in Natural Language Processing, (October), 21–30. Retrieved from

http://portal.acm.org/citation.cfm?id=1613719

 

Liao, W., & Veeramachaneni, S. (2009). A simple semi-supervised algorithm for named

entity recognition. Workshop on Semi-Supervised Learning for Natural Language Processing,

(June), 58–65. http://doi.org/10.3115/1621829.1621837

 

Liu, X., Zhang, S., Wei, F., & Zhou, M. (2011). Recognizing Named Entities in Tweets. In

Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

(ACL), 1(2008), 359–367. Retrieved from http://acl.eldoc.ub.rug.nl/mirror/P/P11/P11-

1037.pdf

 

Lu, Y., Ji, D., Yao, X., Wei, X., & Liang, X. (2015). CHEMDNER system with mixed

conditional random fields and multi-scale word clustering. Journal of Cheminformatics,

7(Suppl 1), S4. http://doi.org/10.1186/1758-2946-7-S1-S4

 

Luis Eduardo, P., Iacobelli, F., & Su, S. (2015). Semi-Supervised Approach to Named Entity

Recognition in Spanish Applied to a Real-World Conversational System, 224–235.

http://doi.org/10.1007/978-3-319-19264-2

 

Luo, W., & Yang, F. (2016). An Empirical Study of Automatic Chinese Word Segmentation

for Spoken Language Understanding and Named Entity Recognition, 238–248.

Malanyon, D. (2009). Malay Lexical Analysis through Corpus-Based Approach.

Eprints.Usm.My. Retrieved from http://eprints.usm.my/10608/

 

Mangasi, T., Erwin, A., & Ipung, H. P. (2014). Defined entity extraction based on Indonesian

text document. In Proceedings - 2014 International Conference on ICT for Smart Society:

“Smart System Platform Development for City and Society, GoeSmart 2014”, ICISS 2014 (pp.

61–65). http://doi.org/10.1109/ICTSS.2014.7013152

 

Manning, C. D., & Raghavan, P. (2009). An Introduction to Information Retrieval. Online, 1,

1. http://doi.org/10.1109/LPT.2009.2020494

 

Markov, Z., & Larose, D. T. (2007). Data Mining the Web: Uncovering Patterns in Web

Content, Structure, and Usage. John Wiley & Sons, Inc.

 

Mikolov, T., Le, Q. V, & Sutskever, I. (2013). Exploiting Similarities among Languages for

Machine Translation. arXiv Preprint arXiv:1309.4168v1, 1–10. Retrieved from

http://arxiv.org/abs/1309.4168v1%5Cnhttp://arxiv.org/abs/1309.4168

 

Miner, G., Elder, J., Fast, A., Hill, T., Nisbet, R., & Delen, D. (2012). Practical Text Mining

and Statistical Analysis for Non-structured Text Data Applications, 1st ed. Elsevier.

Oklahoma: Academic Press. http://doi.org/10.1016/B978-0-12-386979-1.00009-8

 

Mohamed, H., Omar, N., & Ab. Aziz, M. J. (2015). Malay Part of Speech Tagger: A

Comparative Study on Tagging Tools. Asia-Pacific Journal of Information Technology and

Multimedia, 4(1), 11–23. http://doi.org/10.17576/apjitm-2015-0401-02

 

Mohd Don, Z. (2010). Processing natural malay texts: A data-driven approach. Trames, 14(1),

90–103. http://doi.org/10.3176/tr.2010.1.06

 

Mohit, B., Schneider, N., Bhowmick, R., Oflazer, K., & Smith, N. a. (2012). Recall-oriented

learning of named entities in Arabic Wikipedia. Proceedings of the 13th Conference of the

European Chapter of the Association for Computational Linguistics, 162–173. Retrieved

from http://dl.acm.org/citation.cfm?id=2380816.2380839

 

Nadeau, D. (2007). A survey of named entity recognition and classification. Linguisticae

Investigationes, 8(30), 3–26. http://doi.org/10.1075/li.30.1.03nad

 

Nogueira, T. M., Rezende, S. O., & Camargo, H. a. (2010). On the use of fuzzy rules to text

document classification. Hybrid Intelligent Systems (HIS), 2010 10th International

Conference on, 19–24. http://doi.org/10.1109/HIS.2010.5600076

 

Noh, N., Rusydi, M., Talib, A., Ahmad, A., Halim, S. A., & Mohamed, A. (2009). Malay

Language Document Identification Using BPNN. In Proceedings of the 10th WSEAS

international conference on Neural networks (pp. 163–168).

 

Nothman, J., Ringland, N., Radford, W., Murphy, T., & Curran, J. R. (2013). Learning

multilingual named entity recognition from Wikipedia. Sydney: Elsevier Science.

http://doi.org/10.1016/j.artint.2012.03.006

 

Ojo, A., & Adeyemo, A. B. (2012). Framework for Knowledge Discovery from Journal

Articles Using Text Mining Techniques. African Journal of Computing & ICT, 5(2), 35–44.

Retrieved from http://www.ajocict.net/uploads/Pre-print_-

 

_O__Ojo___A_B__Adeyemo__2012___Framework_for_Knowledge_Discovery_from_Journ

al_Articles_Using_Text_Mining_Techniques.pdf

 

Oudah, M., & Shaalan, K. (2012). A Pipeline Arabic Named Entity Recognition using a

Hybrid Approach. COLING (December 2012), 2159–2176. Retrieved from

http://www.newdesign.aclweb.org/anthology/C/C12/C12-1132.pdf

 

Oudah, M., & Shaalan, K. (2016). Studying the impact of language-independent and

language-specific features on hybrid Arabic Person name recognition. Language Resources

and Evaluation, 1–28. http://doi.org/10.1007/s10579-016-9376-1

Petrov, S., Das, D., & McDonald, R. (2011). A Universal Part-of-Speech Tagset. Retrieved

from http://arxiv.org/abs/1104.2086

 

Pham, Q. H., Nguyen, M.-L., Nguyen, B. T., & Cuong, N. V. (2015). Semi-supervised

Learning for Vietnamese Named Entity Recognition using Online Conditional Random Fields.

In Proceedings of the Fifth Named Entity Workshop (pp. 50–55). Retrieved from

http://www.aclweb.org/anthology/W15-3907

 

POWERS, D.M.W. (AILab, School of Computer Science, Engineering and Mathematics,

Flinders University, South Australia, A. (2011). Evaluation: From Precision, Recall and FMeasure

To Roc, Informedness, Markedness & Correlation. Journal of Machine Learning

Technologies, 2(1), 37–63. http://doi.org/10.1.1.214.9232

 

Powers, D. M. W. (2015). What the F-measure doesn’t measure: Features, Flaws, Fallacies

and Fixes, 19. http://doi.org/KIT-14-001

 

Prasad, G., Fousiya, K. K., Kumar, M. A., & Soman, K. P. (2015). Named Entity Recognition

for Malayalam Language : A CRF based Approach, (May), 16–19.

 

Ramli, I., Jamil, N., Seman, N., & Ardi, N. (2015). An Improved Syllabification for a Better

Malay Language Text-to-Speech Synthesis (TTS). 2015 IEEE International Symposium On

 

Robotics and Intelligent Sensors, 76 (Iris), 417–424.

http://doi.org/10.1016/j.procs.2015.12.280

 

Rao, R. V., & Saroj, A. (2017). A self-adaptive multi-population based Jaya algorithm for

engineering optimization. Swarm and Evolutionary Computation, (October 2016), 1–26.

http://doi.org/10.1016/j.swevo.2017.04.008

 

Ritter, A., Clark, S., Mausam, & Etzioni, O. (2011). Named Entity Recognition in Tweets: An

Experimental Study. Proceedings of the 2011 Conference on Empirical Methods in Natural

Language Processing, 1524–1534. Retrieved from http://dl.acm.org/citation.cfm?id=2145595

 

Rosso, P., Benajiba, Y., & Lyhyaoui, A. (2006, December). Towards an Arabic question

answering system. In Proc. 4th Conf. on Scientific Research Outlook & Technology

Development in the Arab world, SROIV, Damascus, Syria (pp. 11-14).

 

Rozenfeld, B., & Feldman, R. (2008). Self-supervised relation extraction from the Web.

Knowledge and Information Systems, 17(1), 17–33. http://doi.org/10.1007/s10115-007-0110-

6

 

Sam, R. C., Le, H. T., Nguyen, T. T., & Nguyen, T. H. (2011). Combining proper namecoreference

with conditional random fields for semi-supervised named entity recognition in

Vietnamese text. Lecture Notes in Computer Science (Including Subseries Lecture Notes in

Artificial Intelligence and Lecture Notes in Bioinformatics), 6634 LNAI (PART 1), 512–524.

http://doi.org/10.1007/978-3-642-20841-6-42

 

Samat, N. A., Murad, M. A. A., Abdullah, M. T., & Atan, R. (2005). Malay Documents

Clustering Algorithm Based on Singular Value Decomposition. Journal of Theoretical and

Applied Information Technology, 180–186.

 

Sari, Y., Hassan, M. F., & Zamin, N. (2009). A Hybrid Approach to Semi-supervised Named

Entity Recognition in Health, Safety and Environment Reports. 2009 International

Conference on Future Computer and Communication, 599–602.

http://doi.org/10.1109/ICFCC.2009.52

 

Sari, Y., Hassan, M. F., & Zamin, N. (2010). Rule-based pattern extractor and Named Entity

Recognition: A hybrid approach. In Proceedings 2010 International Symposium on

Information Technology - Engineering Technology, ITSim’10 (Vol. 2, pp. 563–568).

http://doi.org/10.1109/ITSIM.2010.5561392

 

Satoshi Sekine, K. S., & Nobata, C. (2002). Extended named entity hierarchy. Third

International Conference on Language Resources and Evaluation (LREC 2002), 1818–1824.

 

Sazali, S. S., Rahman, N. A., & Bakar, Z. A. (2017). Information extraction: Evaluating

named entity recognition from classical Malay documents. In 2016 3rd International

Conference on Information Retrieval and Knowledge Management, CAMP 2016 - Conference

Proceedings (pp. 48–53). http://doi.org/10.1109/INFRKM.2016.7806333

 

Seeger, M., & King, I. (2002). Learning from labeled and unlabeled data. Learning, (January),

1–62. http://doi.org/10.1109/IJCNN.2002.1007592

 

Sekine, S., Sudo, K., & Nobata, C. (2002, May). Extended Named Entity Hierarchy. In LREC.

 

Selvaperumal, P., & Suruliandi, A. (2016). Semi-Supervised Personal Name Disambiguation

Technique for the Web. International Journal of Modern Education and Computer Science,

8(3), 28–36. http://doi.org/10.5815/ijmecs.2016.03.04

 

Servan, C., Berard, A., Elloumi, Z., Blanchon, H., & Besacier, L. (2016). Word2Vec vs

DBnary: Augmenting METEOR using Vector Representations or Lexical Resources?

Retrieved from http://arxiv.org/abs/1610.01291

 

Shaalan, K., & Oudah, M. (2013). A hybrid approach to Arabic named entity recognition.

Journal of Information Science, 40(1), 67–87. http://doi.org/10.1177/0165551513502417

 

Shaalan, K., & Raza, H. (2007). Person Name Entity Recognition for Arabic. Computational

Linguistics, (June), 17–24. http://doi.org/10.3115/1654576.1654581

 

Shabat, H. (2015). Named Entity Recognition in Crime News Documents Using Classifiers

Combination, 23(6), 1215–1222. http://doi.org/10.5829/idosi.mejsr.2015.23.06.22271

 

Sharma, D., Devale, P. R., & Khare, A. K. (2011). Approach for Multiword Expression

Identification in Natural Language Processing, 2 (August 2011), 663–666.

 

Sidi. (2011). Malay Interrogative Knowledge Corpus. American Journal of Economics and

Business Administration, 3, 171–176. http://doi.org/10.3844/ajebasp.2011.171.176

 

Sinoara, R. A., Sundermann, C. V., Marcacini, R. M., Domingues, M. A., & Rezende, S. O.

(2014). Named entities as privileged information for hierarchical text clustering. Proceedings

of the 18th International Database Engineering & Applications Symposium on - IDEAS ’14,

57–66. http://doi.org/10.1145/2628194.2628225

 

Srivastava, A. N., & Sahami, M. (2009). Text Mining: Classification, Clustering, and

Applications. Boca Raton: Chapman and Hall/CRC.

 

Suakkaphong, N., Zhang, Z., & Chen, H. (2013). Disease Named Entity Recognition Using

Semisupervised Learning and Conditional Random Fields. Journal of the American Society

for Information Science and Technology, 14(4), 90–103. http://doi.org/10.1002/asi

 

Sun, A., Grishman, R., & Sekine, S. (2011). Semi-supervised relation extraction with large-scale

word clustering. Proceedings of the 49th Annual Meeting …, 521–529. Retrieved from

http://www.aaai.org/Papers/AAAI/2007/AAAI07-224.pdf

http://dl.acm.org/citation.cfm?id=2002539

 

Suwarningsih, W., Supriana, I., & Purwarianti, A. (2015). ImNER Indonesian medical named

entity recognition. In Proceedings of 2014 2nd International Conference on Technology,

Informatics, Management, Engineering and Environment, TIME-E 2014 (pp. 184–188).

http://doi.org/10.1109/TIME-E.2014.7011615

 

Tabuchi, N., Sumii, E., & Yonezawa, A. (2003). Regular expression types for strings in a text

processing language. Electronic Notes in Theoretical Computer Science, 75, 97–115.

http://doi.org/10.1016/S1571-0661(04)80781-3

 

Tan, T. P., Xiao, X., Tang, E. K., Chng, E. S., & Li, H. (2009). MASS: A Malay language

LVCSR corpus resource. 2009 Oriental COCOSDA International Conference on Speech

Database and Assessments, ICSDA 2009, 25–30.

http://doi.org/10.1109/ICSDA.2009.5278382

 

Tran, V. C., Hwang, D., & Jung, J. J. (2015). Semi-supervised Approach Based on Co-occurrence

Coefficient for Named Entity Recognition on Twitter, 141–146.

 

Triguero, I., García, S., & Herrera, F. (2013). Self-labeled techniques for semi-supervised

learning: taxonomy, software and empirical study. Knowledge and Information Systems, pp.

1–40. http://doi.org/10.1007/s10115-013-0706-y

 

Triguero, I., Sáez, J. A., Luengo, J., García, S., & Herrera, F. (2014). On the characterization

of noise filters for self-training semi-supervised in nearest neighbor classification.

Neurocomputing, 132, 30–41. http://doi.org/10.1016/j.neucom.2013.05.055

 

Trstenjak, B., Mikac, S., & Donko, D. (2014). KNN with TF-IDF based framework for text

categorization. In Procedia Engineering (Vol. 69, pp. 1356–1364). Elsevier B.V.

http://doi.org/10.1016/j.proeng.2014.03.129

 

Tuffery, S. (2011). Data Mining and Statistics for Decision Making. Wiley.

 

Turian, J., Ratinov, L., & Bengio, Y. (2010). Word Representations: A Simple and

General Method for Semi-supervised Learning. Proceedings of the 48th Annual Meeting of

the Association for Computational Linguistics, (July), 384–394.

http://doi.org/10.1.1.301.5840

 

Wibawa, A. S., & Purwarianti, A. (2016). Indonesian Named-entity Recognition for 15

Classes Using Ensemble Supervised Learning. Procedia Computer Science, 81(May), 221–

228. http://doi.org/10.1016/j.procs.2016.04.053

 

Witten, I. H., Frank, E., & Hall, M. (2011). Data Mining: Practical Machine Learning Tools

and Techniques (2nd ed.). http://doi.org/citeulike-article-id:8827086

 

Worden, K., Staszewski, W. J., & Hensman, J. J. (2011). Natural computing for mechanical

systems research: A tutorial overview. Mechanical Systems and Signal Processing. Elsevier.

http://doi.org/10.1016/j.ymssp.2010.07.013

 

Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., … Steinberg, D. (2008). Top

10 algorithms in data mining. Knowledge and Information Systems (Vol. 14).

http://doi.org/10.1007/s10115-007-0114-2

 

Xian, B. C. M., Lubani, M., Ping, L. K., Bouzekri, K., Mahmud, R., & Lukose, D. (2016).

Benchmarking Mi-POS: Malay Part-of-Speech Tagger. International Journal of Knowledge

Engineering, 2(3), 115–121. http://doi.org/10.18178/ijke.2016.2.3.064

 

Yang, F., & Vozila, P. (2014). Semi-Supervised Chinese Word Segmentation Using Partial-

Label Learning With Conditional Random Fields. Emnlp, 90–98. Retrieved from

http://emnlp2014.org/papers/pdf/EMNLP2014010.pdf

 

Yesilbudak, M., Sagiroglu, S., & Colak, I. (2017). A novel implementation of kNN classifier

based on multi-tupled meteorological input data for wind power prediction. Energy

Conversion and Management, 135, 434–444. http://doi.org/10.1016/j.enconman.2016.12.094

 

Yong, S.-F., Ranaivo-Malançon, B., & Wee, A. Y. (2011). NERSIL: the named-entity

recognition system for Iban language. 25th Pacific Asia Conference on Language,

Information and Computation, 549–558.

 

Yong, Z., Youwen, L., & Shixiong, X. (2009). An Improved KNN Text Classification

Algorithm Based on Clustering. Journal of Computers, 4(3), 230–237.

http://doi.org/10.4304/jcp.4.3.230-237

 

Zamin, N., & Oxley, A. (2011). Building a Corpus-Derived Gazetteer for Named Entity

Recognition, 73–80.

 

Zamin, N., Oxley, A., Abu Bakar, Z., & Farhan, S. A. (2012). A statistical dictionary-based

word alignment algorithm: An unsupervised approach. In 2012 International Conference on

Computer and Information Science, ICCIS 2012 - A Conference of World Engineering,

Science and Technology Congress, ESTCON 2012 - Conference Proceedings (Vol. 1, pp.

396–402). http://doi.org/10.1109/ICCISci.2012.6297278

 

Zatarain Salazar, J., Reed, P. M., Herman, J. D., Giuliani, M., & Castelletti, A. (2016). A

diagnostic assessment of evolutionary algorithms for multi-objective surface water reservoir

control. Advances in Water Resources, 92, 172–185.

http://doi.org/10.1016/j.advwatres.2016.04.006

 

Zeng, H., Song, A., & Cheung, Y. M. (2013). Improving clustering with pairwise constraints:

A discriminative approach. Knowledge and Information Systems, 36(2), 489–515.

http://doi.org/10.1007/s10115-012-0592-8

 

Zhan, Q. (2017). An Improved K-means Algorithm Based on Structure Features, 12(1), 62–80.

http://doi.org/10.17706/jsw.12.1.62-81

 

Zhang, C., Hong, X., & Peng, Z. (2012). An automatic approach to harvesting temporal

knowledge of entity relationships. In Procedia Engineering (Vol. 29, pp. 1399–1409).

http://doi.org/10.1016/j.proeng.2012.01.147

 

Zhang, S., & Elhadad, N. (2013). Unsupervised biomedical named entity recognition:

Experiments with clinical and biological texts. Journal of Biomedical Informatics, 46(6),

1088–1098. http://doi.org/10.1016/j.jbi.2013.08.004

 

Zhou, D., & Zhong, D. (2015). A semi-supervised learning framework for biomedical event

extraction based on hidden topics. Artificial Intelligence in Medicine, 64(1), 51–58.

http://doi.org/10.1016/j.artmed.2015.03.004

 

Zirikly, A., & Diab, M. (2015). Named Entity Recognition for Arabic Social Media.

Proceedings of NAACL-HLT 2015, 176–185. Retrieved from

http://www.aclweb.org/anthology/W15-1524.pdf


This material may be protected under Copyright Act which governs the making of photocopies or reproductions of copyrighted materials.
You may use the digitized material for private study, scholarship, or research.
