UPSI Digital Repository (UDRep)
Total records found : 3
Simplified search suggestions : Farid Morsidi
1. 2017
Article
Feature extraction using regular expression in detecting proper noun for Malay news articles based on KNN algorithm
Farid Morsidi
Identifying proper nouns in text aims to classify named entities into their respective groups, a task that forms part of Named Entity Recognition (NER). Proper noun disambiguation can adversely affect morphological analysis, which is vital for improving corpus availability through classification and new word assimilation. Occurrences of proper nouns can be annotated from text resources by mapping entities separately from their fragments. This research examined the impact of regular expressions (regex) on a text pattern identification sequence that queried and acquired proper nouns from a collection of unannotated Malay-language news articles. This foundational study envisions several techniques, such as pre-processing and data clustering, to improve the precision and accuracy of text entity identification. The results showed that the F-scores of the output tested on the unannotated news dataset were between 30% and 60%...

652 hits
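
The regex-based approach described in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the article: the pattern, the function names `extract_candidates` and `f_score`, the treatment of the connectors "bin" and "binti", and the sample sentence are all hypothetical. The sketch matches runs of capitalized tokens as proper noun candidates and scores them against a hand-made gold list with the usual F-score.

```python
import re

# Hypothetical pattern (not the article's): a capitalized word, optionally
# followed by more capitalized words or by a "bin"/"binti" connector plus a name.
PROPER_NOUN = re.compile(
    r"\b[A-Z][a-z]+(?:\s+(?:bin|binti)\s+[A-Z][a-z]+|\s+[A-Z][a-z]+)*"
)


def extract_candidates(sentence):
    """Return runs of capitalized tokens as proper noun candidates."""
    candidates = []
    for match in PROPER_NOUN.finditer(sentence):
        text = match.group()
        # A lone capitalized word at the start is usually just sentence case.
        if match.start() == 0 and " " not in text:
            continue
        candidates.append(text)
    return candidates


def f_score(predicted, gold):
    """Harmonic mean of precision and recall over exact string matches."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented example sentence and gold annotations, for illustration only.
    sentence = "Menteri itu tiba di Kuala Lumpur bersama Ahmad bin Ali semalam."
    found = extract_candidates(sentence)
    print(found)
    print(f_score(found, ["Kuala Lumpur", "Ahmad bin Ali", "Bernama"]))
```

A capitalization-only pattern like this one misses lowercase entity mentions and over-matches ordinary capitalized words, which is one plausible reason a regex-only pipeline would need the additional pre-processing the abstract mentions.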

2. 2018
Thesis
Proper noun detection using regex algorithm and rules for Malay named entity recognition
Farid Morsidi
This study aimed to develop a Malay proper noun detection method to cluster and classify named entity categories, particularly the major classes of person, location, organization, and miscellaneous, for a Malay newspaper corpus. A regular expression (regex) pattern identification algorithm and rules were introduced in this study to overcome the limitations of dictionaries and gazetteers. Two visualization techniques, namely Decision Tree and Term Document Matrix, were used to evaluate the efficiency of the method. The method achieved 74% accuracy during the generation of the decision tree. The term document matrix visualization achieved maximum values of 9.8007403, 9.8718517, and 9.9890683 for the Astro Awani, Berita Harian, and Bernama datasets, respectively. In conclusion, the regex algorithm could indicate the presence of Malay proper nouns, thus making it an appropri.....

1390 hits

3. 2015
Article
Malay named entity recognition: a review
Farid Morsidi
The Named Entity Recognition (NER) field has been thriving for more than 15 years. NER can be defined as a process that recognizes named entities, such as the names of persons, organizations, locations, times, and quantities. Research in NER generally emphasizes the extraction and classification of mentions of rigid designators from text, such as proper names, biological species, and temporal expressions. NER has been used in many areas, ranging from inquiries to morphological syntax, as well as information extraction. However, most of the work has been confined to limited domains and textual genres such as news articles and web pages. Techniques used for processing English text cannot be used to process Malay-related terminology because morphological usage differs between languages. Finding co-references and aliases in a text can be reduced to the same problem of finding all occurrences of an entity in .....

6 hits

Installed and configured by Bahagian Automasi, Perpustakaan Tuanku Bainun, Universiti Pendidikan Sultan Idris
If you have enquiries, kindly contact us at pustakasys@upsi.edu.my or 016-3630263. Office hours only.