UPSI Digital Repository (UDRep)
Total records found : 2
Simplified search suggestions : Farid Morsidi
1. 2017
Article
Feature extraction using regular expression in detecting proper noun for Malay news articles based on KNN algorithm
Farid Morsidi
The identification of proper nouns in text aims to classify named entities according to their respective groupings, a task that forms part of Named Entity Recognition (NER). Proper noun disambiguation can adversely affect morphological analysis, which is vital for improving corpus availability through classification and new word assimilation. Occurrences of proper nouns can be annotated from text resources using separate entity mapping from their fragments. This research examined the impact of regular expressions (regex) on a text pattern identification sequence that queried and acquired proper nouns from a collection of unannotated Malay-language news articles. This basis study envisions several techniques to improve the precision and accuracy of text entities, such as pre-processing and data clustering. The results showed that the F-scores of the output tested on the unannotated news dataset were between 30% and 60%...

590 hits

2. 2018
Thesis
Proper noun detection using regex algorithm and rules for Malay named entity recognition
Farid Morsidi
This study aimed to develop a Malay proper noun detection method to cluster and classify named entity categories, particularly the major classes of person, location, organization, and miscellaneous, for a Malay newspaper corpus. A Regular Expression (regex) pattern identification algorithm and rules were introduced in this study to overcome the limitations of dictionaries and gazetteers. Two visualization techniques, namely Decision Tree and Term Document Matrix, were used to evaluate the efficiency of the method. The method obtained 74% accuracy during the generation of the decision tree. The term document matrix visualization achieved maximum values of 9.8007403, 9.8718517, and 9.9890683 for the Astro Awani, Berita Harian, and Bernama datasets respectively. In conclusion, the regex algorithm could indicate the presence of Malay proper nouns, thus making it an appropri.....

1084 hits

Installed and configured by Bahagian Automasi, Perpustakaan Tuanku Bainun, Universiti Pendidikan Sultan Idris
If you have enquiries, kindly contact us at pustakasys@upsi.edu.my or 016-3630263. Office hours only.