UPSI Digital Repository (UDRep)
| Total records found : 2 |
| Simplified search suggestions : Farid Morsidi |
| 1 | 2017 Article | Feature extraction using regular expression in detecting proper noun for Malay news articles based on KNN algorithm. Farid Morsidi. The identification of proper nouns in text aims to classify named entities according to their respective groupings, a task that forms part of Named Entity Recognition (NER). Proper noun disambiguation can adversely affect morphological analysis, which is vital for improving corpus availability through classification and new word assimilation. Occurrences of proper nouns can be annotated in text resources by mapping entities separately from their fragments. This research examined the impact of regex-based text pattern identification in querying and acquiring proper nouns from a collection of unannotated Malay-language news articles. This baseline study considers several techniques for improving the precision and accuracy of text entities, such as pre-processing and data clustering. The results showed that the F-scores of the output tested on the unannotated news dataset were between 30% and 60%... 590 hits |
| 2 | 2018 Thesis | Proper noun detection using regex algorithm and rules for malay named entity recognition. Farid Morsidi. This study aimed to develop a Malay proper noun detection method to cluster and classify named entity categories, particularly the major classes of person, location, organization, and miscellaneous, for a Malay newspaper corpus. A Regular Expression (regex) pattern identification algorithm and rules were introduced in this study to overcome the limitations of dictionaries and gazetteers. Two visualization techniques, namely Decision Tree and Term Document Matrix, were used to evaluate the efficiency of the method. The method achieved 74% accuracy during the generation of the decision tree. The term document matrix visualization reached maximum values of 9.8007403, 9.8718517, and 9.9890683 for the Astro Awani, Berita Harian, and Bernama datasets, respectively. In conclusion, the regex algorithm could indicate the presence of Malay proper nouns, thus making it an appropri..... 1084 hits |