UPSI Digital Repository (UDRep)

Type :article
Subject :Q Science (General)
ISSN :0094243X
Main Author :Harnani Mat Zin
Additional Authors :Norwati Mustapha
Masrah Azrifah Azmi Murad
Nurfadhlina Mohd Sharef
Title :The effects of pre-processing strategies in sentiment analysis of online movie reviews
Year of Publication :2017

Abstract :
With the ever-increasing number of internet applications and social networking sites, people can now easily express their feelings towards any product or service. These online reviews are an important source for further analysis and improved decision making. Because the reviews are mostly unstructured, they need processing such as sentiment analysis and classification before they can provide meaningful information for future use. In text analysis tasks, the choice of words/features has a large impact on the effectiveness of the classifier. This paper therefore explores the effect of pre-processing strategies on the sentiment analysis of online movie reviews. A supervised machine learning method was used to classify the reviews, with support vector machines (SVMs) using linear and non-linear kernels as the classifiers. Classifier performance was critically examined in terms of precision, recall, F-measure, and accuracy. Two feature representations were used: term frequency (TF) and term frequency-inverse document frequency (TF-IDF). The results show that the pre-processing strategies have a significant impact on the classification process.
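The abstract does not spell out the individual pre-processing steps or the exact TF-IDF formula. As a minimal sketch, assuming lowercasing, removal of non-letter characters and stopword removal as the pre-processing strategies, and idf(t) = log(N / df(t)) as the weighting (one common variant among several), the two feature representations can be built with the Python standard library alone. The stopword list and the function names `preprocess`, `term_frequency` and `tf_idf` are hypothetical, not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "is", "it", "this", "and", "of", "was"}


def preprocess(text, remove_stopwords=True):
    """Lowercase, replace non-letters with spaces, split on whitespace,
    and optionally drop stopwords (an assumed pre-processing pipeline)."""
    tokens = re.sub(r"[^a-z\s]", " ", text.lower()).split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def term_frequency(tokens):
    """TF representation: raw count of each term in one document."""
    return dict(Counter(tokens))


def tf_idf(docs_tokens):
    """TF-IDF representation using idf(t) = log(N / df(t)),
    where df(t) is the number of documents containing term t."""
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in docs_tokens:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


reviews = [
    "This movie was great, great acting!",
    "The plot was boring and the acting was weak.",
]
tokens = [preprocess(r) for r in reviews]
print(tokens[0])                  # ['movie', 'great', 'great', 'acting']
print(term_frequency(tokens[0]))  # {'movie': 1, 'great': 2, 'acting': 1}
print(tf_idf(tokens)[0])
```

Because "acting" occurs in both reviews, its TF-IDF weight is log(2/2) = 0 even though its raw TF count is 1, which illustrates how the two representations can rank the same term differently. Calling `preprocess(text, remove_stopwords=False)` keeps words such as "this" and "was", one simple way to compare a pipeline with and without a given pre-processing step.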
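Precision, recall, F-measure and accuracy all follow from counts of true positives (TP), false positives (FP), false negatives (FN) and overall correct predictions. The sketch below assumes a binary positive/negative labelling and takes "F-measure" to mean the balanced F1 score (the harmonic mean of precision and recall); the function name `evaluate` and the label strings are hypothetical.

```python
def evaluate(y_true, y_pred, positive="pos"):
    """Precision, recall, F1 and accuracy for a binary sentiment task.

    `positive` names the class treated as positive (an assumed label)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against division by zero when a class never occurs or is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


y_true = ["pos", "pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]
print(evaluate(y_true, y_pred))
```

Here TP = 2, FP = 1, FN = 1 and three of five predictions are correct, so precision, recall and F1 are each 2/3 while accuracy is 0.6.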
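The abstract does not name the non-linear kernel used. As an illustration only, assuming the Gaussian (RBF) kernel as the non-linear case, the sketch below compares it with the linear kernel on sparse `{term: weight}` vectors such as TF or TF-IDF features, and evaluates the standard kernel-SVM decision function f(x) = Σᵢ αᵢyᵢ k(xᵢ, x) + b. The support vectors, dual coefficients, bias and `gamma` are hypothetical values; in practice they come from training with an SVM solver, which this sketch omits.

```python
import math


def dot(u, v):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def linear_kernel(u, v):
    """Linear kernel k(u, v) = u . v."""
    return dot(u, v)


def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||u - v||^2); gamma is a hypothetical setting."""
    sq_dist = max(0.0, dot(u, u) + dot(v, v) - 2 * dot(u, v))  # clamp rounding error
    return math.exp(-gamma * sq_dist)


def decision_function(x, support_vectors, dual_coefs, bias, kernel):
    """Kernel SVM score: sum_i (alpha_i * y_i) * k(sv_i, x) + b.
    A positive score predicts the positive class."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical "trained" model: one positive and one negative support vector.
svs = [{"great": 1.0}, {"boring": 1.0}]
coefs = [1.0, -1.0]  # alpha_i * y_i
b = 0.0
x = {"great": 2.0, "acting": 1.0}
print(decision_function(x, svs, coefs, b, linear_kernel))  # 2.0
print(decision_function(x, svs, coefs, b, rbf_kernel))     # exp(-1) - exp(-3), about 0.318
```

With the linear kernel the score is a weighted sum of shared term weights, whereas the RBF kernel scores by distance to each support vector, which lets the classifier draw non-linear boundaries in the same feature space.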



This material may be protected under the Copyright Act, which governs the making of photocopies or reproductions of copyrighted materials.
You may use the digitized material for private study, scholarship, or research.


Installed and configured by Bahagian Automasi (Automation Division), Perpustakaan Tuanku Bainun (Tuanku Bainun Library), Universiti Pendidikan Sultan Idris
If you have enquiries, kindly contact us at pustakasys@upsi.edu.my or 016-3630263. Office hours only.