UPSI Digital Repository (UDRep)

Type :article
Subject :Q Science (General)
ISSN :2772-9419
Main Author :Riswan Efendi
Title :Cleansing of inconsistent sample in linear regression model based on rough sets theory
Place of Production :Tanjung Malim
Publisher :Fakulti Sains dan Matematik
Year of Publication :2023
Notes :Systems and Soft Computing
Corporate Name :Universiti Pendidikan Sultan Idris

Abstract :
The linear regression model is one of the most common and simplest algorithms used in machine learning for predictive analysis. However, the model performs well only under strict assumptions concerning the number of observations, linearity of the variables, multicollinearity, homoskedasticity, reliability of measurement, and normality. Moreover, the handling and cleansing of inconsistent samples in data sets has received no consideration to date. These samples may significantly influence the performance of multiple linear regression with respect to these assumptions and to several aspects, such as the adjusted R-square, intercepts and slopes, exogenous variables, and prediction accuracy. In this paper, the data reduction strategy of rough sets was employed to remove these samples and so boost the performance of linear regression models. The strategy was evaluated by examining three effects: adjusted R-square, slopes and intercepts, and the mean square error of the regression model. Simulated data and simple modeling problems were used to determine these three effects. Secondary data sets were collected from various domains to examine the proposed rough-regression model. The simulation results showed that the data reduction strategy is highly effective in boosting the performance of multiple linear regression in all three aspects. In the implementation, these aspects also improved relative to their values before data reduction. The results from both simulations and implementations demonstrate that rough-set data reduction is a viable strategy for cleansing inconsistent samples in linear regression models. Thus, the proposed rough-regression model is shown to support the data analysis of surveys or cross-sectional studies, especially when the stated aspects are not well fulfilled. Consequently, researchers need not repeat or reconsider their surveys.
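The abstract does not spell out the cleansing procedure, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a sample is "inconsistent" in the rough-set sense when it shares its (discrete) condition attributes with another sample but falls into a different decision class, and that all such boundary-region samples are dropped before fitting. The function names, the bin edges, and the toy survey-style data are hypothetical.

```python
import numpy as np

def inconsistent_mask(X_cond, y_class):
    """Flag rows whose condition-attribute class maps to more than one decision class.

    Assumption: X_cond holds discrete condition attributes (e.g. Likert scores)
    and y_class holds a discretized response.
    """
    groups = {}
    for i, row in enumerate(map(tuple, X_cond)):
        groups.setdefault(row, []).append(i)  # indiscernibility classes
    mask = np.zeros(len(y_class), dtype=bool)
    for idx in groups.values():
        if np.unique(y_class[idx]).size > 1:  # class lies in the boundary region
            mask[idx] = True
    return mask

def fit_ols(X, y):
    """Ordinary least squares; returns coefficients, adjusted R-square, and MSE."""
    A = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    n, p = X.shape
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return beta, adj_r2, ss_res / n

# Hypothetical toy data: one Likert-style predictor and a continuous response.
X = np.array([[1], [1], [2], [2], [3], [4], [4], [5]], dtype=float)
y = np.array([2.1, 1.9, 4.0, 9.5, 6.1, 8.0, 8.2, 9.9])

# Assumed bin edges that turn the response into low / medium / high classes.
y_class = np.digitize(y, [3.5, 7.0])

mask = inconsistent_mask(X, y_class)
before = fit_ols(X, y)                # all samples
after = fit_ols(X[~mask], y[~mask])   # inconsistent samples removed

print("dropped rows:", np.flatnonzero(mask))
print("adjusted R-square before/after:", before[1], after[1])
print("MSE before/after:", before[2], after[2])
```

Dropping every sample in an inconsistent class keeps only the rough-set positive region. A gentler variant could keep the majority decision class within each group instead. The comparison of slopes and intercepts, adjusted R-square, and mean square error before and after reduction mirrors the three effects named in the abstract.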
