UPSI Digital Repository (UDRep)
Abstract : Universiti Pendidikan Sultan Idris
In high-dimensional data analysis, Principal Component Analysis (PCA) based on the Pearson correlation remains widely used to reduce the data dimensions and to improve the effectiveness of cluster partitioning. Besides being sensitive to non-Gaussian data, this approach may distort the cluster partitions and generate exceptionally imbalanced clusters, because it assigns equal weight to every pair of observations. To address the unbalanced clusters that arise in hydrological studies from the skewed character of the data, this study proposes a robust PCA method (RPCA) in terms of the correlation. RPCA is presented as an alternative to classical PCA for reducing a high-dimensional dataset to a lower-dimensional form and for obtaining balanced clustering results. RPCA downweights outliers that lie far from the data center and thereby improves the cluster partitions. The two methods are compared in terms of the number of components and clusters obtained and in terms of clustering validity. Internal and stability validation criteria are used to assess the quality of the clusters obtained by both methods. The findings show that the number of clusters improved significantly with RPCA compared with classical PCA. This indicates that the proposed approach is more resistant to outliers than classical PCA, because it assesses every observation and downweights those that are distant from the data center. ©2021 by authors, all rights reserved.
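The contrast between equal weighting and downweighting can be illustrated with a minimal Python sketch. It is not the study's implementation: the abstract does not specify the robust correlation used, so the biweight-style weight function, the tuning constant `c`, the synthetic data, and all function names below are assumptions made purely for illustration.

```python
import numpy as np


def biweight_style_weights(X, c=9.0):
    """Per-observation weights that shrink to zero far from the data center.

    Assumption: distance is measured from the coordinate-wise median and
    scaled by the median absolute deviation (MAD) of each column.
    """
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    mad = np.where(mad == 0, 1.0, mad)      # guard against constant columns
    z = (X - med) / mad
    d = np.sqrt((z ** 2).mean(axis=1))      # scaled distance of each row
    u = d / c
    return np.where(u < 1.0, (1.0 - u ** 2) ** 2, 0.0)


def weighted_correlation(X, w):
    """Correlation matrix of the columns of X under observation weights w."""
    w = w / w.sum()
    mu = w @ X
    Xc = X - mu
    cov = (Xc * w[:, None]).T @ Xc
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


def principal_components(R):
    """Eigenvalues and eigenvectors of R, sorted by decreasing eigenvalue."""
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


# Synthetic data (hypothetical): three positively correlated variables
# plus five extreme observations that contradict that correlation.
rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 1))
inliers = shared + 0.5 * rng.normal(size=(200, 3))
outliers = np.tile([40.0, -40.0, 40.0], (5, 1))
X = np.vstack([inliers, outliers])

w = biweight_style_weights(X)
R_pearson = np.corrcoef(X, rowvar=False)    # every observation weighted equally
R_robust = weighted_correlation(X, w)       # distant observations downweighted

vals_pearson, _ = principal_components(R_pearson)
vals_robust, _ = principal_components(R_robust)

print("Pearson r(0,1):", round(R_pearson[0, 1], 3))
print("Robust  r(0,1):", round(R_robust[0, 1], 3))
print("Pearson eigenvalues:", np.round(vals_pearson, 3))
print("Robust  eigenvalues:", np.round(vals_robust, 3))
```

In this toy setting the five extreme rows receive zero weight, so the weighted correlation reflects the bulk of the data, while the equally weighted Pearson correlation is dominated by them. The components that would feed a later clustering step differ accordingly.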
This material may be protected under the Copyright Act, which governs the making of photocopies or reproductions of copyrighted materials. You may use the digitized material for private study, scholarship, or research.