UPSI Digital Repository (UDRep)

Type :article
Subject :T Technology
ISSN :2331-625X
Main Author :Shazlyn Milleana Shaharudin
Additional Authors :Nor Siti Mariana Che Mat
Title :A RPCA-Based Tukey's biweight for clustering identification on extreme rainfall data
Place of Production :Tanjung Malim
Publisher :Fakulti Sains dan Matematik
Year of Publication :2021
Notes :Environment and Ecology Research
Corporate Name :Universiti Pendidikan Sultan Idris

Abstract :
In high-dimensional data analysis, Principal Component Analysis (PCA) based on the Pearson correlation remains widely used to reduce data dimensionality and to improve the effectiveness of clustering partitions. However, this approach is sensitive to non-Gaussian data, and because it assigns equal weight to every pair of observations, it can distort the cluster partitions and produce severely imbalanced clusters. To address the imbalanced clusters that arise in hydrological studies from the skewed nature of the data, this study proposes a robust PCA (RPCA), in which the Pearson correlation is replaced by a robust (Tukey's biweight) correlation, as an alternative to classical PCA, both for reducing a high-dimensional dataset to a lower-dimensional form and for obtaining balanced clustering results. The RPCA downweights outliers that lie far from the data center, which improves the cluster partitions. The two methods are compared in terms of the number of components and clusters obtained and of clustering validity, which is assessed with internal and stability validation criteria that focus on cluster quality. The findings show that the number of clusters improved significantly with RPCA compared with classical PCA. This indicates that the proposed approach is more resistant to outliers than classical PCA, because it assesses every observation and downweights those distant from the data center. ©2021 by authors, all rights reserved.
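The abstract contrasts Pearson-based PCA, which weights every observation equally, with an RPCA built on Tukey's biweight that downweights observations far from the data center. The Python sketch below illustrates that idea under stated assumptions, not as the authors' implementation: it uses the biweight midcorrelation (median/MAD centring with tuning constant c = 9) as a stand-in for the paper's Tukey's biweight correlation, whose exact estimator is not given in this record; it obtains component eigenvalues with a plain Jacobi eigenvalue routine; and it picks the number of components with a cumulative-percentage cut-off whose 80% threshold is an arbitrary illustrative choice. All function names and the toy station data are hypothetical, and the clustering and validation steps are not reproduced.

```python
import math
from statistics import median


def biweight_midcorrelation(x, y, c=9.0):
    """Tukey-biweight-weighted correlation (biweight midcorrelation).

    ASSUMPTION: a stand-in for the paper's Tukey's biweight correlation;
    the authors' exact estimator may differ.
    """
    def scaled_deviations(v):
        med = median(v)
        mad = median([abs(a - med) for a in v])
        if mad == 0:
            raise ValueError("MAD is zero; biweight weights are undefined")
        devs = []
        for a in v:
            u = (a - med) / (c * mad)
            # Biweight: the weight shrinks smoothly with distance from the
            # median and is exactly 0 once |u| >= 1, so far-out points drop out.
            w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
            devs.append((a - med) * w)
        norm = math.sqrt(sum(d * d for d in devs))
        return [d / norm for d in devs]

    xs, ys = scaled_deviations(x), scaled_deviations(y)
    return sum(a * b for a, b in zip(xs, ys))


def pearson(x, y):
    """Classical Pearson correlation: every observation weighted equally."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns, corr):
    """Symmetric p x p matrix of pairwise correlations, unit diagonal."""
    p = len(columns)
    return [[1.0 if i == j else corr(columns[i], columns[j]) for j in range(p)]
            for i in range(p)]


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, descending."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                cs = 1 / math.sqrt(t * t + 1)
                sn = t * cs
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = cs * akp - sn * akq
                    a[k][q] = sn * akp + cs * akq
                for k in range(n):  # A <- J^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = cs * apk - sn * aqk
                    a[q][k] = sn * apk + cs * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def components_for_cumulative(eigenvalues, threshold=0.8):
    """Smallest k whose leading eigenvalues reach `threshold` of the total."""
    total = sum(eigenvalues)
    running = 0.0
    for k, lam in enumerate(eigenvalues, start=1):
        running += lam
        if running / total >= threshold:
            return k
    return len(eigenvalues)


if __name__ == "__main__":
    # Hypothetical daily rainfall at three stations; the last day is an
    # extreme event recorded at every station.
    stations = [
        [1, 2, 3, 4, 5, 6, 7, 8, 9, 100],
        [5, 3, 8, 1, 9, 2, 7, 4, 6, 100],
        [2, 6, 1, 7, 3, 8, 4, 9, 5, 100],
    ]
    for name, corr in [("Pearson", pearson), ("biweight", biweight_midcorrelation)]:
        eigs = jacobi_eigenvalues(correlation_matrix(stations, corr))
        print(name, [round(e, 3) for e in eigs], components_for_cumulative(eigs))
```

Running the script prints the eigenvalues and the retained component count for each correlation matrix. In this toy data the shared extreme day lies at |u| = 4.2 from the median in biweight units, so it receives zero weight and no longer drives the correlations, whereas under Pearson that single day pushes the station correlations close to 1.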

