UPSI Digital Repository (UDRep)
Abstract: Universiti Pendidikan Sultan Idris
This paper presents a modified correlation measure in principal component analysis (PCA) for selecting the number of clusters when identifying rainfall patterns. Clustering guided by PCA is widely used for high-dimensional data, especially for identifying the spatial distribution patterns of daily torrential rainfall. A common method of identifying rainfall patterns in climatological studies uses a T-mode Pearson correlation matrix to extract the relative variance retained. However, rainfall data in Peninsular Malaysia are purely positive and skewed towards higher values. PCA based on Pearson correlation therefore has the potential to distort the cluster partitions and to produce highly uneven clusters in a high-dimensional space. To resolve the problem of unbalanced clusters caused by the skewed character of the data, this study employs a robust dimension reduction method in PCA. Specifically, it introduces a robust measure in PCA, Tukey's biweight correlation, which downweights observations, together with the optimal breakdown point, to determine the number of components retained. The robust PCA showed a substantial improvement over Pearson correlation-based PCA in terms of the average number of clusters obtained, and indicated a cumulative percentage of variance of 70% at a breakdown point of 0.4.
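As a rough illustration of the component-selection step described in the abstract, the sketch below picks the smallest number of principal components whose cumulative share of variance reaches 70%. It is not the authors' implementation. The function name `n_components_for_variance`, the synthetic gamma-distributed data, and the orientation of the data matrix are assumptions made for illustration, and `np.corrcoef` computes the ordinary Pearson correlation; a robust matrix such as one built from Tukey's biweight correlation would be passed in its place.

```python
import numpy as np

def n_components_for_variance(corr, threshold=0.70):
    """Return (k, cum): the smallest number of components whose cumulative
    share of variance reaches `threshold`, and the cumulative shares."""
    # Eigenvalues of a symmetric matrix, reordered from largest to smallest.
    eigvals = np.linalg.eigvalsh(corr)[::-1]
    # Guard against tiny negative values caused by floating-point error.
    eigvals = np.clip(eigvals, 0.0, None)
    cum = np.cumsum(eigvals) / eigvals.sum()
    # First position at which the cumulative share is >= threshold.
    k = int(np.searchsorted(cum, threshold)) + 1
    return min(k, len(cum)), cum

# Hypothetical skewed, positive data standing in for rainfall amounts
# (rows are observations, columns are variables; this layout is an assumption).
rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=10.0, size=(200, 15))

# Pearson correlation as a stand-in; swap in a robust correlation matrix here.
corr = np.corrcoef(X, rowvar=False)
k, cum = n_components_for_variance(corr, threshold=0.70)
print(f"components retained: {k}, cumulative variance: {cum[k - 1]:.2%}")
```

Keeping the selection rule separate from the correlation estimate means the same 70% cut-off can be applied unchanged to a Pearson-based or a robust matrix, which is the comparison the abstract describes.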
This material may be protected under the Copyright Act, which governs the making of photocopies or reproductions of copyrighted materials. You may use the digitized material for private study, scholarship, or research.