UPSI Digital Repository (UDRep)
Abstract
Principal Component Analysis (PCA) is a popular method for reducing large-scale data sets in hydrological applications. Typically, PCA scores are used as input to hierarchical cluster analysis to redefine regions. However, choosing the cumulative percentage of variance at which PCA scores are retained, and identifying the best number of clusters, can be difficult. In this paper, we investigate how the number of clusters is determined by (i) comparing standardized and unstandardized PCA scores at different cumulative percentages of variance and (ii) determining the number of clusters with the Calinski and Harabasz Index. We found that the Calinski and Harabasz Index is the most appropriate criterion for determining the best number of clusters, and that standardized PCA scores within the range of 65% to 70% cumulative percentage of variance give the most reasonable number of clusters.
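The workflow summarized in the abstract (PCA scores retained up to a cumulative percentage of variance, hierarchical clustering of those scores, then the Calinski and Harabasz Index to choose the number of clusters) can be sketched in Python with NumPy and SciPy. This is a minimal illustration under stated assumptions, not the authors' code: Ward linkage, a candidate range of 2 to 10 clusters, "standardized" meaning each retained score column is scaled to unit variance, and the synthetic data are all choices made here. The index is computed directly as CH(k) = [B / (k - 1)] / [W / (n - k)], where B and W are the between-cluster and within-cluster sums of squares for n observations in k clusters; the k with the largest value is taken as the best number of clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def pca_scores(X, cum_var_target=0.70, standardize=True):
    """Project X onto the fewest principal components whose cumulative
    share of variance reaches cum_var_target (e.g. 0.65 to 0.70)."""
    Xc = X - X.mean(axis=0)                      # centre each variable
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2 / (X.shape[0] - 1)              # eigenvalues of the covariance matrix
    cum = np.cumsum(var) / var.sum()             # cumulative proportion of variance
    k = min(int(np.searchsorted(cum, cum_var_target)) + 1, len(s))
    scores = Xc @ Vt[:k].T                       # PCA scores, shape (n, k)
    if standardize:
        # Assumption: "standardized" = each score column scaled to unit variance
        scores = scores / scores.std(axis=0, ddof=1)
    return scores, k


def calinski_harabasz(Z, labels):
    """CH = [B / (k - 1)] / [W / (n - k)]; larger means better-separated clusters."""
    n = Z.shape[0]
    groups = np.unique(labels)
    k = len(groups)
    grand_mean = Z.mean(axis=0)
    between = within = 0.0
    for g in groups:
        members = Z[labels == g]
        centroid = members.mean(axis=0)
        between += len(members) * np.sum((centroid - grand_mean) ** 2)
        within += np.sum((members - centroid) ** 2)
    return (between / (k - 1)) / (within / (n - k))


def best_num_clusters(Z, k_values=range(2, 11)):
    """Cut one hierarchical tree at each k and keep the k with the largest CH."""
    tree = linkage(Z, method="ward")             # assumption: Ward linkage
    ch = {}
    for k in k_values:
        labels = fcluster(tree, t=k, criterion="maxclust")
        if len(np.unique(labels)) == k:          # maxclust may return fewer groups
            ch[k] = calinski_harabasz(Z, labels)
    return max(ch, key=ch.get), ch


# Synthetic example (not data from the paper): 3 groups of 50 "stations", 5 variables,
# group centres on an equilateral triangle in the first two variables
rng = np.random.default_rng(0)
angles = np.deg2rad([90.0, 210.0, 330.0])
centres = np.zeros((3, 5))
centres[:, 0] = 10.0 * np.cos(angles)
centres[:, 1] = 10.0 * np.sin(angles)
X = np.vstack([c + rng.normal(size=(50, 5)) for c in centres])

for standardize in (True, False):
    Z, n_pc = pca_scores(X, cum_var_target=0.70, standardize=standardize)
    best_k, ch = best_num_clusters(Z)
    print(f"standardize={standardize}: {n_pc} PC(s) retained, best k = {best_k}")
```

On this synthetic example both the standardized and unstandardized scores should point to three clusters; on real rainfall data the two choices may disagree, which is the comparison the abstract describes. Rerunning with cum_var_target between 0.65 and 0.70 mirrors the range the authors report as most reasonable.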