UPSI Digital Repository (UDRep)

Type :article
Subject :G Geography (General)
Main Author :Siti Mariana Che Mat Nor
Additional Authors :Shazlyn Milleana Shaharudin
Shuhaida Ismail
Nurul Hila Zainuddin
Tan, Mou Leong
Title :A comparative study of different imputation methods for daily rainfall data in East-Coast Peninsular Malaysia
Place of Production :Tanjong Malim
Publisher :Fakulti Sains dan Matematik
Year of Publication :2020
Corporate Name :Universiti Pendidikan Sultan Idris

Abstract :
Rainfall data are among the most significant inputs in hydrological and climatological modelling. However, rainfall datasets are prone to missing values for a variety of reasons. This study imputes missing rainfall values using several imputation methods: Replace by Mean, Nearest Neighbor, Random Forest (RF), Nonlinear Iterative Partial Least Squares (NIPALS) and Markov Chain Monte Carlo (MCMC). Daily rainfall datasets from 48 rainfall stations across east-coast Peninsular Malaysia were used in this study. The imputed datasets were then fed into a Multiple Linear Regression (MLR) model. The performance of these methods was evaluated using Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and the Nash-Sutcliffe Efficiency Coefficient (CE). The experimental results showed that RF coupled with MLR (RF-MLR) was the most suitable of the tested approaches for imputing missing rainfall data in east-coast Peninsular Malaysia.
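The evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the simplest method named (Replace by Mean) and the three error measures (RMSE, MAE, CE), and it omits the MLR step and the other imputation methods. The function names and the toy station data are hypothetical, chosen for illustration only.

```python
import numpy as np

def impute_column_mean(data):
    """Replace each NaN with the mean of the observed values in its column.

    Assumption: rows are days and columns are rainfall stations.
    """
    filled = np.array(data, dtype=float)      # np.array copies the input
    col_means = np.nanmean(filled, axis=0)    # per-station mean, ignoring NaN
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled

def rmse(obs, est):
    """Root Mean Square Error between observed and estimated values."""
    obs, est = np.asarray(obs, dtype=float), np.asarray(est, dtype=float)
    return float(np.sqrt(np.mean((obs - est) ** 2)))

def mae(obs, est):
    """Mean Absolute Error between observed and estimated values."""
    obs, est = np.asarray(obs, dtype=float), np.asarray(est, dtype=float)
    return float(np.mean(np.abs(obs - est)))

def nash_sutcliffe(obs, est):
    """Nash-Sutcliffe Efficiency Coefficient (1.0 means a perfect match)."""
    obs, est = np.asarray(obs, dtype=float), np.asarray(est, dtype=float)
    return float(1.0 - np.sum((obs - est) ** 2) / np.sum((obs - obs.mean()) ** 2))

# Hypothetical toy data: 4 days x 2 stations of daily rainfall (mm).
truth = np.array([[0.0, 2.0],
                  [4.0, 6.0],
                  [8.0, 10.0],
                  [12.0, 14.0]])

# Hide two known values so the imputation can be scored against them.
masked = truth.copy()
masked[1, 0] = np.nan
masked[2, 1] = np.nan

filled = impute_column_mean(masked)
hidden = np.isnan(masked)

score_rmse = rmse(truth[hidden], filled[hidden])
score_mae = mae(truth[hidden], filled[hidden])
score_ce = nash_sutcliffe(truth[hidden], filled[hidden])
print(score_rmse, score_mae, score_ce)
```

Lower RMSE and MAE and a CE closer to 1 indicate a better imputation. Scoring only the artificially hidden entries is one common way to compare methods when the true missing values are unknown, and is an assumption here rather than a detail taken from the record.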




This material may be protected under the Copyright Act, which governs the making of photocopies or reproductions of copyrighted materials.
You may use the digitized material for private study, scholarship, or research.
