Environmental Science and Pollution Research, Volume 31, Issue 41, Pages 54044-54060, 01/09/2024

Imputation of missing monthly rainfall data using machine learning and spatial interpolation approaches in Thale Sap Songkhla River Basin, Thailand

Sirimon Pinthong, Pakorn Ditthakit, Nureehan Salaeh, Mohd Abul Hasan, Cao Truong Son, Nguyen Thi Thuy Linh, Saiful Islam, Krishna Kumar Yadav

Abstract

Missing rainfall data is a prevalent problem and a primary concern in hydrology and meteorology. This research examined the capability of machine learning (ML) and spatial interpolation (SI) methods to estimate missing monthly rainfall data. Six ML algorithms (multiple linear regression (MLR), M5 model tree (M5), random forest (RF), support vector regression (SVR), multilayer perceptron (MLP), and genetic programming (GP)) and four SI methods (arithmetic average (AA), inverse distance weighting (IDW), correlation coefficient weighted (CCW), and normal ratio (NR)) were investigated and their performance compared. Twelve rainfall stations, located in the Thale Sap Songkhla river basin and nearby basins, served as the case study. Hyper-parameters were tuned for each ML method to obtain the most suitable model for the data sets considered. Three performance criteria (NSE, OI, and r) were chosen, and the sum of those three criteria was introduced to compare the methods' performance. The experimental results indicated that selecting neighbouring stations was essential when applying SI methods, but not when applying ML methods. Overall, ML imputed missing monthly rainfall better than SI because it overcame spatial constraints. GP provided the highest performance, with NSE = 0.825, OI = 0.877, and r = 0.909 for the training stage and 0.796, 0.852, and 0.902, respectively, for the testing stage. It was followed by SVR-rbf, SVR-poly, and RF. NR provided the best performance among the four SI methods, followed by CCW, AA, and IDW. When applying SI methods, the correlation between the target and neighbouring stations should be greater than 0.80.
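
The four SI methods and the NSE criterion named in the abstract can be illustrated with a minimal sketch. It uses the common textbook forms of each estimator; this record does not give the paper's exact formulations, so the function names, the IDW exponent of 2, the use of long-term mean rainfall as the "normal" in NR, and the neighbour filter are all assumptions made for illustration.

```python
import numpy as np

def arithmetic_average(neighbours):
    # AA: plain mean of the neighbouring stations' rainfall for the month.
    return float(np.mean(np.asarray(neighbours, dtype=float)))

def inverse_distance_weighting(neighbours, distances, power=2.0):
    # IDW: weights proportional to 1 / distance**power (power=2 is an assumed default).
    p = np.asarray(neighbours, dtype=float)
    w = 1.0 / np.asarray(distances, dtype=float) ** power
    return float(np.sum(w * p) / np.sum(w))

def correlation_coefficient_weighted(neighbours, correlations):
    # CCW: weights are the correlation coefficients between target and each neighbour.
    p = np.asarray(neighbours, dtype=float)
    w = np.asarray(correlations, dtype=float)
    return float(np.sum(w * p) / np.sum(w))

def normal_ratio(neighbours, target_normal, neighbour_normals):
    # NR: scale each neighbour by the ratio of long-term "normal" rainfall
    # (target / neighbour), then average the scaled values.
    p = np.asarray(neighbours, dtype=float)
    ratios = target_normal / np.asarray(neighbour_normals, dtype=float)
    return float(np.mean(ratios * p))

def select_neighbours(correlations, threshold=0.80):
    # Hypothetical helper: keep neighbours whose correlation with the target
    # exceeds the threshold suggested in the abstract.
    return np.asarray(correlations, dtype=float) > threshold

def nse(observed, simulated):
    # Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean.
    o = np.asarray(observed, dtype=float)
    s = np.asarray(simulated, dtype=float)
    return float(1.0 - np.sum((o - s) ** 2) / np.sum((o - o.mean()) ** 2))

# Made-up monthly rainfall (mm) at three neighbouring stations.
rain = [100.0, 200.0, 300.0]
aa = arithmetic_average(rain)
idw = inverse_distance_weighting(rain, distances=[1.0, 2.0, 2.0])
ccw = correlation_coefficient_weighted(rain, correlations=[0.5, 0.25, 0.25])
nr = normal_ratio(rain, target_normal=1000.0, neighbour_normals=[1000.0, 2000.0, 500.0])
```

An NSE of 1 indicates a perfect match with observations, while 0 means the estimate is no better than the observed mean. In practice the boolean mask from `select_neighbours` would be applied to the neighbour arrays before calling any of the SI estimators.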

Document Type

Article

Source Type

Journal

Keywords

Hyper-parameters; Imputation; Machine learning; Missing rainfall data; Spatial interpolation

ASJC Subject Area

Environmental Science: Pollution; Environmental Science: Environmental Chemistry; Environmental Science: Health, Toxicology and Mutagenesis

Funding Agency

King Khalid University


Bibliography


Pinthong, S., Ditthakit, P., Salaeh, N., Hasan, M., Son, C., Linh, N., Islam, S., ... Yadav, K. (2024). Imputation of missing monthly rainfall data using machine learning and spatial interpolation approaches in Thale Sap Songkhla River Basin, Thailand. Environmental Science and Pollution Research, 31(41), 54044-54060. doi:10.1007/s11356-022-23022-8
