Abstract
Groundwater contamination is one of the most persistent environmental crises of the modern era, threatening the drinking water security of billions of people worldwide. Traditional monitoring approaches rely on sparse sampling networks and conventional statistical models that fail to capture the spatiotemporal complexity inherent in large-scale aquifer systems. This study presents a hybrid deep learning (DL) framework that integrates long short-term memory (LSTM) networks, convolutional neural networks (CNNs), and autoencoders (AEs) to detect hidden contamination patterns in national groundwater systems. Trained on multi-source hydrochemical datasets comprising over 120,000 well records from five major aquifer regions, the proposed model achieved a root mean square error (RMSE) of 0.183 mg/L for nitrate prediction and an area under the receiver operating characteristic curve (AUC) of 0.941 for anomaly classification. The framework identified spatially clustered contamination hotspots that conventional monitoring programs had not detected, including sub-threshold arsenic plumes in agricultural transition zones. The results show that DL-based approaches outperform support vector machines (SVMs), random forests (RFs), and traditional artificial neural networks (ANNs) by 15–27% across all evaluated metrics. These findings offer actionable insights for national water authorities seeking to transition toward data-driven, predictive groundwater governance.
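The two headline metrics above, RMSE for nitrate regression and AUC for anomaly classification, can be made concrete with a short sketch. This is not the authors' evaluation code. It is a minimal, dependency-free illustration that assumes labels are binary (1 = contaminated/anomalous, 0 = normal) and that the classifier emits a real-valued anomaly score (for example, an autoencoder reconstruction error). The function names `rmse` and `roc_auc` are hypothetical. AUC is computed through its Mann-Whitney interpretation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, in the units of the target (e.g. mg/L nitrate)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise (Mann-Whitney) comparison.

    Assumes labels are 1 for anomalous and 0 for normal samples, and that a
    higher score means "more anomalous". Tied positive/negative pairs count
    as half a win. O(n_pos * n_neg), which is fine for illustration only.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy values, not data from this study.
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))      # sqrt(4/3), about 1.1547
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

For large well-record datasets a rank-based or library implementation (such as scikit-learn's `roc_auc_score`) would be preferable to the quadratic pairwise loop, which is kept here only because it mirrors the definition directly.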

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Copyright (c) 2026 Carlos Mendoza, Hanna Virtanen (Authors)