Machine Learning-Based Water Pollution Prediction for Sustainable Environmental Monitoring

Authors

  • Dr. Shubham Tiwari

DOI:

https://doi.org/10.69980/pp6s2x97

Keywords:

Water Potability Prediction, Machine Learning, Environmental Monitoring

Abstract

Water pollution and the shrinking supply of clean drinking water have become some of the greatest environmental and public health problems in the world. Conventional methods of water quality assessment are usually costly, time-consuming, and poorly suited to continuous environmental monitoring. This paper examines the application of machine learning (ML) techniques to predicting water potability from physicochemical water quality indicators. The authors used a publicly available dataset of 3276 water samples described by nine features: pH, hardness, solids, chloramines, sulfate, conductivity, organic carbon, trihalomethanes, and turbidity. Mean imputation, feature scaling, and exploratory data analysis were applied to prepare the data before model development. Five supervised ML algorithms, Logistic Regression (LR), Decision Tree, Random Forest, Support Vector Machine (SVM), and K-Nearest Neighbor, were then compared on predictive performance. SVM achieved the highest accuracy, whereas the Decision Tree model gave the most balanced classification results as measured by the F1-score. Feature importance analysis identified pH, sulfate, solids, and hardness as the most influential predictors of water potability. The findings support the suitability of ML methods for intelligent environmental surveillance, sustainable water resource management, and early contamination warning systems. The research contributes to the growing role of artificial intelligence in environmental monitoring and has practical implications for policy-makers, smart cities, and health agencies.
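
The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy feature values, and the two hypothetical prediction vectors are assumptions made for illustration, and only the Python standard library is used. The sketch shows mean imputation, z-score feature scaling, and how accuracy and the F1-score can rank two classifiers differently when the classes are imbalanced.

```python
from statistics import mean, pstdev

def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    cols = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in cols]
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]

def standardize(rows):
    """Scale each column to zero mean and unit variance (z-score)."""
    cols = list(zip(*rows))
    mus = [mean(col) for col in cols]
    sigmas = [pstdev(col) or 1.0 for col in cols]  # avoid dividing by zero on constant columns
    return [[(v - mus[j]) / sigmas[j] for j, v in enumerate(row)] for row in rows]

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy rows of (pH, hardness) with missing readings; the values are made up.
samples = [[7.0, 200.0], [None, 180.0], [6.0, None], [8.0, 220.0]]
filled = impute_mean(samples)
scaled = standardize(filled)

# Hypothetical predictions from two classifiers on ten samples (1 = potable).
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
pred_a = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]  # conservative: misses two potable samples
pred_b = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  # liberal: finds all potable samples, three false alarms

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, round(accuracy(y_true, pred), 3), round(f1_score(y_true, pred), 3))
```

In this toy case classifier A has the higher accuracy and classifier B the higher F1-score, the same kind of split the abstract reports between SVM and Decision Tree. In a real experiment the imputation means and scaling statistics would normally be computed on the training split only and then applied to the test split.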

Published

2026-05-27