EXPLAINABLE MACHINE LEARNING FOR EVALUATING DIABETES PREDICTION MODELS
DOI: https://doi.org/10.53555/azgy5g23

Keywords: Diabetes prediction, Explainable AI (XAI), SHAP, XGBoost importances, machine learning

Abstract
This study investigates the use of explainable AI (XAI) techniques with machine learning models for diabetes prediction, with a particular focus on SHAP (SHapley Additive exPlanations) values and XGBoost gain-based importances. Using the publicly available Pima Indians Diabetes dataset, three classification models (Logistic Regression, Random Forest, and XGBoost) were trained and evaluated on accuracy and AUC. Global and local explainability was assessed through SHAP bar plots, while XGBoost gain-importance plots were used to analyze structural feature importance. The results suggest that stabilized glucose level and age were the most influential predictors of diabetes across models, although differences emerged in the ranking of HDL (high-density lipoprotein) cholesterol. Overall, SHAP provided patient-specific local explanations alongside global patterns, while XGBoost gain importances captured structural importance. The discussion examines how combining these two tools yields a hybrid, comprehensive, and nuanced understanding of model behavior, helping to bridge the "black-box" gap in clinical decision-making and ensuring trust, clarity, and transparency in AI predictions for sensitive domains such as healthcare.