EXPLAINABLE MACHINE LEARNING FOR EVALUATING DIABETES PREDICTION MODELS
DOI: https://doi.org/10.53555/azgy5g23

Keywords: Diabetes prediction, Explainable AI (XAI), SHAP, XGBoost importances, machine learning

Abstract
This study investigates the use of explainable AI (XAI) techniques with machine learning models for diabetes prediction, with a particular focus on SHAP (SHapley Additive exPlanations) values and XGBoost gain-based importances. Using the publicly available Pima Indians Diabetes dataset, three classification models (Logistic Regression, Random Forest, and XGBoost) were trained and evaluated on accuracy and AUC. Global and local explainability was assessed through SHAP bar plots, while XGBoost gain-importance plots were used to analyze structural feature importance. The results suggest that stabilized glucose level and age were the most influential predictors of diabetes across models, although differences emerged in the ranking of HDL (high-density lipoprotein) cholesterol. Overall, SHAP provided patient-specific local explanations alongside global patterns, while XGBoost gain importances captured structural importance. The discussion examines how combining these two tools yields a hybrid, comprehensive, and nuanced understanding of model behavior, helping to bridge the "black-box" gap in clinical decision-making and ensuring trust, clarity, and transparency in AI predictions for sensitive domains such as healthcare.