All published articles of this journal are available on ScienceDirect.

RESEARCH ARTICLE

Enhancing Early Diagnosis of Type II Diabetes through Feature Selection and Hybrid Metaheuristic Optimization Techniques

The Open Bioinformatics Journal, 09 May 2025. DOI: 10.2174/0118750362382139250502100340

Abstract

Introduction

Type II Diabetes Mellitus (T2DM) is a chronic metabolic disorder characterized by elevated blood glucose levels and is a major global health challenge. It is largely attributed to lifestyle changes, unhealthy dietary habits, and lack of awareness. If not diagnosed early, T2DM can lead to severe complications, including damage to the kidneys, heart, and nerves. Timely and accurate diagnosis is therefore crucial, yet current diagnostic procedures are often costly and time-consuming, which motivates new approaches to early detection. The objective of this study is to develop a robust and interpretable hybrid machine learning framework that combines feature selection with metaheuristic optimization to enable early, accurate, and computationally efficient prediction of T2DM.

Method

The methodology comprised three steps: feature selection and refinement, model optimization, and evaluation. For feature selection, SHAP (SHapley Additive exPlanations) was integrated with Support Vector Machines (SVMs) to identify the most significant predictive features. Particle Swarm Optimization (PSO) was then used to refine this selection into a concise yet highly informative feature set. In the model optimization phase, Genetic Algorithms (GAs) were applied to tune the hyperparameters of the machine learning models, namely Artificial Neural Networks (ANNs), Random Forest (RF), and SVM, and Bayesian Optimization (BO) was then employed to refine these hyperparameters further. Finally, the models were evaluated using standard classification metrics, including accuracy, Receiver Operating Characteristic (ROC) curves, and F1 scores, to assess the robustness and reliability of the proposed approach.
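The PSO refinement step can be illustrated with a minimal sketch. The abstract does not specify the PSO variant or the fitness function, so the code below assumes a standard binary PSO with a sigmoid transfer function, and it replaces the real objective (which would presumably be a cross-validated classifier score) with a hypothetical toy fitness. The names `binary_pso`, `toy_fitness`, `GAIN`, and `PENALTY`, and all numeric settings, are illustrative assumptions rather than the authors' implementation.

```python
import math
import random

# Hypothetical per-feature usefulness, standing in for the gain in a
# cross-validated score. These values are invented for illustration only.
GAIN = [0.30, 0.25, 0.02, 0.01, 0.20, 0.03]
PENALTY = 0.05  # assumed cost per selected feature, to favour small subsets


def toy_fitness(mask):
    """Reward useful features and penalise subset size (higher is better)."""
    return sum(g * bit for g, bit in zip(GAIN, mask)) - PENALTY * sum(mask)


def binary_pso(fitness, n_features, n_particles=12, n_iters=40,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: each particle is a 0/1 mask over the candidate features."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best mask seen per particle
    pbest_fit = [fitness(p) for p in pos]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]  # best mask overall

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Inertia term plus pulls toward personal and global bests.
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, v))  # clamp the velocity
                # Sigmoid maps velocity to the probability of selecting d.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


mask, score = binary_pso(toy_fitness, len(GAIN))
print("selected feature mask:", mask)
print("fitness:", round(score, 3))
```

In a setting closer to the one described, the candidate features would be those ranked highly by SHAP, and `toy_fitness` would be replaced by a function that trains and scores a classifier on the masked feature set.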

Result

Among all models, the hybrid Random Forest model incorporating SHAP, PSO, GA, and BO performed best, with 99.0% accuracy, a 94.8% F1-score, and an AUC of 1.00. The model also maintained high performance on the Pima Indians Diabetes Dataset (PIDD), supporting its robustness and generalizability.
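For readers less familiar with the reported metrics, the following sketch shows how accuracy, F1-score, and AUC are computed from true labels and predicted scores. The labels, scores, and the 0.5 decision threshold are made up for illustration and are not data from the study; the helper names are likewise hypothetical. The AUC is computed as the fraction of positive-negative pairs that the scores rank correctly, which is equivalent to the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, y_score):
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative data only: 6 non-diabetic (0) and 4 diabetic (1) cases.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.10, 0.20, 0.30, 0.40, 0.45, 0.60, 0.55, 0.70, 0.80, 0.90]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)   # 0.9
f1 = f1_score(y_true, y_pred)    # 8/9, about 0.889
auc = roc_auc(y_true, y_score)   # 23/24, about 0.958
print(acc, round(f1, 3), round(auc, 3))
```

In this toy example a single false positive leaves accuracy at 0.9 while F1 falls to about 0.889, which shows how the two metrics can differ on the same predictions.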

Discussion

The hybrid metaheuristic framework significantly improved prediction accuracy and efficiency for early T2DM diagnosis compared to conventional models. These findings support the growing evidence for integrating feature selection and optimization in clinical prediction. However, the study is limited by the use of publicly available datasets and lacks clinical validation, which should be addressed in future work.

Conclusion

The proposed hybrid metaheuristic framework offers a reliable and scalable solution for early diabetes prediction. It advances the application of AI in healthcare by improving diagnostic accuracy and supporting timely medical interventions. Future work should include clinical deployment, real-time validation, and dataset expansion for greater generalizability.

Keywords: Hybrid model, Metaheuristic optimization, Machine learning, SHapley Additive exPlanations (SHAP), Particle Swarm Optimization (PSO), Genetic algorithm.