A Novel Application of K-means Cluster Prediction Model for Diabetes Early Identification using Dimensionality Reduction Techniques
Vamshi Krishna B.1, Raguru Jaya K.1, Bhuvaneswari A. P.2, Gururaj H. L.3, *, Vinayakumar Ravi4, *, Meshari Almeshari5, Yasser Alzamil5
Identifiers and Pagination: Year: 2023
E-location ID: e187503622307310
Publisher ID: e187503622307310
Article History: Received Date: 03/05/2023
Revision Received Date: 16/06/2023
Acceptance Date: 26/06/2023
Electronic publication date: 31/08/2023
Collection year: 2023
open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: https://creativecommons.org/licenses/by/4.0/legalcode. This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Diabetes is a condition in which the body cannot use insulin properly. Insulin levels must be maintained; otherwise, disorders such as kidney failure, heart attack, nervous weakness, and blindness can follow. Among the ten major diseases, diabetes ranks second, affecting 34.2 million individuals according to the National Diabetes Statistics Report. According to the World Health Organization, diabetes is the seventh leading cause of death. Early identification of diabetes can therefore help prevent this severe damage.
Accurate prediction requires a large amount of data, which introduces the curse of dimensionality. In the present research, the PIMA Indians diabetes dataset is considered, and different classification models, viz., K-means clustering combined with logistic regression, SVM (Support Vector Machine), Random Forest, etc., are implemented to predict diabetes.
The accuracies for diabetes prediction range from 0.9875 to 1.0. KCPM (K-means cluster prediction model), which combines K-means clustering with different classification algorithms, has shown an increase in accuracy of 0.67%. In KCPM, the data are first clustered using K-means into patients with and without diabetes; the clustering results are then compared with the target variable and filtered, after which different supervised classification algorithms are applied to predict the disease.
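The cluster, filter, and classify sequence of KCPM can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the text: "filtered" is read as keeping only the records whose cluster assignment agrees with the target variable, the second K-means centre is initialized deterministically as the point farthest from the first record, logistic regression stands in for the supervised classifiers, and the two-feature records are hypothetical toy values rather than PIMA data. All function names are illustrative.

```python
import math

def kmeans_two_clusters(rows, iters=10):
    """Lloyd's k-means with k=2; returns one cluster id (0 or 1) per row."""
    # Assumption: deterministic start (first row, and the row farthest from it).
    centers = [rows[0], max(rows, key=lambda r: math.dist(r, rows[0]))]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign every row to its nearest centre.
        labels = [min((0, 1), key=lambda c: math.dist(r, centers[c])) for r in rows]
        # Move each centre to the mean of its members.
        for c in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def filter_agreeing_rows(rows, targets, labels):
    """Keep rows whose cluster id matches the target (assumed reading of 'filtered')."""
    # Cluster ids are arbitrary, so flip them if that matches the targets better.
    agree = sum(lab == t for lab, t in zip(labels, targets))
    if agree < len(rows) - agree:
        labels = [1 - lab for lab in labels]
    kept = [(r, t) for r, t, lab in zip(rows, targets, labels) if lab == t]
    return [r for r, _ in kept], [t for _, t in kept]

def train_logistic(rows, targets, lr=0.5, epochs=500):
    """Logistic regression fitted by per-sample gradient descent."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, targets):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            grad = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

def predict(model, x):
    """Return 1 (diabetic) if the linear score is positive, else 0."""
    w, b = model
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)

# Hypothetical, already-scaled records: [feature_1, feature_2]; target 1 = diabetic.
rows = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.9, 0.8],
        [0.8, 0.9], [0.85, 0.75], [0.2, 0.15], [0.8, 0.85]]
targets = [0, 0, 0, 1, 1, 1, 1, 0]  # the last two disagree with their cluster

labels = kmeans_two_clusters(rows)
kept_rows, kept_targets = filter_agreeing_rows(rows, targets, labels)
model = train_logistic(kept_rows, kept_targets)
print(len(kept_rows), "of", len(rows), "records kept for training")
print(predict(model, [0.12, 0.18]), predict(model, [0.88, 0.82]))
```

In this toy run, the two records whose targets contradict their cluster are dropped before training, so the classifier is fitted only on records where the unsupervised grouping and the label agree. Under this reading, accuracies reported on the filtered records describe that subset rather than the full dataset.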
The results show that KCPM predicts diabetes with an accuracy 0.67% higher than that of other existing methods. With a KCPM-based automated diabetes analysis system, early prediction of the disease may protect patients from severe disorders later in life.