RESEARCH ARTICLE


A Novel Application of K-means Cluster Prediction Model for Diabetes Early Identification using Dimensionality Reduction Techniques



Vamshi Krishna B.1, Raguru Jaya K.1, Bhuvaneswari A. P.2, Gururaj H. L.3, *, Vinayakumar Ravi4, *, Meshari Almeshari5, Yasser Alzamil5
1 Department of Computer Science and Engineering, Manipal Institute of Technology Bengaluru, Manipal Academy of Higher Education, Manipal, India
2 School of Computer Science and Applications, Reva University, Bengaluru, India
3 Department of Information Technology, Manipal Institute of Technology Bengaluru, Manipal Academy of Higher Education, Manipal, India
4 Center for Artificial Intelligence, Prince Mohammad Bin Fahd University, Khobar, Saudi Arabia
5 Department of Diagnostic Radiology, College of Applied Medical Sciences, University of Ha'il, Ha'il, Saudi Arabia


Creative Commons License
© 2023 Krishna B et al.

open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: https://creativecommons.org/licenses/by/4.0/legalcode. This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

* Address correspondence to these authors at the Center for Artificial Intelligence, Prince Mohammad Bin Fahd University, Khobar, Saudi Arabia, and the Department of Information Technology, Manipal Institute of Technology Bengaluru, Manipal Academy of Higher Education, Manipal, India; E-mails: vinayakumarr77@gmail.com; gururaj.hl@manipal.edu


Abstract

Purpose:

Diabetes is a condition in which the body cannot utilize insulin properly. Insulin levels in the body must be maintained; otherwise, disorders such as kidney failure, heart attack, nervous weakness, and blindness can follow. Among the 10 major diseases, diabetes ranks second, affecting 34.2 million individuals according to the National Diabetes Statistics Report. According to the World Health Organization, diabetes is the 7th leading cause of death. Thus, early identification of diabetes can help prevent this severe damage.

Methods:

Accurate predictions require a lot of data, which introduces the curse of dimensionality. In the present research, the PIMA Indians diabetes data set is considered, and different classification models, viz., K-means clustering with logistic regression, SVM (Support Vector Machine), Random Forest, etc., are implemented and evaluated on their accuracy in predicting diabetes.

Results:

The accuracies for diabetes prediction range from 0.9875 to 1.0. KCPM (K-means cluster prediction model) has shown an increase in accuracy of 0.67% for the combined K-means clustering and different classification algorithms. In KCPM, the data are first clustered using K-means into patients with and without diabetes; the clustered results are then compared with the target variable and filtered, after which the different supervised classification algorithms are applied to predict the disease.
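The KCPM steps described above (cluster, compare with the target, filter, classify) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn, a synthetic data set from `make_classification` as a stand-in for the PIMA Indians data, logistic regression as the only classifier, and a hypothetical helper named `kcpm_filter`. The scaling step, the 80/20 split, and all parameter values are likewise assumptions. It does not reproduce the accuracies reported here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def kcpm_filter(X, y, random_state=0):
    """Hypothetical helper: cluster into two groups, keep rows whose cluster matches the label."""
    X_scaled = StandardScaler().fit_transform(X)
    clusters = KMeans(n_clusters=2, n_init=10, random_state=random_state).fit_predict(X_scaled)
    # K-means cluster ids are arbitrary, so use the id-to-label mapping with more agreement.
    if np.mean(clusters == y) < 0.5:
        clusters = 1 - clusters
    keep = clusters == y  # rows where the cluster assignment agrees with the target
    return X_scaled[keep], y[keep]


# Assumption: synthetic stand-in data (768 rows, 8 features, binary target).
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)

X_kept, y_kept = kcpm_filter(X, y)

# Train and evaluate a supervised classifier on the filtered rows only.
X_train, X_test, y_train, y_test = train_test_split(
    X_kept, y_kept, test_size=0.2, stratify=y_kept, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"kept {len(y_kept)} of {len(y)} rows, test accuracy {accuracy:.3f}")
```

In this sketch the filtering step uses the target variable, so the accuracy is measured only on rows that the clustering already assigned correctly. Any of the other classifiers named in the Methods (for example SVM or Random Forest) could be substituted for `LogisticRegression`.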

Conclusion:

The results show that KCPM predicts diabetes with an accuracy 0.67% higher than that of other existing methods. With a KCPM-based automated diabetes analysis system, early prediction of the disease may protect patients from facing severe disorders later in life.

Keywords: Clustering, Classification, Curse of dimensionality, Diabetes, Prediction, Classifiers, Accuracy.