An imputation algorithm based on nature-inspired metaheuristic for missing values in the diabetes disease dataset
Date
2023
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Altınbaş Üniversitesi / Lisansüstü Eğitim Enstitüsü
Access Rights
info:eu-repo/semantics/closedAccess
Abstract
Developing machine-learning-based early prediction systems for diabetes is a research topic
that grows in importance daily as the number of people diagnosed with the disease rises
worldwide. Missing values are a complication that machine learning models and case studies
must deal with in medical datasets in general, and in diabetes datasets in particular. This
research proposes a new imputation algorithm based on the Grey Wolf Optimizer (GWO), named
the Imputation Method based on Grey Wolf Algorithm (IGWO). IGWO requires a classifier as its
fitness function, and the search aims to maximize the classification accuracy obtained on the
generated (imputed) dataset. Accuracy is therefore measured with three distinct classifiers:
K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Naive Bayesian Classifier (NBC).
The Pima Indian Diabetes Disease (PIDD) dataset is the primary dataset used in this
investigation to estimate the missing values and to assess IGWO. The approach was evaluated
in two experiments. The first uses k-fold cross-validation (k = 5) to validate the generated
datasets. The second uses holdout validation, in which the generated dataset is split into a
training set (65%) and a testing set (35%). Averaged over ten runs, the results ranked
IGWO-SVM highest and IGWO-NBC lowest. In addition, IGWO with every classifier achieved
better accuracy than the four prominent approaches it was compared against, which indicates
that an optimization algorithm used as an imputation procedure is superior to the statistical
methods considered in this research. In conclusion, the GWO algorithm has the potential to be
used for estimating missing values in PIDD and in medical datasets in general.
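The idea of using a classifier's accuracy as the fitness function of a Grey Wolf Optimizer can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not give the encoding, parameters, or fitness details, so everything below is an assumption made for illustration. Each wolf is assumed to encode one candidate value per missing cell, the search bounds are assumed to be the observed minimum and maximum of each column, and a leave-one-out 1-nearest-neighbour accuracy stands in for the KNN, SVM, and NBC classifiers. The names `knn_accuracy`, `fill`, and `gwo_impute` are hypothetical.

```python
import random

def knn_accuracy(rows, labels):
    """Leave-one-out 1-NN accuracy; an assumed stand-in for the fitness classifier."""
    correct = 0
    for i, x in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, rows[j])),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)

def fill(rows, holes, candidate):
    """Copy rows and write one candidate value into each missing cell."""
    filled = [list(r) for r in rows]
    for (i, j), value in zip(holes, candidate):
        filled[i][j] = value
    return filled

def gwo_impute(rows, labels, n_wolves=8, n_iter=30, seed=0):
    """Search for imputed values that maximize classifier accuracy (needs n_wolves >= 3)."""
    rng = random.Random(seed)
    holes = [(i, j) for i, r in enumerate(rows) for j, v in enumerate(r) if v is None]
    # Assumed bounds: observed min/max of the column each missing cell belongs to.
    bounds = []
    for _, j in holes:
        seen = [r[j] for r in rows if r[j] is not None]
        bounds.append((min(seen), max(seen)))
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best, best_fit = None, -1.0
    for t in range(n_iter + 1):
        scored = sorted(
            ((knn_accuracy(fill(rows, holes, w), labels), w) for w in wolves),
            key=lambda pair: pair[0],
            reverse=True,
        )
        if scored[0][0] > best_fit:
            best_fit, best = scored[0][0], list(scored[0][1])
        if t == n_iter:
            break
        leaders = [w for _, w in scored[:3]]  # alpha, beta, delta wolves
        a = 2.0 * (1 - t / n_iter)            # decreases linearly from 2 to 0
        moved = []
        for w in wolves:
            position = []
            for d, (lo, hi) in enumerate(bounds):
                total = 0.0
                for leader in leaders:
                    A = a * (2 * rng.random() - 1)
                    C = 2 * rng.random()
                    total += leader[d] - A * abs(C * leader[d] - w[d])
                # Average the three leader-guided moves and clip to the bounds.
                position.append(min(hi, max(lo, total / 3)))
            moved.append(position)
        wolves = moved
    return fill(rows, holes, best), best_fit

# Toy data with two missing cells (None); not taken from PIDD.
toy_rows = [[1.0, 2.0], [1.2, None], [0.9, 2.1], [5.0, 6.0], [None, 6.2], [5.1, 5.9]]
toy_labels = [0, 0, 0, 1, 1, 1]
imputed, accuracy = gwo_impute(toy_rows, toy_labels, n_wolves=6, n_iter=10)
print(imputed, accuracy)
```

In a setting closer to the one described, the stand-in fitness would be replaced by the accuracy of the chosen classifier under 5-fold cross-validation or a 65%/35% holdout split, and the resulting dataset would be compared against the statistical imputation baselines.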
Description
Keywords
Missing Values, Grey Wolf Optimizer, Diabetes Disease, PIDD, IGWO
Source
WoS Q Value
Scopus Q Value
Volume
Issue
Citation
Ahmed, Anas Mudhafar Ahmed. (2023). An imputation algorithm based on nature-inspired metaheuristic for missing values in the diabetes disease dataset. (Unpublished master's thesis). Altınbaş Üniversitesi, Lisansüstü Eğitim Enstitüsü, İstanbul.