İnan, Timur
Ahmed, Anas Mudhafar Ahmed
2023-09-19
2023-09-19
2023
2023
Ahmed, Anas Mudhafar Ahmed. (2023). An imputation algorithm based on nature-inspired metaheuristic for missing values in the diabetes disease dataset. (Unpublished master's thesis). Altınbaş Üniversitesi, Lisansüstü Eğitim Enstitüsü, İstanbul.
https://hdl.handle.net/20.500.12939/4021

The development of machine-learning-based early prediction systems for diabetes is a research topic of growing importance, driven by the rising number of people diagnosed with the disease worldwide. Missing values in medical datasets in general, and in diabetes datasets in particular, are a complication that machine learning models and case studies must deal with. This research proposes a new imputation algorithm based on the Grey Wolf Optimizer (GWO), named the Imputation Method based on the Grey Wolf Algorithm (IGWO). IGWO uses a classifier as its fitness function: each candidate imputation is evaluated by the classification accuracy obtained on the dataset it generates, and the search aims to maximize that accuracy. Three classifiers are used for this purpose: K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Naive Bayesian Classifier (NBC). The Pima Indian Diabetes Disease (PIDD) dataset is the primary dataset used in this study, both to estimate the missing values and to assess IGWO. The proposed approach was evaluated in two experiments. The first uses k-fold cross-validation (k = 5) to validate the generated datasets. The second uses holdout validation, splitting the generated dataset into a training set (65%) and a testing set (35%).
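The IGWO search described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the encoding, bounds, or fitness details, so the following are assumptions. Each wolf holds one candidate value per missing cell, bounded by the observed minimum and maximum of that cell's column; fitness is the leave-one-out accuracy of a plain k-NN classifier (k = 3) written from scratch, standing in for the KNN, SVM, and NBC fitness functions named above; and wolves move with the standard GWO update, in which the coefficient a decreases linearly from 2 to 0 and each new position averages the moves guided by the alpha, beta, and delta wolves. The names `gwo_impute`, `knn_loo_accuracy`, and `fill` and the toy dataset are hypothetical.

```python
# Hypothetical sketch of GWO-based imputation; not the IGWO code from the thesis.
import math
import random


def knn_loo_accuracy(X, y, k=3):
    """Leave-one-out accuracy of a plain k-NN classifier (Euclidean distance)."""
    correct = 0
    for i, xi in enumerate(X):
        # Nearest neighbours of sample i among all other samples.
        neighbours = sorted(
            (math.dist(xi, xj), y[j]) for j, xj in enumerate(X) if j != i
        )[:k]
        votes = {}
        for _, label in neighbours:
            votes[label] = votes.get(label, 0) + 1
        correct += (max(votes, key=votes.get) == y[i])
    return correct / len(X)


def fill(data, missing, values):
    """Copy `data`, writing each candidate value into its missing cell."""
    filled = [list(row) for row in data]
    for (i, j), v in zip(missing, values):
        filled[i][j] = v
    return filled


def gwo_impute(data, y, n_wolves=8, n_iter=30, seed=0):
    """Search for missing values (None) with a basic Grey Wolf Optimizer.

    Each wolf is a vector holding one value per missing cell; its fitness
    is the k-NN accuracy of the dataset it produces (to be maximized).
    """
    rng = random.Random(seed)
    missing = [(i, j) for i, row in enumerate(data)
               for j, v in enumerate(row) if v is None]
    # Assumed bounds for each missing cell: observed min/max of its column.
    cols = range(len(data[0]))
    col_lo = [min(r[j] for r in data if r[j] is not None) for j in cols]
    col_hi = [max(r[j] for r in data if r[j] is not None) for j in cols]
    lo = [col_lo[j] for _, j in missing]
    hi = [col_hi[j] for _, j in missing]

    def fitness(values):
        return knn_loo_accuracy(fill(data, missing, values), y)

    def update_leaders(leaders, pack):
        # Keep the three best positions seen so far: alpha, beta, delta.
        scored = leaders + [(fitness(x), x) for x in pack]
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:3]

    wolves = [[rng.uniform(l, h) for l, h in zip(lo, hi)]
              for _ in range(n_wolves)]
    leaders = update_leaders([], wolves)
    for t in range(n_iter):
        a = 2 - 2 * t / n_iter  # decreases linearly from 2 toward 0
        new_pack = []
        for x in wolves:
            pos = []
            for d in range(len(x)):
                moves = []
                for _, leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - x[d])
                    moves.append(leader[d] - A * D)
                # Average the leader-guided moves and clip to the bounds.
                pos.append(min(max(sum(moves) / len(moves), lo[d]), hi[d]))
            new_pack.append(pos)
        wolves = new_pack
        leaders = update_leaders(leaders, wolves)
    best_score, best = leaders[0]
    return fill(data, missing, best), best_score


# Hypothetical toy data: two classes, None marks a missing value.
data = [[1.0, 2.0], [1.2, None], [0.8, 1.9], [1.1, 2.2],
        [5.0, 6.0], [None, 6.2], [5.2, 5.8], [4.9, None]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
imputed, acc = gwo_impute(data, y)
```

Keeping the three best positions seen so far, rather than re-ranking only the current pack, means the returned imputation never scores worse than the best candidate already evaluated. Swapping `knn_loo_accuracy` for a cross-validated SVM or Naive Bayes score would follow the same pattern.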
Averaged over ten runs, the results show that IGWO-SVM ranked highest and IGWO-NBC ranked lowest. In addition, IGWO achieved the best accuracy with every classifier when compared with four prominent approaches, indicating that, as an imputation procedure, the optimization algorithm outperforms the statistical methods used in this research. In conclusion, the GWO algorithm shows potential for estimating missing values in PIDD and in medical datasets more generally.

en
info:eu-repo/semantics/closedAccess
Missing Values
Grey Wolf Optimizer
Diabetes Disease
PIDD
IGWO
An imputation algorithm based on nature-inspired metaheuristic for missing values in the diabetes disease dataset
Master Thesis
805893