An imputation algorithm based on nature-inspired metaheuristic for missing values in the diabetes disease dataset

dc.contributor.advisor: İnan, Timur
dc.contributor.author: Ahmed, Anas Mudhafar Ahmed
dc.date.accessioned: 2023-09-19T07:12:57Z
dc.date.available: 2023-09-19T07:12:57Z
dc.date.issued: 2023 [en_US]
dc.date.submitted: 2023
dc.department: Institutes, Institute of Graduate Studies, Department of Information Technologies [en_US]
dc.description.abstract: The development of machine-learning-based early prediction systems for diabetes is a research topic that grows in importance daily as the number of people diagnosed with the disease rises worldwide. Missing values are a complication that machine learning models and case studies must deal with in medical datasets in general and in diabetes datasets in particular. This research proposes a new imputation algorithm based on the Grey Wolf Optimizer (GWO), named Imputation Method based Grey Wolf Algorithm (IGWO). IGWO requires a classifier as its fitness function: each generated dataset is evaluated by the classification accuracy it yields, and the algorithm aims to generate the dataset with the highest possible accuracy. Three distinct classifiers are used for this purpose: K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Naive Bayesian Classifier (NBC). The Pima Indian Diabetes Disease (PIDD) dataset is the primary dataset used to estimate the missing values and to assess IGWO. The approach was evaluated in two experiments. The first uses k-fold cross-validation (K=5) to validate the generated datasets. The second uses holdout validation, splitting the generated dataset into a training set (65%) and a testing set (35%). Averaged over ten runs, IGWO-SVM ranked highest and IGWO-NBC ranked lowest.
In addition, IGWO with every classifier achieved better accuracy than the four prominent approaches used for comparison, which indicates that the optimization algorithm, used as an imputation procedure, is superior to the statistical methods utilized in this research. In conclusion, the GWO algorithm has the potential to be used for estimating missing values in PIDD and in medical datasets in general. [en_US]
dc.identifier.citation: Ahmed, Anas Mudhafar Ahmed. (2023). An imputation algorithm based on nature-inspired metaheuristic for missing values in the diabetes disease dataset. (Unpublished master's thesis). Altınbaş University, Institute of Graduate Studies, İstanbul. [en_US]
dc.identifier.uri: https://hdl.handle.net/20.500.12939/4021
dc.identifier.yoktezid: 805893
dc.institutionauthor: Ahmed, Anas Mudhafar Ahmed
dc.language.iso: en
dc.publisher: Altınbaş University / Institute of Graduate Studies [en_US]
dc.relation.publicationcategory: Thesis [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Missing Values [en_US]
dc.subject: Grey Wolf Optimizer [en_US]
dc.subject: Diabetes Disease [en_US]
dc.subject: PIDD [en_US]
dc.subject: IGWO [en_US]
dc.title: An imputation algorithm based on nature-inspired metaheuristic for missing values in the diabetes disease dataset
dc.type: Master Thesis
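
The abstract describes imputation driven by the Grey Wolf Optimizer with classifier accuracy as the fitness function. This record does not include the thesis's implementation, so the following is only a minimal sketch of that general idea under stated assumptions: it uses the standard GWO position update, a plain leave-one-out k-NN accuracy as the fitness, and a toy two-feature dataset rather than PIDD. The names `knn_accuracy`, `fill`, `gwo_impute`, and `bounds` are hypothetical and are not taken from the thesis.

```python
import random

def knn_accuracy(rows, labels, k=3):
    """Leave-one-out accuracy of a plain k-NN classifier (assumed fitness)."""
    correct = 0
    for i, row in enumerate(rows):
        # Squared Euclidean distance to every other row, nearest first.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(row, other)), labels[j])
            for j, other in enumerate(rows) if j != i
        )
        votes = [lab for _, lab in dists[:k]]
        correct += max(set(votes), key=votes.count) == labels[i]
    return correct / len(rows)

def fill(rows, missing, candidate):
    """Return a copy of rows with candidate values written into missing cells."""
    filled = [list(r) for r in rows]
    for (i, j), value in zip(missing, candidate):
        filled[i][j] = value
    return filled

def gwo_impute(rows, labels, bounds, wolves=8, iters=30, seed=0):
    """Search for values of the None cells that maximise k-NN accuracy."""
    rng = random.Random(seed)
    missing = [(i, j) for i, r in enumerate(rows)
               for j, v in enumerate(r) if v is None]
    lo = [bounds[j][0] for _, j in missing]
    hi = [bounds[j][1] for _, j in missing]

    def fitness(pos):
        return knn_accuracy(fill(rows, missing, pos), labels)

    # Each wolf is one candidate vector holding a value for every missing cell.
    pack = [[rng.uniform(l, h) for l, h in zip(lo, hi)] for _ in range(wolves)]
    best_pos, best_fit = None, -1.0
    for t in range(iters):
        scored = sorted(((fitness(p), p) for p in pack),
                        key=lambda s: s[0], reverse=True)
        leaders = [list(p) for _, p in scored[:3]]  # alpha, beta, delta
        if scored[0][0] > best_fit:
            best_fit, best_pos = scored[0][0], leaders[0]
        a = 2 - 2 * t / iters  # decreases linearly from 2 towards 0
        new_pack = []
        for wolf in pack:
            new = []
            for d, x in enumerate(wolf):
                estimate = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    estimate += leader[d] - A * abs(C * leader[d] - x)
                # Average the three leader-guided moves and clamp to bounds.
                new.append(min(max(estimate / 3, lo[d]), hi[d]))
            new_pack.append(new)
        pack = new_pack
    return fill(rows, missing, best_pos), best_fit

# Toy data (an assumption, not PIDD): two clusters, two missing cells.
rows = [[1.0, 1.2], [0.8, None], [1.1, 0.9], [0.9, 1.0],
        [3.0, 3.1], [None, 2.9], [3.2, 3.0], [2.9, 3.3]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
bounds = [(0.0, 4.0), (0.0, 4.0)]  # per-feature search range
filled, acc = gwo_impute(rows, labels, bounds)
```

Here `filled` is the completed dataset and `acc` is the fitness of the best candidate found. In the setting the abstract describes, the fitness classifier would be KNN, SVM, or NBC, and the resulting dataset would then be validated separately by 5-fold cross-validation or a 65%/35% holdout split.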
