ENHANCING PREDICTIVE PERFORMANCE IN COVID-19 HEALTHCARE DATASETS: A CASE STUDY BASED ON HYPER ADASYN OVER-SAMPLING AND GENETIC FEATURE SELECTION

dc.authoridAta, Oguz/0000-0003-4511-7694
dc.authoridAhmed, Abdulrahman./0000-0002-1002-5792
dc.contributor.authorMohammedqasim, Hayder
dc.contributor.authorJasim, Abdulrahman Ahmed
dc.contributor.authorMohammedqasem, Roa'a
dc.contributor.authorAta, Oguz
dc.date.accessioned2025-02-06T17:58:27Z
dc.date.available2025-02-06T17:58:27Z
dc.date.issued2024
dc.departmentAltınbaş Üniversitesien_US
dc.description.abstractPredictive analytics is widely applied in healthcare, where it improves forecast accuracy by drawing on large volumes of data. Healthcare datasets, however, are frequently imbalanced. In this study, two COVID-19 datasets from Kaggle serve as a case study of dataset imbalance. On imbalanced datasets such as these, conventional sampling methods like ADASYN (Adaptive Synthetic Sampling Approach for Imbalanced Learning) tend to yield only modest accuracy. To address the additional problem of finding the optimal features, this study proposes a novel approach that combines oversampling with genetic feature selection (GFs) on laboratory data. The method builds machine-learning clinical prediction models for identifying COVID-19-infected patients on two widely recognized datasets, using hyper ADASYN over-sampling and genetic feature selection to identify the features most relevant to accurate prediction. Unlike the traditional approach, the proposed hypermodel both resolves the class imbalance and tunes the feature space, which substantially increases accuracy, precision, recall, and overall predictive performance. The approach significantly enhanced classifier performance: the Random Forest (RF) model with n trees reached 99% accuracy, with 99% precision, 99% recall, and a 99% F1-score on each dataset. The Decision Tree (DT) model achieved 92% on all metrics for Dataset I and 95% on all metrics for Dataset II. The Multilayer Perceptron (MLP) achieved 99% on all metrics for both datasets. Gradient Boosting (XGB) achieved 97% on all metrics for Dataset I and 98% on all metrics for Dataset II. These results underscore the efficacy of the proposed method in balancing COVID-19 datasets and enhancing predictive accuracy.en_US
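The abstract names ADASYN as the baseline oversampler but does not describe the "hyper" variant or the genetic feature selection step. The following is therefore only a minimal pure-Python sketch of plain ADASYN-style oversampling, not the authors' method. The function name `adasyn`, its parameters (`k`, `beta`, `seed`), and the uniform-weight fallback used when no minority point has majority neighbours are all assumptions made for illustration.

```python
import math
import random


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Return synthetic minority samples (illustrative ADASYN-style sketch).

    X is a list of numeric feature lists, y the matching list of labels.
    All names and defaults here are hypothetical, not taken from the paper.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(min_idx)
    n_maj = len(y) - n_min

    # Number of synthetic samples needed to reach the requested balance.
    total_new = int((n_maj - n_min) * beta)
    if total_new <= 0 or n_min < 2:
        return []

    def nearest(i, candidates):
        # The k candidates closest to X[i] (Euclidean), excluding i itself.
        others = [j for j in candidates if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:k]

    # Difficulty ratio: share of majority points among each minority
    # point's k nearest neighbours in the full dataset.
    ratios = []
    for i in min_idx:
        nbrs = nearest(i, range(len(X)))
        ratios.append(sum(1 for j in nbrs if y[j] != minority) / len(nbrs))

    total = sum(ratios)
    if total == 0:
        # Assumption: fall back to uniform weights when classes do not overlap.
        weights = [1.0 / n_min] * n_min
    else:
        weights = [r / total for r in ratios]

    synthetic = []
    for i, w in zip(min_idx, weights):
        count = round(w * total_new)  # harder points receive more samples
        min_nbrs = nearest(i, min_idx)
        for _ in range(count):
            j = rng.choice(min_nbrs)
            lam = rng.random()
            # Interpolate between the point and a random minority neighbour.
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic
```

Because each synthetic point is an interpolation between two existing minority points, it stays inside the minority class's bounding box. In practice a maintained implementation such as the `ADASYN` class in the imbalanced-learn package would normally be preferred over a hand-written sketch like this one.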
dc.identifier.endpage617en_US
dc.identifier.issn1823-4690
dc.identifier.issue2en_US
dc.identifier.scopus2-s2.0-85196270321
dc.identifier.scopusqualityQ3en_US
dc.identifier.startpage598en_US
dc.identifier.urihttps://hdl.handle.net/20.500.12939/5265
dc.identifier.volume19en_US
dc.identifier.wosWOS:001208135400008
dc.indekslendigikaynakWeb of Science
dc.indekslendigikaynakScopus
dc.language.isoenen_US
dc.publisherTaylors Univ Sdn Bhden_US
dc.relation.ispartofJournal of Engineering Science and Technologyen_US
dc.relation.publicationcategoryArticle - International Refereed Journal - Institutional Faculty Memberen_US
dc.rightsinfo:eu-repo/semantics/closedAccessen_US
dc.snmzKA_WOS_20250206
dc.subjectADASYNen_US
dc.subjectGenetic feature selectionen_US
dc.subjectHealthcareen_US
dc.subjectImbalanced datasetsen_US
dc.subjectOversamplingen_US
dc.subjectPredictive analyticsen_US
dc.titleENHANCING PREDICTIVE PERFORMANCE IN COVID-19 HEALTHCARE DATASETS: A CASE STUDY BASED ON HYPER ADASYN OVER-SAMPLING AND GENETIC FEATURE SELECTIONen_US
dc.typeArticleen_US
