Comparative analysis of different distributions dataset by using data mining techniques on credit card fraud detection

Ata, Oğuz; Hazim, Layth

dc.contributor.author	Ata, Oğuz
dc.contributor.author	Hazim, Layth
dc.date.accessioned	2021-05-15T11:33:45Z
dc.date.available	2021-05-15T11:33:45Z
dc.date.issued	2020
dc.department	Faculty of Engineering and Natural Sciences, Department of Software Engineering	en_US
dc.description	Ata, Oguz/0000-0003-4511-7694; Hazim, Layth/0000-0001-8066-2175
dc.description.abstract	Banks suffer multimillion-dollar losses each year for several reasons, the most important of which is credit card fraud. The issue is how to cope with the challenges posed by this kind of fraud. Skewed class distribution ("class imbalance") is one of the most important of these challenges. Therefore, in this study, we explore four data mining techniques, namely naive Bayesian (NB), Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Random Forest (RF), on actual credit card transactions from European cardholders. This paper offers four major contributions. First, we used under-sampling to balance the dataset because of its high class imbalance, which implies a skewed distribution. Second, we applied NB, SVM, KNN, and RF to the under-sampled data to classify transactions as fraudulent or genuine, then measured performance using a confusion matrix and compared the results. Third, we adopted 10-fold cross-validation (CV) to test the accuracy of the four models, with standard deviation, and compared the results across all models. Fourth, we examined these models against the entire (skewed) dataset using the confusion matrix and the AUC (Area Under the ROC Curve) ranking measure to determine which model would be best to use with a particular type of fraud. The best accuracies for the NB, SVM, KNN and RF classifiers are 97.80%, 97.46%, 98.16% and 98.23%, respectively. Comparative results using four dataset divisions (75:25), (90:10), (66:34) and (80:20) showed that RF performs better than NB, SVM, and KNN, and that applying the proposed models to the entire (skewed) dataset achieved better outcomes than applying them to the under-sampled dataset.	en_US
dc.description.sponsorship	ULB Machine Learning Group	en_US
dc.description.sponsorship	We thank, and wish continued growth in knowledge to, Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi [29] for providing the real transaction dataset from the ULB Machine Learning Group and its description.	en_US
dc.identifier.doi	10.17559/TV-20180427091048
dc.identifier.endpage	626	en_US
dc.identifier.issn	1330-3651
dc.identifier.issn	1848-6339
dc.identifier.issue	2	en_US
dc.identifier.scopus	2-s2.0-85083436143
dc.identifier.scopusquality	Q3
dc.identifier.startpage	618	en_US
dc.identifier.uri	https://doi.org/10.17559/TV-20180427091048
dc.identifier.uri	https://hdl.handle.net/20.500.12939/221
dc.identifier.volume	27	en_US
dc.identifier.wos	WOS:000527241900037
dc.identifier.wosquality	Q3
dc.indekslendigikaynak	Web of Science
dc.indekslendigikaynak	Scopus
dc.institutionauthor	Ata, Oğuz
dc.language.iso	en
dc.publisher	Univ Osijek, Tech Fac	en_US
dc.relation.ispartof	Tehnicki Vjesnik-Technical Gazette
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Credit Card Fraud Detection	en_US
dc.subject	Data Mining	en_US
dc.subject	K-Nearest Neighbour	en_US
dc.subject	Naive Bayesian	en_US
dc.subject	Random Forest	en_US
dc.subject	Support Vector Machine	en_US
dc.title	Comparative analysis of different distributions dataset by using data mining techniques on credit card fraud detection
dc.type	Article
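
The evaluation steps named in the abstract (under-sampling, 10-fold cross-validation, confusion matrix, AUC) can be illustrated with a minimal plain-Python sketch. This is not the authors' implementation: the function names, the toy data, and the choice of random under-sampling to equal class sizes are assumptions made here for illustration only.

```python
import random

def undersample(rows, labels, seed=0):
    """Randomly drop rows until every class has as many rows as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    balanced = [(row, y) for y, group in by_class.items()
                for row in rng.sample(group, n_min)]
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [y for _, y in balanced]

def confusion_matrix(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) counts for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def kfold_indices(n, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over n samples."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. Quadratic in the number of samples, so only
    suitable for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (made up): label 1 = fraudulent, 0 = genuine.
rows = [[0.1], [0.2], [0.15], [0.3], [0.25], [0.05], [0.9], [0.8]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
balanced_rows, balanced_labels = undersample(rows, labels)
```

In this sketch a classifier would be fitted on each `train` index list from `kfold_indices` over the balanced data, and `confusion_matrix` and `auc` would then be computed on predictions for the full, skewed dataset.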

Files

Original bundle

Name: Comparative Analysis of Different Distributions Dataset by Using Data Mining Techniques.pdf
Size: 597.54 KB
Format: Adobe Portable Document Format
Description: Full Text

Collections

WoS Indexed Publications Collection
Scopus Indexed Publications Collection
Department of Software Engineering Collection