Textual Authenticity in the AI Era: Evaluating BERT and RoBERTa with Logistic Regression and Neural Networks for Text Classification

dc.contributor.author: Hazim, Layth Rafea
dc.contributor.author: Ata, Oğuz
dc.date.accessioned: 2025-02-06T18:01:20Z
dc.date.available: 2025-02-06T18:01:20Z
dc.date.issued: 2024
dc.department: Altınbaş Üniversitesi [en_US]
dc.department: Institutes, Graduate Education Institute, Department of Electrical and Computer Engineering
dc.description: Continental; et al.; Forvia; Hamilton; Magna; Schaeffler [en_US]
dc.description: 16th International Symposium on Electronics and Telecommunications, ISETC 2024 -- 7 November 2024 through 8 November 2024 -- Timisoara -- 205402 [en_US]
dc.description.abstract: As AI becomes more widespread, AI-generated content that imitates human writing has drawn increasing attention. This study compares Logistic Regression and Feedforward Neural Network (FNN) classifiers built on Sentence-BERT (SBERT) and RoBERTa embeddings for determining whether a given text is authentic. A dataset balanced between human-written and AI-generated texts was used. The texts were first preprocessed with tokenization and normalization, and features were then extracted with the transformer-based models. To assess the models' robustness, they were evaluated with cross-validation and confusion-matrix analysis, using accuracy, precision, recall, F1 score, and ROC AUC. The hybrid RoBERTa-FNN model outperformed the other configurations in precision and recall and achieved the highest accuracy (99.95%). This result illustrates how effectively RoBERTa's embeddings capture the fine-grained contextual information that this kind of text classification requires. The work is a step toward robust AI-text detection systems and extends the understanding of how models and embeddings perform in text classification. The results highlight model configuration and embedding technique as the key factors for achieving the best results in practical applications. © 2024 IEEE. [en_US]
dc.identifier.doi: 10.1109/ISETC63109.2024.10797291
dc.identifier.isbn: 979-835039086-5
dc.identifier.scopus: 2-s2.0-85215929245
dc.identifier.uri: https://doi.org/10.1109/ISETC63109.2024.10797291
dc.identifier.uri: https://hdl.handle.net/20.500.12939/5321
dc.indekslendigikaynak: Scopus
dc.institutionauthor: Ata, Oğuz
dc.institutionauthor: Hazim, Layth Rafea
dc.language.iso: en [en_US]
dc.publisher: Institute of Electrical and Electronics Engineers Inc. [en_US]
dc.relation.ispartof: 2024 16th International Symposium on Electronics and Telecommunications, ISETC 2024 - Conference Proceedings [en_US]
dc.relation.publicationcategory: Conference Item - International - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.snmz: KA_Scopus_20250206
dc.subject: AI-Generated Content Detection [en_US]
dc.subject: Feedforward Neural Networks (FNN) [en_US]
dc.subject: Logistic Regression [en_US]
dc.subject: Natural Language Processing (NLP) [en_US]
dc.subject: RoBERTa [en_US]
dc.subject: Sentence-BERT (SBERT) [en_US]
dc.title: Textual Authenticity in the AI Era: Evaluating BERT and RoBERTa with Logistic Regression and Neural Networks for Text Classification [en_US]
dc.type: Conference Object [en_US]
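The evaluation measures named in the abstract (confusion matrix, accuracy, precision, recall, F1 score, ROC AUC) can be illustrated with a minimal, dependency-free Python sketch. It assumes binary labels with 1 for AI-generated and 0 for human-written text, plus a per-example score such as a predicted probability. The function names, the 0.5 decision threshold, and the toy data are illustrative assumptions, not the authors' code, data, or results.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives; label 1 is the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 derived from the confusion counts."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    accuracy = (c["tp"] + c["tn"]) / total if total else 0.0
    precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half.

    This pairwise form equals the area under the ROC curve (Mann-Whitney U
    divided by n_pos * n_neg). It is O(n_pos * n_neg), fine for illustration only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = AI-generated, 0 = human-written.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]  # e.g. predicted P(AI-generated)
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(binary_metrics(y_true, y_pred))
    print(round(roc_auc(y_true, scores), 4))  # 8 of 9 pairs ranked correctly
```

Under cross-validation, one common approach is to compute these measures on each held-out fold and average them. For large datasets, a library routine such as scikit-learn's `roc_auc_score` would normally replace the quadratic pairwise loop.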
