Textual Authenticity in the AI Era: Evaluating BERT and RoBERTa with Logistic Regression and Neural Networks for Text Classification


Date

2024

Journal Title

Journal ISSN

Volume Title

Publisher

Institute of Electrical and Electronics Engineers Inc.

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

As AI adoption spreads, AI-generated content that impersonates human writing has become an issue of growing concern. This study compares Logistic Regression and Feedforward Neural Network (FNN) classifiers built on Sentence-BERT (SBERT) and RoBERTa embeddings for determining whether a given text is human-written or AI-generated. A dataset balanced between human-written and AI-generated texts was used. The texts were first preprocessed with tokenization and normalization, and features were then extracted with the transformer-based models. To assess the models' robustness, the evaluation used cross-validation and confusion matrix analysis, with accuracy, precision, recall, F1 score, and ROC AUC as metrics. The hybrid RoBERTa-FNN model outperformed the other configurations in precision and recall and achieved the highest accuracy (99.95%) on the evaluated data. This improved performance indicates how effectively RoBERTa embeddings capture the fine-grained contextual information required for this kind of text classification. This work is a step toward robust AI text detection systems and advances understanding of how the choice of model and embedding affects text classification performance. The results highlight model configuration and embedding technique as the key factors in achieving the best results in practical applications. © 2024 IEEE.
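The abstract evaluates the classifiers with cross-validation and confusion matrix analysis using accuracy, precision, recall, F1 score, and ROC AUC. The following is a minimal Python sketch of how those metrics and a shuffled k-fold split can be computed from classifier scores; it is not the authors' code. It assumes that label 1 means "AI-generated", a 0.5 decision threshold, and plain (non-stratified) folds; all function names are hypothetical, and the SBERT/RoBERTa embedding and Logistic Regression/FNN training steps are not shown. In practice, scikit-learn's `sklearn.metrics` module (e.g. `precision_score`, `roc_auc_score`) and `sklearn.model_selection.KFold` provide equivalent functionality.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List, Sequence, Tuple


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float
    roc_auc: float


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn). Assumption: labels are 0/1 and 1 means "AI-generated"."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> BinaryMetrics:
    """Threshold scores (e.g. predicted P(AI-generated)); the 0.5 default is an assumption."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryMetrics(
        accuracy=(tp + tn) / len(y_true),
        precision=precision,
        recall=recall,
        f1=f1,
        roc_auc=roc_auc(y_true, scores),
    )


def k_fold_indices(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for plain shuffled (non-stratified) k-fold cross-validation."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


if __name__ == "__main__":
    m = evaluate([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
    print(m)  # BinaryMetrics(accuracy=0.5, precision=0.5, recall=0.5, f1=0.5, roc_auc=0.75)
```

The pairwise ROC AUC loop costs O(P·N) comparisons for P positive and N negative samples; it keeps the definition explicit, but a rank-based computation yields the same value more efficiently on large test sets. In a cross-validation run, `evaluate` would be called once per test fold and the per-fold metrics averaged.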

Description

Continental; et al.; Forvia; Hamilton; Magna; Schaeffler
16th International Symposium on Electronics and Telecommunications, ISETC 2024 -- 7 November 2024 through 8 November 2024 -- Timisoara -- 205402

Keywords

AI-Generated Content Detection, Feedforward Neural Networks (FNN), Logistic Regression, Natural Language Processing (NLP), RoBERTa, Sentence-BERT (SBERT)

Source

2024 16th International Symposium on Electronics and Telecommunications, ISETC 2024 - Conference Proceedings

WoS Q Value

Scopus Q Value

Volume

Issue

Citation