Textual Authenticity in the AI Era: Evaluating BERT and RoBERTa with Logistic Regression and Neural Networks for Text Classification
Date
2024
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Institute of Electrical and Electronics Engineers Inc.
Access Rights
info:eu-repo/semantics/closedAccess
Abstract
AI-generated content that imitates human writing has drawn increasing attention as AI tools become more widespread. This study compares Logistic Regression and Feedforward Neural Networks (FNNs), each paired with Sentence-BERT and RoBERTa embeddings, for determining whether a given text is authentic. A dataset balanced between human-written and AI-generated texts was used. The texts were first preprocessed with techniques such as tokenization and normalization, and features were then extracted using transformer-based models. Cross-validation and confusion matrix analysis, using accuracy, precision, recall, F1 score, and ROC AUC, were applied to assess the models' robustness. The hybrid RoBERTa-FNN model outperformed the other models in precision and recall and achieved the highest accuracy (99.95%). This improved performance indicates how effectively RoBERTa embeddings represent context at the fine-grained level required for this kind of text classification. This work is a step toward robust AI text detection systems and also advances the understanding of how models and embeddings perform in text classification. The results emphasize that the choice of model configuration and embedding technique is a key factor in achieving the best results in practical applications. © 2024 IEEE.
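The evaluation measures named in the abstract (accuracy, precision, recall, F1 score, and ROC AUC) can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the label convention (1 = AI-generated, 0 = human-written), the 0.5 decision threshold, the function names, and the eight toy scores are all assumptions made for illustration and are unrelated to the results reported in the paper.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count confusion matrix cells for binary labels.

    Assumed convention (not from the paper): 1 = AI-generated, 0 = human-written.
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Derive accuracy, precision, recall, and F1 from the confusion counts."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    precision = c["tp"] / predicted_pos if predicted_pos else 0.0
    recall = c["tp"] / actual_pos if actual_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of samples,
    which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four AI-generated and four human-written texts,
# with made-up classifier scores for the "AI-generated" class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

On this toy data one AI-generated text is missed and one human-written text is flagged, so the threshold-based metrics and the ranking-based ROC AUC differ: the former depend on the chosen cut-off, while the latter only depends on how the scores order the two classes.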
Description
Continental; et al.; Forvia; Hamilton; Magna; Schaeffler
16th International Symposium on Electronics and Telecommunications, ISETC 2024 -- 7 November 2024 through 8 November 2024 -- Timisoara -- 205402
Keywords
AI-Generated Content Detection, Feedforward Neural Networks (FNN), Logistic Regression, Natural Language Processing (NLP), RoBERTa, Sentence-BERT (SBERT)
Source
2024 16th International Symposium on Electronics and Telecommunications, ISETC 2024 - Conference Proceedings