Comparative Analysis of BERT and XLNet Models for Classifying Bullying Tweets on Twitter

Authors

  • Teuku Radillah, Institut Teknologi Mitra Gama, Bengkalis Regency
  • Okta Veza, Universitas Ibnu Sina, Batam
  • Sarjon Defit, Universitas Putra Indonesia YPTK Padang, Padang

DOI:

https://doi.org/10.25126/jtiik.1169096

Abstract

Bullying on social media, particularly on Twitter, is an increasingly serious problem with significant effects on users' mental health, which makes automatic detection of tweets containing bullying content important. This study compares the performance of two transformer-based natural language processing models, BERT (Bidirectional Encoder Representations from Transformers) and XLNet, in classifying tweets as bullying or non-bullying. A dataset of tweets labelled as bullying or non-bullying was collected, and the text was preprocessed to clean and prepare it for model training. Both models were trained and tested on the same dataset, and performance was evaluated using accuracy, precision, recall, and F1-score. The results show that both models identify bullying tweets well, but XLNet outperforms BERT, reaching an accuracy of 95%, with a precision of 100%, a recall of 87%, and an F1-score of 88%. XLNet captures more complex context and linguistic nuance in tweets, which contributes to its higher classification accuracy. The study contributes to bullying detection on social media by showing that XLNet is more effective than BERT for this task. The findings can help platforms such as Twitter identify and prevent bullying content, creating a safer online environment for users, and can serve as a basis for developing more sophisticated and efficient bullying detection systems.
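The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. The function name `binary_metrics` and the toy labels below are assumptions made for illustration only; they are not the study's code or data, and the abstract does not describe how the authors computed their scores.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Return accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or true positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy labels (1 = bullying, 0 = non-bullying); NOT the study's data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

The sketch reports values for the single class passed as `positive`. The abstract does not state whether its precision, recall, and F1-score are per-class values or averages over both classes.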

Published

10-12-2024

Issue

Vol. 11 No. 6 (2024)

Section

Computer Science

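How to Cite

Radillah, T., Veza, O., & Defit, S. (2024). Analisis Perbandingan Model BERT dan XLNet untuk Klasifikasi Tweet Bully pada Twitter [Comparative analysis of BERT and XLNet models for classifying bullying tweets on Twitter]. Jurnal Teknologi Informasi dan Ilmu Komputer, 11(6), 1371–1376. https://doi.org/10.25126/jtiik.1169096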