Analysis of BERT-CNN for Multi-Label Classification of Religious Discussions and Their Association with the Qur’an and Hadith
DOI:
https://doi.org/10.25126/jtiik.2025126
Keywords:
Multi-Label Classification, BERT, CNN, Religious Discussions, NLP, Qur’an, Hadith
Abstract
This study applies Natural Language Processing (NLP) using Bidirectional Encoder Representations from Transformers (BERT) combined with Convolutional Neural Networks (CNN) for multi-label classification of religious discussions and their association with verses of the Qur’an and Hadith. The dataset was obtained from religious discussions and congregants’ questions addressed to ustaz, collected from various digital platforms such as YouTube, Facebook, Instagram, and websites. The BERT-based NLP model was employed to represent text contextually, while CNN was used to extract features and perform multi-label classification. Experiments were conducted to explore parameter combinations and text preprocessing approaches to improve classification accuracy. The results show that hyperparameter tuning increased the F1-Score in the second parameter configuration (E2) from 0.7046 to 0.7789 and in the fifth configuration (E5) from 0.7073 to 0.7734, while reducing the Hamming Loss, indicating an improvement in label prediction accuracy. A threshold of 0.40 was found to be the optimal value for balancing Precision and Recall, contributing to an increase in Subset Accuracy. This research is expected to contribute to the development of Indonesian-language NLP technology for multi-label classification of religious texts and to open opportunities for practical applications in artificial intelligence systems to enhance rapid and accurate access to religious information.
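The evaluation terms used in the abstract (a decision threshold of 0.40, F1-Score, Hamming Loss, and Subset Accuracy) can be illustrated with a minimal sketch. It is not the authors' code: the toy labels and probabilities are invented for illustration, the function names are hypothetical, micro-averaging for F1 is an assumption (the abstract does not state the averaging scheme), and treating a probability equal to the threshold as positive is also an assumption.

```python
from typing import List, Sequence, Tuple


def binarize(probs: Sequence[Sequence[float]], threshold: float = 0.40) -> List[List[int]]:
    """Turn per-label probabilities into 0/1 predictions.

    Assumption: a probability equal to the threshold counts as positive.
    """
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


def hamming_loss(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp))
    return wrong / total


def subset_accuracy(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of samples whose whole label set is predicted exactly."""
    exact = sum(list(rt) == list(rp) for rt, rp in zip(y_true, y_pred))
    return exact / len(y_true)


def micro_prf(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 (averaging scheme is an assumption)."""
    tp = fp = fn = 0
    for rt, rp in zip(y_true, y_pred):
        for t, p in zip(rt, rp):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Toy data, invented for illustration: 3 samples, 3 labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
probs = [[0.90, 0.20, 0.45], [0.10, 0.80, 0.50], [0.35, 0.70, 0.05]]

y_pred = binarize(probs, threshold=0.40)
precision, recall, f1 = micro_prf(y_true, y_pred)
print("predictions:", y_pred)
print("hamming loss:", hamming_loss(y_true, y_pred))
print("subset accuracy:", subset_accuracy(y_true, y_pred))
print("micro P/R/F1:", precision, recall, f1)
```

Lowering the threshold generally raises Recall at the cost of Precision. Subset Accuracy is the strictest of the three measures, because one wrong label makes the whole sample count as incorrect.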
License
Copyright (c) 2025 Jurnal Teknologi Informasi dan Ilmu Komputer

This article is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).
Authors who publish in this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0), which allows others to share the work with an acknowledgement of its authorship and its initial publication in this journal.
- Authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., posting it to an institutional repository or publishing it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their own websites) before and during the submission process, as this can lead to productive exchanges as well as earlier and greater citation of the published work. (See The Effect of Open Access).












