Joint Distribution in Weighted Majority Vote (WMV) for Improving the Performance of Supervised Sentiment Analysis on Twitter Datasets

Author

  • Bagus Setya Rintyarna Universitas Muhammadiyah Jember, Jember

DOI:

https://doi.org/10.25126/jtiik.2022956185

Abstract

Sentiment analysis is a text mining technique based on natural language processing (NLP) that extracts opinions expressed on online platforms, including Twitter, one of the most popular microblogging platforms in Indonesia. Two approaches are commonly used: the machine learning (ML) based approach and the sentiment lexicon (SL) based approach. This research focuses on developing ML-based sentiment analysis, also called the supervised technique, for Twitter datasets. Most sentiment analysis studies on Indonesian-language Twitter datasets rely on a single machine learning algorithm. This study combines the output of several algorithms (experts) while reducing the misclassification rate by updating the expert weights dynamically using a weighted majority vote (WMV) based on the joint distribution of a Bayesian Network. In the first stage, data were collected from Twitter using three Covid-19-related hashtags as experimental data. The performance of WMV was then compared extensively with four baseline methods: Naïve Bayes, Gaussian Naïve Bayes, Multinomial Naïve Bayes, and a majority vote over these three single classifiers. The performance metrics are precision, recall, F-measure, accuracy, and the Matthews correlation coefficient (MCC). The experiments show that WMV improves sentiment analysis performance on all three dataset topics across these metrics.
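To make the idea of dynamically updated expert weights concrete, the following is a minimal sketch of a weighted majority vote. It does not reproduce the paper's method: the abstract does not say how weights are derived from the Bayesian Network joint distribution, so this sketch assumes the classic multiplicative penalty instead, where an expert's weight is multiplied by a factor `beta` each time it misclassifies an example. The function names, the `beta` value, and the toy votes are all hypothetical.

```python
from collections import defaultdict


def majority_vote(votes):
    """Plain majority vote: every expert counts equally."""
    tally = defaultdict(float)
    for label in votes:
        tally[label] += 1.0
    return max(tally, key=tally.get)


def weighted_vote(votes, weights):
    """Return the label with the largest total expert weight."""
    tally = defaultdict(float)
    for label, weight in zip(votes, weights):
        tally[label] += weight
    return max(tally, key=tally.get)


def run_wmv(expert_predictions, true_labels, beta=0.5):
    """Weighted majority vote with a multiplicative penalty (an assumption,
    not the paper's joint-distribution update).

    expert_predictions: one list of labels per example, one label per expert.
    Returns the combined predictions and the final expert weights.
    """
    n_experts = len(expert_predictions[0])
    weights = [1.0] * n_experts
    predictions = []
    for votes, truth in zip(expert_predictions, true_labels):
        predictions.append(weighted_vote(votes, weights))
        # Shrink the weight of every expert that got this example wrong.
        weights = [w * beta if v != truth else w
                   for v, w in zip(votes, weights)]
    return predictions, weights


def accuracy(predictions, true_labels):
    """Fraction of examples where the prediction matches the true label."""
    correct = sum(p == t for p, t in zip(predictions, true_labels))
    return correct / len(true_labels)


# Hypothetical toy data: three experts voting on four tweets.
votes_per_tweet = [
    ["pos", "neg", "neg"],
    ["neg", "neg", "pos"],
    ["pos", "neg", "neg"],
    ["neg", "pos", "pos"],
]
truth = ["pos", "neg", "pos", "neg"]

wmv_predictions, final_weights = run_wmv(votes_per_tweet, truth)
mv_predictions = [majority_vote(v) for v in votes_per_tweet]

print("WMV accuracy:", accuracy(wmv_predictions, truth))
print("Majority vote accuracy:", accuracy(mv_predictions, truth))
print("Final expert weights:", final_weights)
```

On this toy data the first expert is always right, so its weight stays at 1.0 while the other two shrink; after the first mistake the weighted vote follows the reliable expert, whereas the unweighted majority keeps being outvoted by the two weaker experts.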


Author Biography

  • Bagus Setya Rintyarna, Universitas Muhammadiyah Jember, Jember

    Google Scholar:

    https://scholar.google.co.id/citations?user=MN4TULAAAAAJ&hl=id

    Scopus ID: 57191611739

    SINTA ID: 5973952

Published

31-10-2022

Issue

Section

Computer Science

How to Cite

Joint Distribution pada Weighted Majority Vote (WMV) untuk Peningkatan Kinerja Sentiment Analysis Tersupervisi pada Dataset Twitter. (2022). Jurnal Teknologi Informasi Dan Ilmu Komputer, 9(5), 1083-1090. https://doi.org/10.25126/jtiik.2022956185