Comparison of Pretrained Transformer Models for Fake Review Detection

Authors

  • Aisyah Awalina, Universitas Brawijaya, Malang
  • Fitra Abdurrachman Bachtiar, Universitas Brawijaya, Malang
  • Fitri Utaminingrum, Universitas Brawijaya, Malang

DOI:

https://doi.org/10.25126/jtiik.2022935696

Abstract

The ease of obtaining information today has made our lives somewhat easier, for example when we look up reviews to weigh which place or product to choose. Some people exploit this by writing fake reviews for their own benefit, so fake review detection is much needed. Transformer models are now widely applied in natural language processing because of their outstanding performance. There are two approaches to using a Transformer model: pre-training and fine-tuning. Previous studies have mostly fine-tuned Transformer models because fine-tuning is easier and requires less training time, lower cost, and a less demanding computing environment than pre-training. However, few of those studies compare deep learning models with fine-tuned Transformers specifically for fake review detection. This study compares Transformer models under the fine-tuning approach with a deep learning method, a CNN using various pretrained word embeddings, for fake review detection on the Ott dataset. The RoBERTa model outperforms the other Transformer and deep learning models, achieving 90.8% accuracy, 90% precision, 91.8% recall, and a 90.8% f1-score. In terms of training time, however, DistilBERT is the fastest at 200.5 seconds. Overall, both the Transformer and the deep learning models perform well for fake review detection on the Ott dataset.
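For readers who want a concrete picture of the fine-tuning approach compared here, below is a minimal sketch of fine-tuning a Transformer for binary fake review classification with the Hugging Face Transformers Trainer API. The `roberta-base` checkpoint, the two example reviews, the label scheme (0 = truthful, 1 = deceptive), and all hyperparameters are illustrative assumptions, not the configuration reported in the paper.

```python
# Minimal fine-tuning sketch (assumed setup, not the authors' exact pipeline).
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Stand-in data; in the paper this would be the Ott deceptive-opinion dataset.
texts = ["The staff was friendly and the room was spotless.",
         "Best hotel ever, absolutely amazing, you must stay here!!!"]
labels = [0, 1]  # 0 = truthful, 1 = deceptive (label scheme assumed)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2)  # classification head is trained from scratch

class ReviewDataset(torch.utils.data.Dataset):
    """Wraps tokenized reviews so the Trainer can iterate over them."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=256)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=ReviewDataset(texts, labels),
)
trainer.train()  # fine-tunes all weights, not just the classification head
```

Swapping the checkpoint name (e.g., `"distilbert-base-uncased"` for DistilBERT) would yield the other fine-tuned Transformer variants referenced below.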
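The deep learning baseline, a CNN over pretrained word embeddings in the style of Kim (2014), can be sketched as follows. The filter sizes, filter count, and the random stand-in for real word2vec/GloVe/fastText vectors are assumptions for illustration only.

```python
# Kim (2014)-style text CNN sketch; hyperparameters are assumed, not reported.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, pretrained, num_classes=2,
                 filter_sizes=(3, 4, 5), n_filters=100):
        super().__init__()
        # Initialize the embedding layer from pretrained word vectors.
        self.embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)
        dim = pretrained.size(1)
        # One 1-D convolution per filter size, as in Kim (2014).
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, n_filters, k) for k in filter_sizes)
        self.fc = nn.Linear(n_filters * len(filter_sizes), num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, dim, seq_len)
        # Convolve, apply ReLU, then max-pool over time for each filter size.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))       # (batch, num_classes)

# Toy usage: random vectors stand in for a real (vocab_size, 300) embedding
# matrix loaded from word2vec, GloVe, or fastText.
vectors = torch.randn(1000, 300)
model = TextCNN(vectors)
logits = model(torch.randint(0, 1000, (8, 50)))  # 8 reviews, 50 tokens each
```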



References

BAHY HAKIM, H., UTAMININGRUM, F., & SETIA BUDI, A., 2021. Early Detection of COVID-19 Patient’s Survivability Based On The Image Of Lung X-Ray Image Using Deep Neural Networks. Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, 6(3). https://doi.org/10.22219/kinetik.v6i3.1265.

DEVLIN, J., CHANG, M.W., LEE, K., & TOUTANOVA, K., 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR, [online] 1, pp.4171–4186. Available at: <http://arxiv.org/abs/1810.04805>.

HAJEK, P., BARUSHKA, A., & MUNK, M., 2020. Fake consumer review detection using deep neural networks integrating word embeddings and emotion mining. Neural Computing and Applications, [online] 32(23), pp.17259–17274. https://doi.org/10.1007/s00521-020-04757-2.

HE, S., HOLLENBECK, B., & PROSERPIO, D., 2021. The Market for Fake Reviews. SSRN Electronic Journal.

HUGGINGFACE, 2021. Transformer models - Hugging Face Course. [online] Available at: <https://huggingface.co/course/chapter1/4?fw=pt> [Accessed 4 Aug. 2021].

KENNEDY, S., WALSH, N., SLOKA, K., MCCARREN, A., & FOSTER, J., 2019. Fact or factitious? Contextualized opinion spam detection. In: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop. pp.344–350.

KIM, M., LEE, S.M., CHOI, S., & KIM, S.Y., 2021. Impact of visual information on online consumer review behavior: Evidence from a hotel booking website. Journal of Retailing and Consumer Services, [online] 60, p.102494. https://doi.org/10.1016/j.jretconser.2021.102494.

KIM, Y., 2014. Convolutional neural networks for sentence classification. EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, pp.1746–1751. https://doi.org/10.3115/v1/d14-1181.

LAN, Z., CHEN, M., GOODMAN, S., GIMPEL, K., SHARMA, P., & SORICUT, R., 2019. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. [online] pp.1–17. Available at: <http://arxiv.org/abs/1909.11942>.

LE, Q., & MIKOLOV, T., 2014. Distributed representations of sentences and documents. 31st International Conference on Machine Learning, ICML 2014, 4, pp.2931–2939.

MASLEJ-KREŠŇÁKOVÁ, V., SARNOVSKÝ, M., BUTKA, P., & MACHOVÁ, K., 2020. Comparison of deep learning models and various text pre-processing techniques for the toxic comments classification. Applied Sciences (Switzerland), 10(23), pp.1–26. https://doi.org/10.3390/app10238631.

MIKOLOV, T., GRAVE, E., BOJANOWSKI, P., PUHRSCH, C., & JOULIN, A., 2017. Advances in pre-training distributed word representations.

MIKOLOV, T., SUTSKEVER, I., CHEN, K., CORRADO, G., & DEAN, J., 2013. Distributed Representations of Words and Phrases and their Compositionality. In: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2. [online] Lake Tahoe, Nevada: Curran Associates Inc., pp.3111–3119. Available at: <http://arxiv.org/abs/1310.4546>.

MUKHERJEE, A., VENKATARAMAN, V., LIU, B., & GLANCE, N., 2013. What Yelp fake review filter might be doing? Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media, [online] pp.409–418. Available at: <https://www.aaai.org/ocs/index.php/ICWSM/ICWSM13/paper/viewPaper/6006>.

MURPHY, R., 2020. Local Consumer Review Survey: How Customer Reviews Affect Behavior. [online] Available at: <https://www.brightlocal.com/research/local-consumer-review-survey/> [Accessed 4 Aug. 2021].

OTT, M., CARDIE, C., & HANCOCK, J.T., 2013. Negative deceptive opinion spam. In: NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference. Atlanta, Georgia, pp.497–501.

OTT, M., CHOI, Y., CARDIE, C., & HANCOCK, J.T., 2011. Finding deceptive opinion spam by any stretch of the imagination. In: ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Portland, Oregon, pp.309–319.

PENNINGTON, J., SOCHER, R., & MANNING, C., 2014. GloVe: Global Vectors for Word Representation. In: EMNLP. [online] Doha, Qatar: Association for Computational Linguistics, pp.1532–1543. https://doi.org/10.3115/v1/D14-1162.

PUTERI, R.T., & UTAMININGRUM, F., 2020. Micro-sleep detection using combination of haar cascade and convolutional neural network. ACM International Conference Proceeding Series, pp.130–135. https://doi.org/10.1145/3427423.3427433.

RAYANA, S., & AKOGLU, L., 2015. Collective opinion spam detection: Bridging review networks and metadata. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/2783258.2783370.

SANH, V., DEBUT, L., CHAUMOND, J., & WOLF, T., 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR. [online] Available at: <http://arxiv.org/abs/1910.01108>.

YAMADA, I., SHINDO, H., TAKEDA, H., & TAKEFUJI, Y., 2016. Joint learning of the embedding of words and entities for named entity disambiguation. CoRR, pp.250–259. https://doi.org/10.18653/v1/k16-1025.

YANG, Z., DAI, Z., YANG, Y., CARBONELL, J., SALAKHUTDINOV, R., & LE, Q. V., 2019. XLNet: Generalized autoregressive pretraining for language understanding. [online] Available at: <http://arxiv.org/abs/1906.08237>.

ZHANG, W., DU, Y., YOSHIDA, T., & WANG, Q., 2018. DRI-RCNN: An approach to deceptive review identification using recurrent convolutional neural network. Information Processing and Management, 54, pp.576–592. https://doi.org/10.1016/j.ipm.2018.03.007.

Published

20-06-2022

Issue

Vol. 9 No. 3 (2022)

Section

Computer Science

How to Cite

Perbandingan Pretrained Model Transformer pada Deteksi Ulasan Palsu. (2022). Jurnal Teknologi Informasi Dan Ilmu Komputer, 9(3), 597-604. https://doi.org/10.25126/jtiik.2022935696