Comparison of the Application of the Kernel K-Means Algorithm on a Bipartite Graph and K-Means on a Document-Term Matrix in the RISTEKBRIN Covid-19 Research Dataset

Author

  • Budi Nugroho, Pusat Penelitian Informatika - LIPI

DOI:

https://doi.org/10.25126/jtiik.2021824365

Abstract (translated from Indonesian)

The spread of Covid-19 cases in Indonesia has given rise to a wide range of research topics pursued by researchers in many fields and from many institutions. According to data compiled by the Sinta Ristekbrin portal, researchers have uploaded 351 research topics. This study aims to analyze and map the research topics that are being and/or have been carried out during the Covid-19 pandemic in Indonesia. The analysis and mapping apply the kernel k-means algorithm to document clustering based on a bipartite graph, and k-means to the document-term matrix. The Ristekbrin Covid-19 research dataset is modeled as a bipartite graph between the list of terms and the documents. Similarity scores are then computed with a kernel method. The kernel matrix values, which reflect the similarity scores between documents, serve as input to the kernel k-means clustering algorithm, which produces a mapping of the research topics. As a comparison, the k-means algorithm is applied to the document-term matrix to cluster the Covid-19 research topics. The clustering results of both methods are tested by internal validation using the Dunn index. The Dunn index is used because the dataset provides no prior information about the label or name of each cluster. The results show that the kernel k-means algorithm yields slightly better validation than k-means. The findings are expected to provide additional information in support of government programs to accelerate the handling of Covid-19 in Indonesia.

 

Abstract

The Covid-19 outbreak in Indonesia has prompted a wide range of research topics pursued by researchers in diverse fields and from many institutions. According to data compiled by the Sinta Ristekbrin portal, researchers have uploaded 351 research topics. This study aims to analyze and map the research topics that are being and/or have been conducted during the Covid-19 pandemic in Indonesia. The analysis and mapping apply the kernel k-means algorithm to document clustering based on a bipartite graph, and k-means to the document-term matrix. Ristekbrin's Covid-19 research dataset is modeled as a bipartite graph between terms and documents. The similarity score is then calculated using a kernel method. The kernel matrix, whose values represent the similarity scores between documents, is used as input to the kernel k-means clustering algorithm, which produces a mapping of the research topics. As a comparison, we applied the original k-means algorithm to the document-term matrix of the dataset. The clustering results of both methods were validated internally using the Dunn index. The Dunn index was used because the dataset provides no prior information about the label or name of each cluster. The comparison of Dunn index values shows that the kernel k-means algorithm performs slightly better than the k-means algorithm. This study is expected to provide additional information that supports government programs in accelerating the handling of Covid-19 in Indonesia.
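The pipeline described in the abstract (bipartite document-term graph, kernel matrix, kernel k-means, Dunn index) can be illustrated with a minimal sketch. Several parts are assumptions rather than the paper's method: the abstract does not name the kernel, so a cosine-normalised shared-term kernel is used here; the four titles are hypothetical and not drawn from the Ristekbrin dataset; and the deterministic farthest-first initialisation and all function names are illustrative choices.

```python
import math

# Hypothetical toy titles; these are NOT taken from the Ristekbrin dataset.
docs = [
    "covid vaccine clinical trial",
    "vaccine trial immune response",
    "covid online learning school",
    "school online learning policy",
]

def document_term_matrix(docs):
    """Term counts per document: the biadjacency matrix of the bipartite
    graph whose edges link each document to the terms it contains."""
    vocab = sorted({term for doc in docs for term in doc.split()})
    column = {term: j for j, term in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for term in doc.split():
            matrix[i][column[term]] += 1.0
    return matrix

def cosine_kernel(A):
    """Assumed kernel: shared-term weight between two documents (two-step
    paths document -> term -> document), normalised to unit self-similarity."""
    n = len(A)
    K = [[sum(a * b for a, b in zip(A[i], A[j])) for j in range(n)] for i in range(n)]
    norm = [math.sqrt(K[i][i]) or 1.0 for i in range(n)]
    return [[K[i][j] / (norm[i] * norm[j]) for j in range(n)] for i in range(n)]

def farthest_first_labels(K, k):
    """Assumed deterministic start: pick mutually dissimilar seed documents,
    then assign every document to its most similar seed."""
    n = len(K)
    seeds = [0]
    while len(seeds) < k:
        rest = [i for i in range(n) if i not in seeds]
        seeds.append(min(rest, key=lambda i: max(K[i][s] for s in seeds)))
    return [max(range(k), key=lambda c: K[i][seeds[c]]) for i in range(n)]

def kernel_kmeans(K, k, max_iter=100):
    """Lloyd-style kernel k-means: the squared feature-space distance from
    document i to the centroid of cluster C is
    K[i][i] - 2 * mean_j K[i][j] + mean_{j,l} K[j][l], with j, l in C."""
    n = len(K)
    labels = farthest_first_labels(K, k)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        new_labels = []
        for i in range(n):
            best, best_d = labels[i], float("inf")
            for c, C in enumerate(members):
                if not C:  # skip clusters that have become empty
                    continue
                cross = sum(K[i][j] for j in C) / len(C)
                within = sum(K[j][l] for j in C for l in C) / len(C) ** 2
                d = K[i][i] - 2.0 * cross + within
                if d < best_d:
                    best, best_d = c, d
            new_labels.append(best)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels

def dunn_index(K, labels):
    """Smallest between-cluster distance divided by the largest cluster
    diameter, using the kernel-induced distance. Assumes at least two
    clusters and at least one cluster with two or more documents."""
    n = len(K)
    def dist(i, j):
        return math.sqrt(max(0.0, K[i][i] + K[j][j] - 2.0 * K[i][j]))
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    inter = min(dist(i, j) for i, j in pairs if labels[i] != labels[j])
    intra = max(dist(i, j) for i, j in pairs if labels[i] == labels[j])
    return inter / intra

K = cosine_kernel(document_term_matrix(docs))
labels = kernel_kmeans(K, k=2)
score = dunn_index(K, labels)
print(labels, round(score, 4))
```

On this toy corpus the two vaccine titles and the two online-learning titles fall into separate clusters. Passing the unnormalised product of the document-term matrix with its transpose as K makes the same loop minimise the standard k-means objective, which is one way to run the baseline from an identical starting assignment; a higher Dunn index then indicates more compact, better separated clusters.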


Published

25-03-2021

Issue

Section

Computer Science

How to Cite

Perbandingan Aplikasi Algoritma Kernel K-Means pada Graf Bipartit dan K-Means pada Matriks Dokumen-Istilah dalam Dataset Penelitian Covid-19 RISTEKBRIN. (2021). Jurnal Teknologi Informasi Dan Ilmu Komputer, 8(2), 411-418. https://doi.org/10.25126/jtiik.2021824365