SMOTE Class Balancing and Ensemble Filter Feature Selection on Support Vector Machine for Liver Disease Classification

Authors

  • Muhammad Amir Nugraha, Universitas Lambung Mangkurat, Banjarmasin
  • Muhammad Itqan Mazdadi, Universitas Lambung Mangkurat, Banjarmasin
  • Andi Farmadi, Universitas Lambung Mangkurat, Banjarmasin
  • Muliadi, Universitas Lambung Mangkurat, Banjarmasin
  • Triando Hamonangan Saragih, Universitas Lambung Mangkurat, Banjarmasin

DOI:

https://doi.org/10.25126/jtiik.1067234

Keywords:

Liver, Classification, SVM, SMOTE, Ensemble Filter

Abstract

The liver is a vital organ that plays a central role in the body's metabolism. According to the American Liver Foundation website, 51,642 adults in the United States died of liver disease in 2020. Laboratory liver function test data can be used to diagnose liver disease, and accurate classification of patients matters because the results support early diagnosis of whether a patient has the disease. Previous research found the Support Vector Machine (SVM) to be the best-performing method for classifying liver disease patients. However, SVM performs poorly on datasets with imbalanced classes and loses accuracy when too many irrelevant features are used. To balance the classes in the dataset, the Synthetic Minority Oversampling Technique (SMOTE) is applied. To handle irrelevant features, feature selection is performed with an Ensemble Filter that combines the Information Gain, Gain Ratio, and Relief-F methods. In testing, SVM combined with SMOTE and the Ensemble Filter gave the best results, with an accuracy of 85% and an AUC of 0.850. These results indicate that SMOTE for class balancing and the Ensemble Filter for feature selection can improve the classification performance of SVM.
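The two preprocessing ideas named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The function names `smote_oversample` and `ensemble_filter_rank` are hypothetical, the sample points and filter scores are made-up illustration values, and the mean-rank aggregation rule is an assumption, since the abstract does not say how the Information Gain, Gain Ratio, and Relief-F rankings are combined. SVM training and the accuracy and AUC evaluation are omitted.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Assumes `minority` holds at least two numeric feature vectors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def ensemble_filter_rank(scores_by_filter, n_select):
    """Average the per-filter ranks and return the n_select best features.

    Mean-rank aggregation is an assumption made for this sketch.
    """
    features = list(next(iter(scores_by_filter.values())))
    mean_rank = {f: 0.0 for f in features}
    for scores in scores_by_filter.values():
        # Higher score means more relevant, so rank 1 goes to the top score.
        ordered = sorted(features, key=lambda f: scores[f], reverse=True)
        for rank, f in enumerate(ordered, start=1):
            mean_rank[f] += rank / len(scores_by_filter)
    return sorted(features, key=lambda f: mean_rank[f])[:n_select]


# Illustration values only: three minority samples, six majority samples.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
majority_count = 6
synthetic = smote_oversample(minority, majority_count - len(minority), k=2)
print(len(minority) + len(synthetic))  # minority size after balancing: 6

# Made-up filter scores for four placeholder features.
scores = {
    "information_gain": {"f1": 0.30, "f2": 0.10, "f3": 0.20, "f4": 0.05},
    "gain_ratio": {"f1": 0.25, "f2": 0.15, "f3": 0.30, "f4": 0.02},
    "relief_f": {"f1": 0.40, "f2": 0.05, "f3": 0.10, "f4": 0.20},
}
selected = ensemble_filter_rank(scores, n_select=2)
print(selected)  # ['f1', 'f3']
```

Each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbours, which is the core of SMOTE. In a full pipeline, oversampling would normally be applied to the training split only, so that synthetic samples do not leak into the test data.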

Published

30-12-2023

Issue

Section

Computer Science

How to Cite

Penyeimbangan Kelas SMOTE dan Seleksi Fitur Ensemble Filter pada Support Vector Machine untuk Klasifikasi Penyakit Liver. (2023). Jurnal Teknologi Informasi Dan Ilmu Komputer, 10(6), 1273-1284. https://doi.org/10.25126/jtiik.1067234