Comparative Analysis of Tree-Based and Non-Tree-Based Machine Learning Models for Classification Tasks
DOI: https://doi.org/10.25126/jtiik.124
Keywords: Machine Learning, Tree-Based Models, Non-Tree-Based Models, Classification
Abstract
This study compares the performance of tree-based and non-tree-based machine learning models on classification tasks. The tree-based models tested are LightGBM, CatBoost, XGBoost, and Random Forest; the non-tree-based models are SVM, KNN, and GaussianNB. The models were evaluated on three datasets: Spaceship Titanic, Horse Health, and Keep It Dry. Performance was measured with AUC-ROC, accuracy, and micro-averaged F1-score. The results show that tree-based models such as CatBoost and LightGBM generally outperform the non-tree-based models. CatBoost in particular achieved the best accuracy, AUC-ROC, and micro-averaged F1-score on most of the datasets tested. The study also highlights the importance of selecting a model based on the characteristics of the dataset: factors such as data complexity, number of features, and class distribution significantly affect each model's results. These findings can therefore help machine learning practitioners choose the most suitable model for a given classification task.
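The three metrics named in the abstract can be illustrated with a minimal standard-library Python sketch. The labels and scores below are made-up toy values, not data from the study, and the helper names (`accuracy`, `micro_f1`, `roc_auc_binary`) are hypothetical. The sketch assumes single-label classification; AUC-ROC is shown only for the binary case, computed as the fraction of positive/negative pairs that the scores rank correctly.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1  # correctly predicted as this class
            elif p == label:
                fp += 1  # predicted as this class, but belongs to another
            elif t == label:
                fn += 1  # belongs to this class, but predicted as another
    return 2 * tp / (2 * tp + fp + fn)


def roc_auc_binary(y_true, scores):
    """Binary AUC-ROC: probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy three-class example (illustrative values only).
y_true = ["a", "b", "c", "a", "b", "c"]
y_pred = ["a", "b", "a", "a", "c", "c"]
acc = accuracy(y_true, y_pred)
f1 = micro_f1(y_true, y_pred)

# Toy binary example with predicted scores for the positive class.
y_bin = [1, 0, 1, 1, 0]
y_score = [0.9, 0.4, 0.35, 0.8, 0.1]
auc = roc_auc_binary(y_bin, y_score)

print(f"accuracy={acc:.4f}  micro-F1={f1:.4f}  AUC-ROC={auc:.4f}")
```

In this single-label setting every misclassification counts once as a false positive and once as a false negative, so micro-averaged F1 and accuracy come out equal; the two would differ only for multi-label problems or for other averaging schemes such as macro F1.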
License
Copyright (c) 2025 Jurnal Teknologi Informasi dan Ilmu Komputer

This article is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).
Authors who publish in this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license, which allows others to share the work with an acknowledgement of its authorship and initial publication in this journal.
- Authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., posting it to an institutional repository or publishing it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their websites) before and during the submission process, as this can lead to productive exchanges as well as earlier and greater citation of the published work. (See The Effect of Open Access).