Comparative Analysis of Tree-Based and Non-Tree-Based Machine Learning Models for Classification Tasks

Fadhilah Hilmi; Kenzie Taqiyassar; Naufal Romero Putra Pratama; Satrio Condro Kusuma; Hafiz Rizky Nurwachid; Tirana Noor Fatyanosa

doi:10.25126/jtiik.124

Authors

Fadhilah Hilmi, Universitas Brawijaya, Malang
Kenzie Taqiyassar, Universitas Brawijaya, Malang
Naufal Romero Putra Pratama, Universitas Brawijaya, Malang
Satrio Condro Kusuma, Universitas Brawijaya, Malang
Hafiz Rizky Nurwachid, Universitas Brawijaya, Malang
Tirana Noor Fatyanosa, Universitas Brawijaya, Malang

DOI:

https://doi.org/10.25126/jtiik.124

Keywords:

Machine Learning, Tree-Based Models, Non-Tree-Based Models, Classification

Abstract

This study compares the performance of tree-based and non-tree-based machine learning models on classification tasks. The tree-based models tested are LightGBM, CatBoost, XGBoost, and Random Forest; the non-tree-based models are SVM, KNN, and GaussianNB. The models were evaluated on three datasets: Spaceship Titanic, Horse Health, and Keep It Dry. Performance was measured with AUC-ROC, accuracy, and micro-averaged F1-score. The results show that tree-based models such as CatBoost and LightGBM generally outperform the non-tree-based models. CatBoost in particular achieved the best accuracy, AUC-ROC, and micro-averaged F1-score on most of the datasets tested. The study also highlights the importance of choosing a model that suits the characteristics of the dataset: factors such as data complexity, number of features, and class distribution strongly affect each model's final results. These findings can help machine learning practitioners choose the most suitable model for a given classification task.
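
The three metrics named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' evaluation code: the function names, the toy labels, and the scores below are hypothetical and chosen only for illustration, and the AUC-ROC helper handles the binary case only (the study's multiclass handling is not described on this page).

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted but is wrong
            fn[t] += 1  # true class t was missed
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    denom = 2 * total_tp + total_fp + total_fn
    return 2 * total_tp / denom if denom else 0.0


def roc_auc(y_true, scores, positive=1):
    """Binary AUC-ROC as the probability that a random positive sample
    scores higher than a random negative one (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels and scores, not taken from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

print("accuracy:", accuracy(y_true, y_pred))
print("micro-F1:", micro_f1(y_true, y_pred))
print("AUC-ROC :", roc_auc(y_true, scores))
```

In single-label classification every misclassified sample counts as one false positive and one false negative, so micro-averaged F1 equals accuracy. AUC-ROC is computed from the scores rather than from thresholded labels, which is why it can rank models differently from the other two metrics.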
