Improving Ensemble Learning Performance in Semantic Image Segmentation with Oversampling Techniques for Class Imbalance

Arie Nugroho; M. Arief Soeleman; Ricardus Anggi Pramunendar; Affandy Affandy; Aris Nurhindarto

doi:10.25126/jtiik.2024106831

Authors

Arie Nugroho, Universitas Dian Nuswantoro, Semarang
M. Arief Soeleman, Universitas Dian Nuswantoro, Semarang
Ricardus Anggi Pramunendar, Universitas Dian Nuswantoro, Semarang
Affandy Affandy, Universitas Dian Nuswantoro, Semarang
Aris Nurhindarto, Universitas Dian Nuswantoro, Semarang

Abstract

Advances in technology and changes in human lifestyles produce abundant data, which may be structured or unstructured. Image data is unstructured, and the activities and objects recorded in an image vary widely. The human eye can normally distinguish the foreground of an image from its background with ease, but a computer has to learn to tell the two apart. Image segmentation is the field of computer vision concerned with how computers learn to recognize the segments of an image according to specified labels. In practice, many datasets have imbalanced classes or labels, which affects prediction accuracy. This research examines how to improve the accuracy of semantic image segmentation with ensemble learning methods when the data are imbalanced. The technique used is synthetic oversampling, which produces balanced data and high accuracy. The ensemble learning methods used are Random Forest and Light Gradient Boosting Machine (LGBM), applied to the Penn-Fudan Database for Pedestrian dataset, which contains imbalanced classes. Synthetic oversampling improves accuracy on the minority class: accuracy increased by 37% for the Random Forest algorithm and by 41% for the LGBM algorithm.
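
As an illustration of the synthetic oversampling idea, the sketch below balances a toy two-class set of per-pixel feature vectors by interpolating between minority samples and their nearest minority neighbours. It is a minimal SMOTE-style sketch under stated assumptions, not the procedure used in the study: the abstract does not name the exact oversampling algorithm, and the function name `smote_like_oversample`, the two features, and the sample counts are all hypothetical.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).
    Hypothetical helper for illustration; needs at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Assumed toy data: two made-up per-pixel features, not taken from the dataset.
# Label 0 = background (majority), label 1 = pedestrian (minority).
data_rng = random.Random(42)
background = [(data_rng.uniform(0.0, 0.4), data_rng.uniform(0.0, 0.3))
              for _ in range(200)]
pedestrian = [(data_rng.uniform(0.6, 1.0), data_rng.uniform(0.5, 1.0))
              for _ in range(20)]

# Generate exactly enough synthetic minority points to match the majority.
new_points = smote_like_oversample(
    pedestrian, n_new=len(background) - len(pedestrian)
)
balanced_X = background + pedestrian + new_points
balanced_y = [0] * len(background) + [1] * (len(pedestrian) + len(new_points))

print(balanced_y.count(0), balanced_y.count(1))  # prints: 200 200
```

The balanced features and labels would then be passed to an ensemble classifier such as Random Forest or LGBM. Oversampling is normally applied to the training split only, so that the test data keep their original class distribution. In practice a maintained implementation such as `SMOTE` from the imbalanced-learn package is usually preferable to a hand-written helper like this one.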
