Improving Ensemble Learning Performance in Semantic Image Segmentation with Oversampling Techniques for Class Imbalance
DOI: https://doi.org/10.25126/jtiik.20241046831
Abstract
Advances in technology and changes in human lifestyles generate abundant data, both structured and unstructured. Image data is unstructured, and the activities and objects captured in an image vary widely. The human eye can normally distinguish the foreground of an image from its background with ease, but a computer has to learn to tell the two apart. Image segmentation is the field of computer vision concerned with how a computer learns to recognize the segments of an image according to specified labels. In practice, many datasets have imbalanced classes or labels, which lowers prediction accuracy. This research examines how to improve the accuracy of semantic image segmentation with ensemble learning methods when the data is imbalanced. The technique used is synthetic oversampling, which produces balanced data and higher accuracy. The ensemble learning methods used are Random Forest and Light Gradient Boosting Machine (LGBM), applied to the Penn-Fudan Database for Pedestrian dataset, which contains imbalanced classes. Synthetic oversampling improves accuracy on the minority class. Accuracy increased by 37% for the Random Forest algorithm and by 41% for the LGBM algorithm.
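The synthetic oversampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the exact oversampling algorithm, so SMOTE-style interpolation between same-class neighbours is an assumption here. The helper names `smote_like_oversample` and `per_class_accuracy`, the parameter `k`, and the toy "pixel" features are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def smote_like_oversample(features, labels, k=3, seed=0):
    """Balance classes by interpolating between minority-class neighbours.

    features: list of equal-length float lists (e.g. per-pixel feature vectors)
    labels:   list of class labels, same length as features
    Returns new (features, labels) lists in which every class with at least
    two samples is grown to the size of the largest class.
    Hypothetical helper for illustration, not the paper's implementation.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x = [list(x) for x in features]
    out_y = list(labels)

    for cls, n in counts.items():
        members = [x for x, y in zip(features, labels) if y == cls]
        if n == target or len(members) < 2:
            continue  # largest class, or too few points to interpolate
        for _ in range(target - n):
            base = rng.choice(members)
            # k nearest same-class neighbours by squared Euclidean distance
            neighbours = sorted(
                (m for m in members if m is not base),
                key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
            )[:k]
            other = rng.choice(neighbours)
            t = rng.random()  # position along the segment base -> other
            out_x.append([a + t * (b - a) for a, b in zip(base, other)])
            out_y.append(cls)
    return out_x, out_y


def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly predicted samples for each true class."""
    total, correct = Counter(y_true), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            correct[t] += 1
    return {c: correct[c] / total[c] for c in total}


if __name__ == "__main__":
    # Toy "pixels": 8 background samples (label 0), 3 pedestrian samples (label 1).
    X = [[0.1 * i, 0.2] for i in range(8)] + [[0.9, 0.8], [1.0, 0.9], [0.8, 1.0]]
    y = [0] * 8 + [1] * 3
    X_bal, y_bal = smote_like_oversample(X, y, k=2)
    print(Counter(y_bal))  # class counts after balancing
```

In a real pipeline, the balanced feature vectors would be used to train the classifiers, and a per-class score such as `per_class_accuracy` would be computed on held-out pixels, since overall accuracy can hide poor performance on the minority class. Library implementations such as `SMOTE` in imbalanced-learn, `RandomForestClassifier` in scikit-learn, and `LGBMClassifier` in LightGBM would be natural choices, although the abstract does not state which tools the authors used.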
License
This article is licensed under a Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0).
Authors who publish in this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (for example, posting it to an institutional repository or publishing it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (for example, in institutional repositories or on their website) before and during the submission process, as this can lead to productive exchanges as well as earlier and greater citation of the published work. (See The Effect of Open Access).