Performance Comparison of the C4.5, Gradient Boosting Trees, Random Forests, and Deep Learning Algorithms in an Educational Data Mining Case

Siti Mutrofin; M. Mughniy Machfud; Diema Hernyka Satyareni; Raden Venantius Hari Ginardi; Chastine Fatichah

doi:10.25126/jtiik.2020742665

Authors

Siti Mutrofin, Universitas Pesantren Tinggi Darul Ulum, Jombang, http://orcid.org/0000-0002-3418-6339
M. Mughniy Machfud, Universitas Pesantren Tinggi Darul Ulum, Jombang
Diema Hernyka Satyareni, Universitas Pesantren Tinggi Darul Ulum, Jombang
Raden Venantius Hari Ginardi, Institut Teknologi Sepuluh Nopember, Surabaya
Chastine Fatichah, Institut Teknologi Sepuluh Nopember, Surabaya

Abstract

At SMA Negeri 1 Jogoroto, Jombang, East Java, students are assigned to majors under the 2013 curriculum. The assignment considers not only the student's own preference and the specialization test taken in the first week of senior high school (SMA), but also the student's junior high school (SMP) record (report card grades, National Examination (Ujian Nasional) scores, and the counseling teacher's recommendation) and the parents' recommendation. Until now the school has done this conventionally in Microsoft Excel, which is slow and prone to calculation errors. Majors are assigned at the start of every academic year for new grade X students; the school handles about 290 students per year on average, with limited time and staff. In this study the ID3 algorithm was unsuitable because the data are numeric, whereas ID3 handles only nominal or polynominal (multi-valued nominal) attributes, so the C4.5 algorithm was used instead. However, several studies report that C4.5 performs worse than Gradient Boosting Trees, Random Forests, and Deep Learning. The four methods were therefore compared to assess how effectively each determines students' majors. The data are the new-student admission records for the 2018/2019 academic year. The results show that with polynominal attributes, Deep Learning performed best among all the algorithms when it used the ExpRectifier activation function. With numeric attributes, Deep Learning performed best among all the algorithms when it used the Tanh activation function, across all random samplings. However, Deep Learning performed worst among all the algorithms when its loss function was Absolute.
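The abstract states that ID3 could not be applied because the attributes are numeric, while C4.5 could. The sketch below illustrates the idea behind that difference: a numeric attribute is turned into a binary test "value <= threshold", and the threshold is chosen by information gain. This is a simplified illustration, not the authors' implementation. The scores, the major labels ("science", "social"), and the function names are hypothetical, and using midpoints between distinct values as candidate thresholds is a simplifying assumption.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def best_numeric_split(values, labels):
    """Pick the threshold with the highest information gain.

    Returns (threshold, gain, gain_ratio), or None when the attribute
    has fewer than two distinct values. Candidate thresholds are the
    midpoints between consecutive distinct values (an assumption made
    for this sketch).
    """
    base = entropy(labels)
    n = len(values)
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        w_left, w_right = len(left) / n, len(right) / n
        # Information gain: entropy before the split minus the
        # weighted entropy of the two branches.
        gain = base - (w_left * entropy(left) + w_right * entropy(right))
        # Split information normalises the gain into a gain ratio.
        split_info = -(w_left * math.log2(w_left)
                       + w_right * math.log2(w_right))
        if best is None or gain > best[1]:
            best = (threshold, gain, gain / split_info)
    return best


# Hypothetical toy data: one numeric score per student and the assigned major.
scores = [60, 65, 70, 75, 80, 85, 90, 95]
majors = ["social"] * 4 + ["science"] * 4

print(best_numeric_split(scores, majors))
```

An algorithm restricted to nominal attributes would need the scores binned into categories beforehand; searching for a threshold lets the tree use the raw numeric values directly.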

Author Biographies

Siti Mutrofin, Universitas Pesantren Tinggi Darul Ulum, Jombang (Information Systems)
M. Mughniy Machfud, Universitas Pesantren Tinggi Darul Ulum, Jombang (Information Systems)
Diema Hernyka Satyareni, Universitas Pesantren Tinggi Darul Ulum, Jombang (Information Systems)
Raden Venantius Hari Ginardi, Institut Teknologi Sepuluh Nopember, Surabaya (Informatics Engineering)
Chastine Fatichah, Institut Teknologi Sepuluh Nopember, Surabaya (Informatics Engineering)
