Comparison of Classification Techniques in Data Mining for Bank Direct Marketing

Authors

  • Irvi Oktanisa, Master's student, Faculty of Computer Science, Universitas Brawijaya
  • Ahmad Afif Supianto, Faculty of Computer Science, Universitas Brawijaya

DOI:

https://doi.org/10.25126/jtiik.201855958

Keywords:

Comparison, classification, data mining, decision tree, machine learning, bank direct marketing

Abstract

Classification is a data mining technique that groups data according to their relationship to labeled sample data. In this paper, we compare nine classification techniques for classifying customer responses in the Bank Direct Marketing dataset, in order to determine which model classifies the target most effectively. The techniques used are Support Vector Machine, AdaBoost, Naïve Bayes, Constant, KNN, Tree, Random Forest, Stochastic Gradient Descent, and CN2 Rule. The classification process begins with data preprocessing, which removes missing values and performs feature selection on the dataset. The evaluation stage uses 10-fold cross-validation. The results show that the best accuracy was obtained by the Tree, Constant, Naive Bayes, and Stochastic Gradient Descent models, followed by Random Forest, K-Nearest Neighbor, CN2 Rule, AdaBoost, and Support Vector Machine. Of the four models with the best accuracy, Stochastic Gradient Descent was selected as the best model for this case, with an accuracy of 0.972 and a clearer visualization for classifying the target in the Bank Direct Marketing dataset.
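The evaluation procedure summarized above (removing rows with missing values, then scoring competing classifiers with 10-fold cross-validation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the data are hypothetical synthetic values rather than the Bank Direct Marketing dataset, "Constant" is assumed to mean a majority-class baseline, and the Stochastic Gradient Descent model is assumed to be binary logistic regression trained with per-sample gradient steps. All names (`drop_missing`, `k_fold_indices`, `ConstantClassifier`, `SGDLogistic`, `cross_val_accuracy`, `make_toy_data`) are hypothetical, and feature selection is omitted because the abstract does not say which method was used.

```python
import math
import random
from collections import Counter


def drop_missing(rows, labels):
    """Preprocessing step: keep only rows with no missing (None) feature values."""
    kept = [(r, lab) for r, lab in zip(rows, labels) if None not in r]
    return [r for r, _ in kept], [lab for _, lab in kept]


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


class ConstantClassifier:
    """Assumed 'Constant' baseline: always predicts the majority training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


class SGDLogistic:
    """Hypothetical SGD model: binary logistic regression, one sample per update."""

    def __init__(self, lr=0.1, epochs=20, seed=0):
        self.lr, self.epochs, self.seed = lr, epochs, seed

    def _prob(self, x):
        z = sum(wj * xj for wj, xj in zip(self.w, x)) + self.b
        # Numerically stable sigmoid: never calls exp() on a large positive number.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def fit(self, X, y):
        rng = random.Random(self.seed)
        self.w, self.b = [0.0] * len(X[0]), 0.0
        order = list(range(len(X)))
        for _ in range(self.epochs):
            rng.shuffle(order)  # visit training samples in a new random order each epoch
            for i in order:
                err = self._prob(X[i]) - y[i]  # d(log-loss)/d(score) for one sample
                self.w = [wj - self.lr * err * xj for wj, xj in zip(self.w, X[i])]
                self.b -= self.lr * err
        return self

    def predict(self, X):
        return [1 if self._prob(x) >= 0.5 else 0 for x in X]


def cross_val_accuracy(make_model, X, y, k=10, seed=0):
    """Mean accuracy over k folds; each fold serves exactly once as the test set."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        model = make_model().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = model.predict([X[i] for i in test_idx])
        scores.append(sum(p == y[i] for p, i in zip(preds, test_idx)) / len(test_idx))
    return sum(scores) / len(scores)


def make_toy_data(n=200, seed=42):
    """Hypothetical imbalanced binary data (about 30% positives), NOT the bank dataset."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        X.append([rng.gauss(2.0 if label else -2.0, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    X.append([None, 0.5])  # a row with a missing value, removed by preprocessing
    y.append(0)
    X, y = drop_missing(X, y)
    for name, make in [("Constant", ConstantClassifier), ("SGD", SGDLogistic)]:
        print(name, round(cross_val_accuracy(make, X, y, k=10), 3))
```

Scoring a Constant baseline next to the other models shows how much of an accuracy figure comes from class imbalance alone: on imbalanced data, always predicting the majority class already scores roughly the majority-class share, so a competing model is only informative to the extent that it beats that figure.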

Published

30-10-2018

Issue

Vol. 5, No. 5 (2018)

Section

Computer Science

How to Cite

Oktanisa, I., & Supianto, A. A. (2018). Perbandingan Teknik Klasifikasi Dalam Data Mining Untuk Bank Direct Marketing. Jurnal Teknologi Informasi dan Ilmu Komputer, 5(5), 567-576. https://doi.org/10.25126/jtiik.201855958