Optimizing K-Means Clustering Weights to Handle Missing Values Using a Genetic Algorithm

Authors

Bain Khusnul Khotimah, Muhammad Syarief, Miswanto Miswanto, Herry Suprajitno

Abstract (translated from the Indonesian)

Missing values require preprocessing with imputation techniques to produce complete data. The imputation process requires suitable initial weights, because the data it produces are replacement values. Selecting optimal weight values and a suitable value of K in the K-Means Imputation (KMI) method is a major problem that causes the error to increase. A combined model of the genetic algorithm (GA) and KMI, known as GAKMI, is used to determine the optimal weights for each cluster of data containing missing values. The genetic algorithm selects the weights using real-number encoding on the chromosomes. The hybrid GA and KMI model performs clustering using the sum of the Euclidean distances of each data point from its cluster center. Algorithm performance is measured with a fitness function that is optimal at the smallest MSE value. Experiments on hepatitis data show that GA is efficient at finding optimal initial weight values in a large search space. With MSE = 0.044 at K = 3 and the 5th replication, GAKMI yields a low error rate for hepatitis data with mixed attributes. Imputation-level testing shows that GAKMI produces r = 0.526, higher than the other methods. This study shows that GAKMI yields a higher r value than the other imputation methods and is therefore considered the best of the imputation techniques compared.

 

Abstract

Missing values require preprocessing with imputation techniques to produce complete data. The imputation process requires appropriate initial weights, because the resulting data are replacement values. Choosing optimal weights and a suitable value of K in the K-Means Imputation (KMI) method is a major problem that causes the error to increase. A combined model of the Genetic Algorithm (GA) and KMI, known as GAKMI, is used to determine the optimal weights for each data cluster containing missing values. The genetic algorithm selects the weights using real-number encoding on the chromosomes. In the hybrid model, clustering uses the sum of the Euclidean distances of each data point from its cluster center. Performance is measured with a fitness function that is optimal at the smallest MSE value. Experiments on hepatitis data show that GA is efficient at finding optimal initial weight values in a large search space. With MSE = 0.04 at K = 3 and the 5th replication, GAKMI yields a low error rate for mixed-attribute data. Imputation-level testing shows that GAKMI produces r = 0.526, higher than the other methods. Because a higher r value indicates better imputation, GAKMI is considered the best of the imputation techniques compared.
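The pipeline described in the abstract (real-coded chromosomes, K-means imputation, MSE as fitness) can be illustrated with a minimal sketch. The abstract does not specify how the weights enter KMI or which GA operators are used, so several parts here are assumptions: the chromosome is taken to be one weight per feature in a weighted Euclidean distance, fitness is the MSE on artificially masked entries of otherwise complete data, and the operators are arithmetic crossover with clamped Gaussian mutation. The function names (`kmeans_impute`, `mse_fitness`, `ga_weights`) are hypothetical, and the sketch handles numeric features only, not the mixed attributes of the hepatitis data.

```python
import random

def weighted_dist(a, b, w):
    """Weighted squared Euclidean distance over the features observed in `a`."""
    return sum(wi * (x - y) ** 2 for x, y, wi in zip(a, b, w) if x is not None)

def kmeans_impute(rows, k, w, iters=10, seed=0):
    """Replace None entries with the matching coordinate of the nearest centroid."""
    rng = random.Random(seed)
    n_feat = len(rows[0])
    # Initial fill: column means of the observed values (assumes each column has some).
    means = []
    for j in range(n_feat):
        obs = [r[j] for r in rows if r[j] is not None]
        means.append(sum(obs) / len(obs))
    filled = [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]
    centroids = [list(c) for c in rng.sample(filled, k)]
    for _ in range(iters):
        # Assign each row to its nearest centroid using observed features only.
        labels = [min(range(k), key=lambda c: weighted_dist(r, centroids[c], w))
                  for r in rows]
        # Recompute centroids from the currently filled rows; empty clusters keep theirs.
        for c in range(k):
            members = [filled[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Re-impute missing entries from the assigned centroid.
        filled = [[centroids[labels[i]][j] if v is None else v for j, v in enumerate(r)]
                  for i, r in enumerate(rows)]
    return filled

def mse_fitness(w, complete, masked, k):
    """MSE between imputed and true values at the masked positions (lower is better)."""
    imputed = kmeans_impute(masked, k, w)
    errs = [(imputed[i][j] - complete[i][j]) ** 2
            for i, r in enumerate(masked) for j, v in enumerate(r) if v is None]
    return sum(errs) / len(errs)

def ga_weights(complete, masked, k, pop_size=12, gens=15, seed=1):
    """Real-coded GA: each chromosome is a vector of feature weights in [0, 1]."""
    rng = random.Random(seed)
    n = len(complete[0])
    pop = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    score = lambda w: mse_fitness(w, complete, masked, k)
    for _ in range(gens):
        pop.sort(key=score)
        elite = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p, q = rng.sample(elite, 2)
            a = rng.random()
            child = [a * x + (1 - a) * y for x, y in zip(p, q)]  # arithmetic crossover
            j = rng.randrange(n)
            child[j] = min(1.0, max(0.0, child[j] + rng.gauss(0, 0.1)))  # clamped mutation
            children.append(child)
        pop = elite + children
    best = min(pop, key=score)
    return best, score(best)
```

A call such as `ga_weights(complete, masked, k=3)` returns the best weight vector found and its MSE, where `masked` is a copy of `complete` with some entries set to `None`. Because the better half of each generation is carried over unchanged, the best MSE never gets worse from one generation to the next.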






DOI: http://dx.doi.org/10.25126/jtiik.2021844912