Combining Fisher Score Feature Selection and Principal Component Analysis (PCA) for Cervix Dysplasia Classification

Authors

  • Krisan Aprian Widagdo, Master's Program in Information Systems, Universitas Diponegoro
  • Kusworo Adi, Department of Physics, Universitas Diponegoro
  • Rahmat Gernowo, Department of Physics, Universitas Diponegoro

DOI:

https://doi.org/10.25126/jtiik.2020702987

Abstract

Examining Pap smear images is an important step in the early diagnosis of cervix dysplasia, but it requires substantial resources. Machine learning can address this problem; however, its accuracy depends on the features used, and only relevant, discriminative features yield accurate classification results. This work combines Fisher Score (FScore) feature selection with Principal Component Analysis (PCA). First, FScore selects relevant features by ranking their scores. PCA then transforms the selected candidate features into a new, uncorrelated dataset. A backpropagation artificial neural network is used to evaluate the performance of the FScore and PCA combination, and the model is evaluated with 5-fold cross-validation. In addition, the combination is compared with a model using the original features and a model using the FScore-selected features. Experimental results show that the FScore and PCA combination produces the best performance (accuracy 0.964±0.006, sensitivity 0.990±0.005, and specificity 0.889±0.009). In terms of computation time, the combination is relatively fast. This work demonstrates that applying feature selection followed by feature extraction can improve classification performance at a relatively low computational cost.
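The pipeline summarized in the abstract (rank features by Fisher score, keep the top k, decorrelate them with PCA, then classify and score with 5-fold cross-validation) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the Fisher score formula below is one common multi-class variant and the paper may use a different one, and the function names, the values of `k` and `n_components`, treating label 1 (abnormal) as the positive class, and the nearest-centroid classifier (a stand-in for the backpropagation ANN used in the paper) are all hypothetical choices for illustration.

```python
import numpy as np


def fisher_score(X, y):
    """Per-feature Fisher score (one common variant, assumed here):
    F_j = sum_k n_k (mu_kj - mu_j)^2 / sum_k n_k sigma_kj^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        within += len(Xc) * Xc.var(axis=0)  # population variance per class
    # Avoid division by zero for features that are constant within every class.
    return between / np.maximum(within, 1e-12)


def fit_fscore_pca(X_train, y_train, k, n_components):
    """Step 1: keep the k highest-ranked features by Fisher score.
    Step 2: fit PCA (via SVD) on those selected features only.
    Returns the selected indices and a function applying the same transform."""
    X_train = np.asarray(X_train, dtype=float)
    selected = np.argsort(fisher_score(X_train, y_train))[::-1][:k]
    Xs = X_train[:, selected]
    mean = Xs.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(Xs - mean, full_matrices=False)
    components = vt[:n_components]

    def transform(X):
        X = np.asarray(X, dtype=float)
        return (X[:, selected] - mean) @ components.T

    return selected, transform


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall of the positive class) and specificity."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    return {
        "accuracy": float((tp + tn) / len(y_true)),
        "sensitivity": float(tp / (tp + fn)) if tp + fn else float("nan"),
        "specificity": float(tn / (tn + fp)) if tn + fp else float("nan"),
    }


def nearest_centroid_predict(Z_train, y_train, Z_test):
    """Stand-in classifier for this sketch only (NOT the paper's backpropagation ANN)."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((Z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]


def cross_validate(X, y, k, n_components, n_folds=5, seed=0):
    """Plain (unstratified) k-fold CV; FScore and PCA are refit inside every fold."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    results = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        _, transform = fit_fscore_pca(X[train_idx], y[train_idx], k, n_components)
        y_pred = nearest_centroid_predict(
            transform(X[train_idx]), y[train_idx], transform(X[test_idx])
        )
        results.append(binary_metrics(y[test_idx], y_pred))
    return results
```

Fitting both the feature ranking and PCA inside each training fold, rather than once on the full dataset, keeps each test fold unseen during feature selection. Averaging the per-fold dictionaries returned by `cross_validate` gives mean ± standard deviation figures in the same form as those reported in the abstract; swapping `nearest_centroid_predict` for a trained neural network would bring the sketch closer to the paper's setup.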

Published

22 May 2020

Issue

Vol. 7, No. 3 (2020)

Section

Computer Science

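How to Cite

Widagdo, K. A., Adi, K., & Gernowo, R. (2020). Kombinasi Feature Selection Fisher Score dan Principal Component Analysis (PCA) untuk Klasifikasi Cervix Dysplasia [Combining Fisher Score feature selection and Principal Component Analysis (PCA) for cervix dysplasia classification]. Jurnal Teknologi Informasi Dan Ilmu Komputer, 7(3), 565–572. https://doi.org/10.25126/jtiik.2020702987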