Digital Image Recognition System for Sign Language Using a Convolutional Neural Network and Spatial Transformer

Mohammad Alfiano Rizky Mahardika; Novanto Yudistira; Achmad Ridok

doi:10.25126/jtiik.2023118098

Authors

Mohammad Alfiano Rizky Mahardika, Universitas Brawijaya, Malang
Novanto Yudistira, Universitas Brawijaya, Malang
Achmad Ridok, Universitas Brawijaya, Malang

Keywords:

Convolutional Neural Network, spatial transformer, sign language, real-time classification

Abstract

Sign language is essential for people who are deaf or mute. To communicate with them and understand their intentions and thoughts, people who are neither deaf nor mute also need to understand sign language. Most sign language conversation is carried out with the hands: the hands and fingers form distinctive poses or shapes that can be recognized as carrying a particular meaning. The authors propose a digital image recognition system that recognizes sign language. Using a Convolutional Neural Network (CNN), a deep learning method within machine learning, the system recognizes the pose or shape in an input sign language image and outputs the corresponding meaning. The research began with data collection, combining secondary data from the internet with personal data captured manually. The data were then preprocessed and classified with the CNN, and the results were analyzed. Once the results were satisfactory, the model was exported and embedded in a web-based application for real-time use. In testing, the best architecture was EfficientNet B4, trained with the Adam optimizer, a learning rate of 0.001, and a learning-rate scheduler. Pretrained weights were used to improve accuracy, and a spatial transformer was added in an attempt to make the model more robust. The model with pretrained weights was then exported for real-time use. Real-time testing showed that the model detects at least 23 of the 26 letters of the alphabet against an abstract background. Against a plain background such as black or white, it detects all 26 letters with near-perfect probability. This indicates that the proposed method addresses the stated problem.
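The abstract names a spatial transformer but does not describe how it was implemented. As a rough illustration only, the pure-Python sketch below shows the sampling stage that a spatial transformer applies to an image: a 2x3 affine matrix `theta` maps each output pixel to a source coordinate, and the output value is read by bilinear interpolation. The function names, the single-channel list-of-lists image format, the corner-aligned coordinate convention, and the zero padding outside the image are all assumptions made for this example. In an actual model, `theta` would be predicted by a small localisation network and the whole operation would run inside a deep learning framework ahead of the classifier.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # single-channel image as rows of pixel values


def affine_grid(theta: List[List[float]], height: int, width: int) -> List[List[Tuple[float, float]]]:
    """Map every output pixel to a source coordinate in [-1, 1] x [-1, 1].

    theta is a 2x3 affine matrix (hypothetical input; a spatial transformer
    would obtain it from a localisation network, which is not shown here).
    """
    grid = []
    for i in range(height):
        y = (-1.0 + 2.0 * i / (height - 1)) if height > 1 else 0.0
        row = []
        for j in range(width):
            x = (-1.0 + 2.0 * j / (width - 1)) if width > 1 else 0.0
            xs = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            ys = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            row.append((xs, ys))
        grid.append(row)
    return grid


def bilinear_sample(image: Image, grid: List[List[Tuple[float, float]]]) -> Image:
    """Read the image at each grid coordinate using bilinear interpolation."""
    h, w = len(image), len(image[0])

    def pixel(r: int, c: int) -> float:
        # Assumption: coordinates outside the image contribute zero.
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    out = []
    for row in grid:
        out_row = []
        for xs, ys in row:
            # Convert normalised coordinates back to pixel coordinates.
            px = (xs + 1.0) * (w - 1) / 2.0
            py = (ys + 1.0) * (h - 1) / 2.0
            x0, y0 = math.floor(px), math.floor(py)
            dx, dy = px - x0, py - y0
            value = (
                pixel(y0, x0) * (1 - dx) * (1 - dy)
                + pixel(y0, x0 + 1) * dx * (1 - dy)
                + pixel(y0 + 1, x0) * (1 - dx) * dy
                + pixel(y0 + 1, x0 + 1) * dx * dy
            )
            out_row.append(value)
        out.append(out_row)
    return out


def spatial_transform(image: Image, theta: List[List[float]]) -> Image:
    """Warp an image with an affine matrix, keeping the original size."""
    grid = affine_grid(theta, len(image), len(image[0]))
    return bilinear_sample(image, grid)
```

With the identity matrix `[[1, 0, 0], [0, 1, 0]]` the image passes through unchanged, while `[[-1, 0, 0], [0, 1, 0]]` mirrors it horizontally. A learned `theta` could, in the same way, crop, scale, or rotate toward the hand region before classification, which is one plausible reading of why the authors added the module for robustness.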
