Indonesian Document Summarization Based on Non-Negative Matrix Factorization (NMF)

Author

Achmad Ridok

Abstract

Advances in information technology have led to a massive increase in digital text documents, including documents in Indonesian. Automatically extracting information from documents in the form of summaries is therefore much needed. In this study, an automatic summarization system using Non-negative Matrix Factorization (NMF) was developed. The system was evaluated by comparing its summaries with summaries written by three experts for 100 Indonesian documents. The system summaries achieved an average precision and recall of 0.19724 and 0.34085, respectively, whereas the comparison between the experts' own summaries yielded an average precision and recall of 0.68667 and 0.70642, respectively.

Keywords: text summarization, NMF
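
The evaluation described in the abstract compares a system summary against an expert summary using precision and recall. The sketch below illustrates one way such scores could be computed. It assumes the comparison is made at sentence level on extractive summaries, which the abstract does not state. The function names `precision_recall` and `average_scores` and the example sentence indices are hypothetical and not taken from the paper.

```python
def precision_recall(system_sentences, reference_sentences):
    """Sentence-level precision and recall of an extractive summary.

    Assumption: both summaries are given as collections of sentence
    identifiers (for example, sentence indices in the source document).
    """
    system = set(system_sentences)
    reference = set(reference_sentences)
    overlap = len(system & reference)
    # Precision: share of system sentences that the reference also chose.
    precision = overlap / len(system) if system else 0.0
    # Recall: share of reference sentences that the system recovered.
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


def average_scores(pairs):
    """Average precision and recall over (system, reference) pairs."""
    scores = [precision_recall(s, r) for s, r in pairs]
    if not scores:
        return 0.0, 0.0
    avg_p = sum(p for p, _ in scores) / len(scores)
    avg_r = sum(r for _, r in scores) / len(scores)
    return avg_p, avg_r


# Hypothetical data: sentence indices picked by the system and by one expert.
system_summary = [0, 2, 5, 7]
expert_summary = [0, 3, 5]
p, r = precision_recall(system_summary, expert_summary)
print(f"precision={p:.2f} recall={r:.2f}")
```

Averaging these per-document scores over all documents and experts would give figures of the same kind as those reported above. The same functions could be applied to pairs of expert summaries to obtain an inter-expert baseline.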






DOI: http://dx.doi.org/10.25126/jtiik.201411104