Annotation Propagation System on Metadata Lineage for Data Warehouse Management

Authors

  • Dion Ricky Saputra, Universitas Brawijaya, Malang
  • Welly Purnomo, Universitas Brawijaya, Malang
  • Nanang Yudi Setiawan, Universitas Brawijaya, Malang

DOI:

https://doi.org/10.25126/jtiik.2022976833

Abstract (Indonesian)

ETL (extract, transform, and load) is a process involved in creating and managing a data warehouse. An ETL design is built to fit the structure of both the data sources and the data warehouse. Because of this dependency, changes in a data source can have a large impact on the ETL design. When such a change occurs, the ETL manager communicates with the data owner to learn the details of the change in data structure so that the ETL design can be repaired. This communication grows with the number of data sources in use: the more data sources are processed, the more likely the communication is to become a bottleneck. Information about data structure changes can instead be communicated through annotations attached to the data sources. These annotations are then propagated so that they can be used to repair the ETL design. Using annotations is expected to reduce the communication between the ETL manager and the data owner. This problem shows why developing an annotation propagation system matters. The annotation propagation system consists of three components: metadata extraction, annotation propagation, and an adapter. The system was tested using blackbox testing and user acceptance testing with end users. Blackbox testing produced 30 test cases, all with valid results. The user acceptance testing evaluation showed that, on average, users strongly agreed with the developed system.


Abstract

ETL is the process of extracting, transforming, and loading data, and it is involved in the creation and management of a data warehouse. Because ETL is tightly coupled to the structure of the source data, even a small change to that structure can stop the whole workflow. Since one data source can be used by more than one ETL workflow, the impact of schema changes on the ETL design can be enormous. When such an incident happens, the ETL designer asks the data owner for the details of the schema change. The communication traffic between the ETL designer and the data owner grows with the number of sources in use and can therefore become a bottleneck. Information about schema changes in a data source can instead be attached to it as an annotation. This annotation is then propagated so that the ETL designer can update the workflow according to the recorded changes. Using this technique, the communication traffic between the ETL designer and the data owner can be minimized. This problem highlights the need for an annotation propagation system. The system consists of three components: metadata extraction, an adapter, and annotation propagation. The system was tested using blackbox testing and user acceptance testing. Blackbox testing produced 30 test cases, all of which were valid. In the user acceptance testing, end users operated the system directly, and analysis of the results shows that, on average, users accept the system.
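The abstracts describe a schema-change annotation being attached to a data source and then propagated along lineage metadata to the ETL workflows that depend on it. The paper's actual data model is not given here, so the following is only a minimal sketch under stated assumptions: `Annotation`, `LineageGraph`, and `propagate` are hypothetical names, lineage is modeled as a directed graph from source columns through ETL tasks to warehouse columns, and "propagation" is assumed to mean copying the annotation to every downstream node via breadth-first traversal.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record a data owner attaches to describe a schema change."""
    target: str  # annotated lineage node, e.g. a source column
    change: str  # free-text description of the structural change


@dataclass
class LineageGraph:
    """Hypothetical lineage metadata: directed edges from upstream to downstream nodes."""
    edges: dict[str, set[str]] = field(default_factory=dict)
    annotations: dict[str, list[Annotation]] = field(default_factory=dict)

    def add_edge(self, upstream: str, downstream: str) -> None:
        self.edges.setdefault(upstream, set()).add(downstream)

    def propagate(self, ann: Annotation) -> list[str]:
        """Attach `ann` to its target and to every node reachable downstream.

        Uses breadth-first search with a `seen` set, so cycles in the lineage
        graph cannot cause infinite loops. Returns the downstream nodes reached,
        in BFS order, excluding the annotated target itself.
        """
        seen = {ann.target}
        queue = deque([ann.target])
        reached: list[str] = []
        while queue:
            node = queue.popleft()
            self.annotations.setdefault(node, []).append(ann)
            # Sort neighbours so the traversal order is deterministic.
            for nxt in sorted(self.edges.get(node, ())):
                if nxt not in seen:
                    seen.add(nxt)
                    reached.append(nxt)
                    queue.append(nxt)
        return reached


# Usage with made-up node names (assumption, not from the paper).
g = LineageGraph()
g.add_edge("crm.customers.phone", "etl.clean_customers")
g.add_edge("etl.clean_customers", "dw.dim_customer.phone")
g.add_edge("etl.clean_customers", "dw.dim_customer.phone_area")
g.add_edge("crm.orders.id", "etl.load_orders")

ann = Annotation("crm.customers.phone", "column renamed to phone_number")
affected = g.propagate(ann)
print(affected)
# → ['etl.clean_customers', 'dw.dim_customer.phone', 'dw.dim_customer.phone_area']
```

In this sketch the ETL designer would look up `g.annotations` for the tasks they maintain instead of asking the data owner, which is the communication saving the abstract argues for; the unrelated `etl.load_orders` task receives nothing. Whether the paper's system propagates to all downstream nodes or only along particular edge types is not stated in this section.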



Published

29-12-2022

How to Cite

Sistem Propagasi Anotasi pada Metadata Lineage untuk Manajemen Data Warehouse [Annotation Propagation System on Metadata Lineage for Data Warehouse Management]. (2022). Jurnal Teknologi Informasi Dan Ilmu Komputer, 9(7), 1741-1746. https://doi.org/10.25126/jtiik.2022976833