Open Table Format

Table formats are a way to organize data files; they aim to bring database-like features to the data lake. Apache Hive is one of the earliest and most widely used table formats, but it was not designed for object storage such as AWS S3 or Alibaba OSS, and its rapidly growing metadata tables slow down its performance. Newer systems such as Apache Iceberg, Apache Hudi, and Delta Lake try to solve this problem while also bringing the following features to the data lake:

Transactions (ACID)

Schema enforcement, evolution & versioning

Metadata scaling

Time Travel

Concurrent read & write

Consumption decoupled from storage

...
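Several of these features (transactions, time travel, concurrent read & write) can be illustrated with the snapshot idea used by formats like Iceberg. Each commit produces a new, immutable snapshot listing the data files that make up the table, and the commit becomes visible only when a pointer is atomically swapped to it. Below is a minimal, in-memory Python sketch of that idea. It assumes a single process and uses a lock to stand in for the catalog's atomic compare-and-swap. `ToyTable`, `Snapshot`, and `commit` are hypothetical names, not the API of Iceberg, Hudi, or Delta Lake.

```python
import threading
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    data_files: FrozenSet[str]  # immutable set of data files visible in this version


class ToyTable:
    """Hypothetical in-memory model of a snapshot-based table; not a real library."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # stands in for the catalog's atomic compare-and-swap
        self._snapshots: Dict[int, Snapshot] = {0: Snapshot(0, None, frozenset())}
        self._current_id = 0

    def current(self) -> Snapshot:
        # Readers get an immutable snapshot, so later commits never change what they see
        return self._snapshots[self._current_id]

    def snapshot(self, snapshot_id: int) -> Snapshot:
        # Time travel: older snapshots stay readable because they are never modified
        return self._snapshots[snapshot_id]

    def commit(self, base_id: int, added: Iterable[str] = (), removed: Iterable[str] = ()) -> int:
        """Optimistic concurrency: the commit succeeds only if base_id is still current."""
        base = self._snapshots[base_id]
        files = (base.data_files - frozenset(removed)) | frozenset(added)
        with self._lock:
            if self._current_id != base_id:
                raise RuntimeError("concurrent commit detected; re-read and retry")
            new_id = max(self._snapshots) + 1
            self._snapshots[new_id] = Snapshot(new_id, base_id, files)
            self._current_id = new_id  # the pointer swap is what makes the commit visible
            return new_id
```

For example, after `s1 = t.commit(0, added=["a.parquet"])` and `s2 = t.commit(s1, added=["b.parquet"], removed=["a.parquet"])`, the current table contains only `b.parquet`, `t.snapshot(s1)` still shows `a.parquet`, and a second writer that also started from `s1` gets a `RuntimeError` instead of silently overwriting the newer commit.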

What's the difference between Iceberg, Hudi, and Delta?

Apache Iceberg

Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time.

Iceberg is designed to address the known scalability limitations of Hive, which stores table metadata in a metastore backed by a relational database such as MySQL.
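The difference can be illustrated with a simplified query-planning sketch. With Hive, the metastore records partition directories, so an engine still has to list the files under each partition in object storage. Iceberg instead records the individual data files, together with per-column statistics, in its own metadata files (manifests), which lets a planner skip files without listing anything. The sketch assumes a hypothetical `FakeObjectStore`, made-up `s3://bucket/t/...` paths, and a single integer `id` column. Real Iceberg metadata is layered (table metadata file, manifest list, manifests), which this flattens into one list.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    path: str
    partition: str  # e.g. "dt=2022-11-20"
    min_id: int     # lower bound of the hypothetical "id" column in this file
    max_id: int     # upper bound of the hypothetical "id" column in this file


class FakeObjectStore:
    """Hypothetical stand-in for S3/OSS that counts LIST requests."""

    def __init__(self, paths: List[str]) -> None:
        self.paths = paths
        self.list_calls = 0

    def list_prefix(self, prefix: str) -> List[str]:
        self.list_calls += 1
        return [p for p in self.paths if p.startswith(prefix + "/")]


def hive_style_plan(store: FakeObjectStore, table_root: str, partitions: List[str]) -> List[str]:
    # The metastore only knows partition locations: files are discovered by listing,
    # and every file in a matching partition is a candidate (no per-file stats).
    files: List[str] = []
    for part in partitions:
        files.extend(store.list_prefix(f"{table_root}/{part}"))
    return files


def manifest_style_plan(manifest: List[DataFile], partition: str, id_value: int) -> List[str]:
    # Files and column bounds are already in table metadata: no listing is needed,
    # and files whose [min_id, max_id] range excludes id_value are skipped.
    return [f.path for f in manifest
            if f.partition == partition and f.min_id <= id_value <= f.max_id]


manifest = [
    DataFile("s3://bucket/t/dt=2022-11-20/f1.parquet", "dt=2022-11-20", 1, 100),
    DataFile("s3://bucket/t/dt=2022-11-20/f2.parquet", "dt=2022-11-20", 101, 200),
    DataFile("s3://bucket/t/dt=2022-11-21/f3.parquet", "dt=2022-11-21", 1, 150),
]
store = FakeObjectStore([f.path for f in manifest])
```

For a query on partition `dt=2022-11-20` with `id = 150`, `hive_style_plan` issues a LIST request and returns both files in the partition, while `manifest_style_plan` issues no request and returns only `f2.parquet`, whose `id` range covers 150.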

Apache Hudi

Hudi brings transactions, record-level updates/deletes and change streams to data lakes!
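Record-level updates and deletes rely on each record having a key, and change streams rely on every commit being stamped with a time on the table's timeline, so that a reader can ask for everything that changed since a given commit. The toy in-memory sketch below illustrates both ideas under simplifying assumptions: an integer counter serves as the commit time, there is a single writer, and no files are written. `ToyKeyedTable`, `upsert`, `delete`, and `incremental` are hypothetical names, not Hudi's API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Optional, Tuple

Change = Tuple[int, str, str, Optional[Any]]  # (commit_time, op, record_key, payload)


@dataclass
class ToyKeyedTable:
    """Hypothetical toy model of keyed upserts/deletes plus an incremental read."""
    rows: Dict[str, Tuple[int, Any]] = field(default_factory=dict)  # key -> (commit_time, payload)
    changes: List[Change] = field(default_factory=list)             # append-only change log
    _clock: int = 0

    def _next_commit(self) -> int:
        self._clock += 1
        return self._clock

    def upsert(self, records: Dict[str, Any]) -> int:
        # An existing key is updated in place; a new key is inserted
        t = self._next_commit()
        for key, payload in records.items():
            op = "update" if key in self.rows else "insert"
            self.rows[key] = (t, payload)
            self.changes.append((t, op, key, payload))
        return t

    def delete(self, keys: Iterable[str]) -> int:
        # Only keys that actually exist produce a delete record
        t = self._next_commit()
        for key in keys:
            if self.rows.pop(key, None) is not None:
                self.changes.append((t, "delete", key, None))
        return t

    def incremental(self, since: int) -> List[Change]:
        # Changes committed strictly after `since`, similar in spirit to an incremental query
        return [c for c in self.changes if c[0] > since]
```

A downstream consumer that last read at commit `c1` can call `incremental(c1)` to receive only the later updates and deletes, instead of rescanning the whole table.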

Delta Lake

Delta Lake is an open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and with APIs for Scala, Java, Rust, Ruby, and Python.

Recent posts

A little bit of Apache Iceberg

Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for compute engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time.

Tags: iceberg

OpenTableFormat

2022-11-21 5 min read

What is Open Table Format

OpenTableFormat is to object storage what constellations are to stars.

Tags: iceberg, hudi, deltalake

OpenTableFormat

2022-11-20 2 min read

What's the difference between Iceberg, Hudi, and Delta?

Tags: iceberg, hudi, deltalake

OpenTableFormat

2021-12-18 1 min read