Apache Iceberg
Ryan Blue
2019 Big Data Orchestration Summit
Netflix’s Data Warehouse
● Smarter processing engines
○ CBO, better join implementations
○ Result set caching, materialized views
● Reduce manual data maintenance
○ Data librarian services
○ Declarative instead of imperative
5-year Challenges
● Unsafe operations are everywhere
○ Writing to multiple partitions
○ Renaming a column
● Interaction with object stores causes major headaches
○ Everything from eventual consistency to performance problems
○ Output committers can’t fix it
● Endless scale challenges
Problem Whack-a-mole
What is Iceberg?
Iceberg is a scalable format for tables with a lot of best practices built in.
A format?
We already have Parquet, Avro and ORC . . .
A table format.
● File formats help you modify or skip data in a single file
● Table formats do the same thing for a collection of files
● To demonstrate this, consider Hive tables . . .
A table format
● Key idea: organize data in a directory tree
date=20180513/
|- hour=18/
| |- ...
|- hour=19/
| |- part-000.parquet
| |- ...
| |- part-031.parquet
|- hour=20/
| |- ...
Hive Tables
● Filter: WHERE date = '20180513' AND hour = 19
date=20180513/
|- hour=18/
| |- ...
|- hour=19/
| |- part-000.parquet
| |- ...
| |- part-031.parquet
|- hour=20/
| |- ...
Hive Tables
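As a rough illustration of how the filter above maps onto the directory tree, the sketch below turns the partition predicates into a directory prefix and keeps only the files under it. It is plain Python, not Hive code: the names `partition_path` and `files_for_filter`, the in-memory `FILES` listing, and the file names under hour=18 and hour=20 are assumptions made for the example.

```python
from typing import List

# Hypothetical in-memory stand-in for a file system listing of the table.
FILES = [
    "date=20180513/hour=18/part-000.parquet",
    "date=20180513/hour=19/part-000.parquet",
    "date=20180513/hour=19/part-031.parquet",
    "date=20180513/hour=20/part-000.parquet",
]


def partition_path(date: str, hour: int) -> str:
    """Build the directory prefix a Hive-style layout uses for one partition."""
    return f"date={date}/hour={hour}/"


def files_for_filter(files: List[str], date: str, hour: int) -> List[str]:
    """Keep only the files stored under the matching partition directory."""
    prefix = partition_path(date, hour)
    return [f for f in files if f.startswith(prefix)]


# WHERE date = '20180513' AND hour = 19
matching = files_for_filter(FILES, "20180513", 19)
print(matching)
```

Only the two files under hour=19 survive the filter; the other directories are never read.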
● Problem: too much directory listing for large tables
● Solution: use HMS to track partitions
date=20180513/hour=19 -> hdfs:/.../date=20180513/hour=19
date=20180513/hour=20 -> hdfs:/.../date=20180513/hour=20
● The file system still tracks the files in each partition . . .
Hive Metastore
● State is kept in both the metastore and in a file system
● Changes are not atomic without locking
● Requires directory listing
○ O(n) listing calls, n = # matching partitions
○ Eventual consistency breaks correctness
Hive Tables: Problems
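The listing cost can be sketched in a few lines. This is a toy model, not the Hive Metastore API: `CountingFileSystem`, `plan_scan`, and the `/example/table` root are hypothetical names chosen for the illustration. The metastore is modeled as a mapping from partition to location, and every matching partition costs one directory listing.

```python
from typing import Callable, Dict, List


class CountingFileSystem:
    """Toy file system that counts how many directory listings are issued."""

    def __init__(self, directories: Dict[str, List[str]]) -> None:
        self._directories = directories
        self.list_calls = 0

    def list_dir(self, path: str) -> List[str]:
        self.list_calls += 1
        return list(self._directories.get(path, []))


def plan_scan(
    metastore: Dict[str, str],
    fs: CountingFileSystem,
    matches: Callable[[str], bool],
) -> List[str]:
    """Find matching partitions in the metastore, then list each one for files."""
    files: List[str] = []
    for partition, location in metastore.items():
        if matches(partition):
            files.extend(f"{location}/{name}" for name in fs.list_dir(location))
    return files


ROOT = "/example/table"  # hypothetical location, not a path from the slides
metastore = {
    f"date=20180513/hour={h}": f"{ROOT}/date=20180513/hour={h}" for h in (18, 19, 20)
}
fs = CountingFileSystem({loc: ["part-000.parquet"] for loc in metastore.values()})

files = plan_scan(metastore, fs, lambda p: p.startswith("date=20180513/"))
print(fs.list_calls, len(files))
```

Three matching partitions cost three listing calls. Because the file list lives in the file system and the partition list lives in the metastore, a writer can change what a listing returns without any change to the metastore, which is why changes are not atomic without locking.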
● Everything supports Hive tables*
○ Engines: Hive, Spark, Presto, Flink, Pig
○ Tools: Hudi, NiFi, Flume, Sqoop
● Simplicity and ubiquity have made Hive tables indispensable
● The whole ecosystem uses the same at-rest data!
Hive Tables: Benefits
Iceberg
● An open spec and community for at-rest data interchange
○ Maintain a clear spec for the format
○ Design for multiple implementations across languages
○ Support needs across projects to avoid fragmentation
Iceberg’s Goals
● Improve scale and reliability
○ Work on a single node, scale to a cluster
○ All changes are atomic, with serializable isolation
○ Native support for cloud object stores
○ Support many concurrent writers
Iceberg’s Goals
● Fix persistent usability problems
○ In-place evolution for schema and layout (no side-effects)
○ Hide partitioning: insulate queries from physical layout
○ Support time-travel, rollback, and metadata inspection
○ Configure tables, not jobs
● Tables should have no unpleasant surprises
Iceberg’s Goals
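One way to picture hidden partitioning: the table, not the query, owns the rule that maps a column value to a partition. The sketch below is an illustration only and does not use Iceberg's API; `hour_transform`, `partitions_for_range`, and the partition value format are assumptions. A query filters on the timestamp column and the table translates that filter into the partitions to read.

```python
from datetime import datetime, timedelta
from typing import Set


def hour_transform(ts: datetime) -> str:
    """Derive a partition value from a timestamp column (hypothetical format)."""
    return ts.strftime("%Y%m%d-%H")


def partitions_for_range(start: datetime, end: datetime) -> Set[str]:
    """Translate a filter start <= ts < end into the hourly partitions to read."""
    current = start.replace(minute=0, second=0, microsecond=0)
    wanted: Set[str] = set()
    while current < end:
        wanted.add(hour_transform(current))
        current += timedelta(hours=1)
    return wanted


# The query only mentions the timestamp column, never a partition column.
wanted = partitions_for_range(
    datetime(2018, 5, 13, 19, 30), datetime(2018, 5, 13, 20, 15)
)
print(sorted(wanted))
```

Because queries never name the partition column, the layout could later change (for example from hourly to daily) by changing the transform, without rewriting queries.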
● Key idea: track all files in a table over time
○ A snapshot is a complete list of files in a table
○ Each write produces and commits a new snapshot
Iceberg’s Design
[Diagram: snapshots S1, S2, S3, ...]
[Diagram: snapshots S1, S2, S3, ... with a reader (R) and a writer (W)]
● Readers use the current snapshot
● Writers optimistically create new snapshots, then commit
Iceberg’s Design
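The reader and writer behavior described above can be sketched as a compare-and-swap on a pointer to the current snapshot. This is a minimal model under stated assumptions, not Iceberg's implementation: `Snapshot`, `Table`, `commit`, and `append_files` are hypothetical names, and a `threading.Lock` stands in for whatever atomic swap a real catalog provides.

```python
import threading
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    files: FrozenSet[str]  # complete list of files in the table at this version


class Table:
    """Toy table: a list of snapshots plus a pointer to the current one."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # stand-in for an atomic swap
        self.snapshots: List[Snapshot] = [Snapshot(1, frozenset())]
        self.current: Snapshot = self.snapshots[0]

    def commit(self, base: Snapshot, files: FrozenSet[str]) -> Optional[Snapshot]:
        """Swap in a new snapshot only if `base` is still the current one."""
        with self._lock:
            if self.current is not base:
                return None  # another writer committed first
            snapshot = Snapshot(base.snapshot_id + 1, files)
            self.snapshots.append(snapshot)
            self.current = snapshot
            return snapshot


def append_files(table: Table, added: Iterable[str]) -> Snapshot:
    """Optimistic write: build on the current snapshot and retry on conflict."""
    while True:
        base = table.current
        committed = table.commit(base, base.files | frozenset(added))
        if committed is not None:
            return committed


table = Table()
reader_view = table.current  # a reader pins the snapshot it started with
append_files(table, ["part-000.parquet"])
stale = table.commit(reader_view, frozenset(["other.parquet"]))
print(sorted(reader_view.files), sorted(table.current.files), stale)
```

The reader's pinned snapshot is unchanged by the append, and the commit based on the stale snapshot is rejected rather than overwriting the newer one.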
In reality, it’s a bit more complicated.
● All changes are atomic
● No expensive (or inconsistent) file system operations
● Snapshots are indexed for scan planning on a single node
● CBO metrics are reliable
● Versions for incremental updates and materialized views
Iceberg Design Benefits
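To make the scan-planning point concrete, here is a small sketch of planning from snapshot metadata instead of directory listings. The slides do not describe the index layout, so the per-file `min_hour`/`max_hour` bounds and `record_count` field, along with the names `DataFile` and `plan_files`, are assumptions for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    path: str
    min_hour: int      # hypothetical lower bound of the `hour` column in this file
    max_hour: int      # hypothetical upper bound of the `hour` column in this file
    record_count: int  # row count kept alongside the file entry


def plan_files(snapshot_files: List[DataFile], hour: int) -> List[DataFile]:
    """Select files whose value range can contain `hour`, without listing directories."""
    return [f for f in snapshot_files if f.min_hour <= hour <= f.max_hour]


snapshot_files = [
    DataFile("a.parquet", 18, 18, 100),
    DataFile("b.parquet", 19, 19, 250),
    DataFile("c.parquet", 19, 20, 50),
]

selected = plan_files(snapshot_files, 19)
estimated_rows = sum(f.record_count for f in selected)
print([f.path for f in selected], estimated_rows)
```

Planning runs on one node over metadata only, and the row counts stored with each file give an optimizer an upper bound on matching rows without reading data.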
Iceberg at Netflix
● Production tables: tens of petabytes, millions of partitions
○ Scan planning fits on a single node
○ Advanced filtering enables more use cases
○ Overall performance is better
● Low latency queries are faster for large tables
Scale
● Production Flink pipeline writing in 3 AWS regions
● Lift service moving data into a single region
● Merge service compacting small files
Concurrency
● Rollback is popular
● Metadata tables
○ Track down the version a job read
○ Find the process that wrote a bad version
Usability
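Rollback and metadata inspection follow naturally once every version is kept. The sketch below is a toy model, not Iceberg's metadata tables: the `written_by` field, `history`, and `rollback_to` are hypothetical names used to show finding the process that wrote a bad version and then moving the current pointer back.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    files: FrozenSet[str]
    written_by: str  # hypothetical: identifier of the job that committed


@dataclass
class Table:
    snapshots: List[Snapshot] = field(default_factory=list)
    current_id: int = 0  # 0 means no snapshot has been committed yet

    def commit(self, files: Iterable[str], written_by: str) -> int:
        snapshot = Snapshot(len(self.snapshots) + 1, frozenset(files), written_by)
        self.snapshots.append(snapshot)
        self.current_id = snapshot.snapshot_id
        return snapshot.snapshot_id

    def history(self) -> List[Tuple[int, str]]:
        """Metadata view: which job wrote which version."""
        return [(s.snapshot_id, s.written_by) for s in self.snapshots]

    def rollback_to(self, snapshot_id: int) -> None:
        """Point the table back at an earlier snapshot; no data files are touched."""
        if all(s.snapshot_id != snapshot_id for s in self.snapshots):
            raise ValueError(f"unknown snapshot {snapshot_id}")
        self.current_id = snapshot_id

    def current_files(self) -> FrozenSet[str]:
        for s in self.snapshots:
            if s.snapshot_id == self.current_id:
                return s.files
        return frozenset()


table = Table()
good = table.commit(["a.parquet"], "job-1")
bad = table.commit(["a.parquet", "corrupt.parquet"], "job-2")

culprit = [writer for sid, writer in table.history() if sid == bad]
table.rollback_to(good)
print(culprit, sorted(table.current_files()))
```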
● Spark vectorization for faster bulk reads
○ Presto vectorization already done
● Row-level delete encodings
○ MERGE INTO
○ ID equality predicates
Future Work
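The slides only name row-level deletes as future work and do not describe an encoding. Purely to illustrate the general idea of deleting rows by an ID equality predicate without rewriting data files, the sketch below applies a set of deleted IDs at read time; `read_with_deletes` and the row layout are assumptions, not a description of Iceberg's design.

```python
from typing import Dict, List, Set

Row = Dict[str, object]


def read_with_deletes(rows: List[Row], deleted_ids: Set[int]) -> List[Row]:
    """Drop rows whose `id` equals any deleted ID; the data rows are left untouched."""
    return [row for row in rows if row["id"] not in deleted_ids]


data_rows: List[Row] = [
    {"id": 1, "value": "a"},
    {"id": 2, "value": "b"},
    {"id": 3, "value": "c"},
]
deleted_ids = {2}  # hypothetical delete record: "id = 2"

visible = read_with_deletes(data_rows, deleted_ids)
print([row["id"] for row in visible])
```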
Thank you!
Questions?
Ryan Blue
rblue@netflix.com

Apache Iceberg - A Table Format for Huge Analytic Datasets
