Managed Iceberg Tables

Skill: databricks-iceberg

Managed Iceberg tables are native Apache Iceberg tables stored in Unity Catalog. You get full read/write from Databricks and from any external engine via the Iceberg REST Catalog endpoint. Use them when you need a true open table format that works across platforms — Spark, PyIceberg, Snowflake, Trino, Flink — without the limitations of Delta-only features.

“Write SQL to create a managed Iceberg table for order events, partitioned by order_date, and load data from a staging table.”

CREATE TABLE analytics.gold.order_events
USING ICEBERG
PARTITIONED BY (order_date)
AS SELECT * FROM analytics.staging.raw_orders;

Key decisions:

  • USING ICEBERG makes this a native Iceberg table, not Delta — external engines can read and write via IRC without conversion
  • PARTITIONED BY over CLUSTER BY — both produce Liquid Clustering under the hood, but PARTITIONED BY auto-disables deletion vectors and row tracking, and works on any engine (EMR, OSS Spark, Trino), not just Databricks
  • Only plain column references work in PARTITIONED BY — expression transforms like bucket(), years(), months() are not supported
  • If you must use CLUSTER BY on Iceberg v2, manually set 'delta.enableDeletionVectors' = false and 'delta.enableRowTracking' = false in TBLPROPERTIES, or you get MANAGED_ICEBERG_ATTEMPTED_TO_ENABLE_CLUSTERING_WITHOUT_DISABLING_DVS_OR_ROW_TRACKING
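Putting the last point together — a sketch of the CLUSTER BY variant on a v2 table, with both properties disabled up front (table and column names are illustrative):

CREATE TABLE analytics.gold.order_events_clustered
USING ICEBERG
CLUSTER BY (order_date)
TBLPROPERTIES (
  'delta.enableDeletionVectors' = 'false',
  'delta.enableRowTracking' = 'false'
)
AS SELECT * FROM analytics.staging.raw_orders;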

“Write SQL to upsert incoming event data into a managed Iceberg table, matching on event_id.”

MERGE INTO analytics.gold.order_events AS target
USING analytics.staging.new_events AS source
ON target.event_id = source.event_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

Managed Iceberg tables support all standard DML — INSERT, UPDATE, DELETE, and MERGE. This is one of the advantages over UniForm, where the table is read-only from external engines.
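For completeness, the other DML statements follow the same pattern as on Delta (column names and predicates here are illustrative):

-- Correct a field in place
UPDATE analytics.gold.order_events
SET status = 'cancelled'
WHERE event_id = 42;

-- Remove stale rows
DELETE FROM analytics.gold.order_events
WHERE order_date < '2020-01-01';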

“Write SQL to create an Iceberg v3 table that stores semi-structured event payloads as VARIANT, clustered by event_date.”

CREATE TABLE analytics.gold.events_v3 (
event_id BIGINT,
event_date DATE,
payload VARIANT
)
USING ICEBERG
TBLPROPERTIES ('format-version' = '3')
CLUSTER BY (event_date);

Iceberg v3 (Beta, DBR 17.3+) adds deletion vectors, the VARIANT type, and row lineage. On v3, CLUSTER BY works without manually disabling DVs or row tracking. The tradeoff: external engines must use Iceberg library 1.9.0+, and you cannot downgrade back to v2 once upgraded.
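Once loaded, VARIANT payloads can be queried with Databricks path syntax; a sketch, assuming the payload contains a device.os field (field names are illustrative):

SELECT
  event_id,
  payload:device.os::STRING AS device_os
FROM analytics.gold.events_v3
WHERE event_date = '2025-06-01';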

“Write SQL to convert an existing Delta table to a managed Iceberg table using DEEP CLONE.”

CREATE TABLE analytics.gold.events_iceberg
USING ICEBERG
DEEP CLONE analytics.gold.events_delta;

DEEP CLONE copies data files into a new managed Iceberg table. SHALLOW CLONE is not supported for Iceberg. For foreign Iceberg tables (from Snowflake or another catalog), use CTAS instead:

CREATE TABLE analytics.gold.events_managed
USING ICEBERG
PARTITIONED BY (event_date)
AS SELECT * FROM foreign_catalog.schema.events;

“Write SQL to read the order_events table as it existed at a specific timestamp.”

SELECT * FROM analytics.gold.order_events
TIMESTAMP AS OF '2025-06-01T00:00:00Z';
-- Or by snapshot ID
SELECT * FROM analytics.gold.order_events
VERSION AS OF 1234567890;

Time travel syntax matches Delta's, except that VERSION AS OF takes an Iceberg snapshot ID rather than a Delta version number. External engines can also query the snapshot history via the table_name.snapshots metadata table.
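For example, from a Spark-based external engine connected through the IRC endpoint, the snapshot history could be inspected like this (a sketch — column names follow the standard Iceberg snapshots metadata table):

SELECT snapshot_id, committed_at, operation
FROM analytics.gold.order_events.snapshots
ORDER BY committed_at DESC;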

  • Never install an Iceberg library into DBR — Databricks Runtime includes built-in Iceberg support. Adding a JAR causes version conflicts and class-loading errors.
  • Never set write.metadata.path or write.metadata.previous-versions-max — Databricks manages metadata locations automatically. Overriding these causes metadata corruption.
  • Default compression is zstd, not snappy — older Iceberg readers without zstd support will fail to read the data files. Check reader compatibility or set write.parquet.compression-codec to snappy in TBLPROPERTIES.
  • No Change Data Feed (CDF) on Iceberg tables — CDF is Delta-only. If you need CDF for downstream streaming tables, use Delta + UniForm instead.
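If older readers are a concern, the compression codec can be switched per table after creation — a sketch using the property named above:

ALTER TABLE analytics.gold.order_events
SET TBLPROPERTIES ('write.parquet.compression-codec' = 'snappy');

Note that only files written after the change use snappy; existing zstd files remain until they are rewritten by compaction or DML.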