Managed Iceberg Tables

Skill: databricks-iceberg

Managed Iceberg tables give you native Apache Iceberg storage inside Unity Catalog. You get full DML support in Databricks and read/write access from external engines via the Iceberg REST Catalog — no format conversion, no metadata sync delays. When you need a single table that Spark, PyIceberg, Snowflake, and Flink can all hit directly, this is the path.

“Using SQL, create a managed Iceberg v2 table for an orders dataset with Liquid Clustering on order_date. Use PARTITIONED BY for cross-platform compatibility.”

CREATE TABLE analytics.sales.orders (
  order_id BIGINT,
  customer_id BIGINT,
  amount DECIMAL(10,2),
  region STRING,
  order_date DATE
)
USING ICEBERG
PARTITIONED BY (order_date);

Key decisions:

  • USING ICEBERG makes this a native Iceberg table, not Delta with UniForm
  • PARTITIONED BY auto-handles deletion vector and row-tracking properties — no manual TBLPROPERTIES needed on v2 or v3
  • Both PARTITIONED BY and CLUSTER BY produce the same Iceberg metadata for external engines — UC maps both to Liquid Clustering keys exposed as Iceberg partition fields
  • PARTITIONED BY is standard Iceberg DDL that any engine can create; CLUSTER BY is Databricks-only
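For comparison, a sketch of the same table using the Databricks-only CLUSTER BY spelling — per the equivalence noted above, external engines would see the same Iceberg partition metadata either way:

```sql
-- Databricks-only DDL; UC exposes the clustering key to
-- external engines as an Iceberg partition field, just like
-- the PARTITIONED BY version.
CREATE TABLE analytics.sales.orders (
  order_id BIGINT,
  customer_id BIGINT,
  amount DECIMAL(10,2),
  region STRING,
  order_date DATE
)
USING ICEBERG
CLUSTER BY (order_date);
```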

“Using SQL, create a managed Iceberg table from raw events, clustered on event_date, targeting Iceberg v3.”

CREATE TABLE analytics.bronze.events_archive
USING ICEBERG
TBLPROPERTIES ('format-version' = '3')
CLUSTER BY (event_date)
AS SELECT * FROM analytics.staging.raw_events
WHERE event_date < '2025-01-01';

On v3, CLUSTER BY works without extra TBLPROPERTIES because deletion vectors and row tracking are handled natively. On v2, you’d need to explicitly set 'delta.enableDeletionVectors' = false and 'delta.enableRowTracking' = false.
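A sketch of the v2 equivalent, with the two properties named above set explicitly (assumes the same source table):

```sql
-- Iceberg v2: deletion vectors and row tracking must be
-- disabled before CLUSTER BY is accepted.
CREATE TABLE analytics.bronze.events_archive
USING ICEBERG
TBLPROPERTIES (
  'format-version' = '2',
  'delta.enableDeletionVectors' = false,
  'delta.enableRowTracking' = false
)
CLUSTER BY (event_date)
AS SELECT * FROM analytics.staging.raw_events
WHERE event_date < '2025-01-01';
```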

“Using SQL, write a MERGE statement that upserts staged order data into a managed Iceberg table.”

MERGE INTO analytics.sales.orders AS target
USING analytics.staging.new_orders AS source
ON target.order_id = source.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

Managed Iceberg tables support full MERGE, INSERT, UPDATE, and DELETE. The one DML gap: Change Data Feed (CDF) is Delta-only. If you need CDF, use Delta with UniForm instead.
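If CDF is a hard requirement, a sketch of the Delta-with-UniForm alternative — the table name `orders_cdf` is hypothetical, and the property names assume current Databricks UniForm DDL:

```sql
-- Delta table with Change Data Feed enabled; UniForm exposes
-- Iceberg metadata so external engines can still read it.
CREATE TABLE analytics.sales.orders_cdf (
  order_id BIGINT,
  amount DECIMAL(10,2)
)
TBLPROPERTIES (
  'delta.enableChangeDataFeed' = 'true',
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg'
);
```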

“Using SQL, query an Iceberg table as it existed at a specific point in time.”

-- By timestamp
SELECT * FROM analytics.sales.orders
TIMESTAMP AS OF '2025-06-01T00:00:00Z';
-- By snapshot ID
SELECT * FROM analytics.sales.orders
VERSION AS OF 1234567890;

Time travel works against Iceberg snapshots. External engines can also inspect snapshot history via SELECT * FROM analytics.sales.orders.snapshots.
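From an external Spark engine with the Iceberg library, the snapshots metadata table can be used to find a snapshot ID for VERSION AS OF — a sketch assuming standard Iceberg metadata-table columns:

```sql
-- List snapshots newest first; snapshot_id feeds VERSION AS OF.
SELECT snapshot_id, committed_at, operation
FROM analytics.sales.orders.snapshots
ORDER BY committed_at DESC;
```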

“Using SQL, deep-clone an existing Delta table into a managed Iceberg table with Liquid Clustering.”

CREATE TABLE analytics.sales.orders_iceberg
USING ICEBERG
TBLPROPERTIES ('format-version' = '3')
CLUSTER BY (order_date)
DEEP CLONE analytics.sales.orders_delta;

DEEP CLONE copies all data into native Iceberg format. SHALLOW CLONE is not supported for Iceberg tables — always use DEEP CLONE or CTAS.

Common pitfalls:

  • Expression transforms not supported — bucket(), years(), months(), days(), and hours() don’t work with PARTITIONED BY on managed Iceberg tables. Only plain column references are allowed.
  • CLUSTER BY on v2 requires manual TBLPROPERTIES — forgetting 'delta.enableDeletionVectors' = false and 'delta.enableRowTracking' = false produces a MANAGED_ICEBERG_ATTEMPTED_TO_ENABLE_CLUSTERING_WITHOUT_DISABLING_DVS_OR_ROW_TRACKING error. Use PARTITIONED BY to avoid this entirely.
  • No CDF on Iceberg — Change Data Feed is a Delta feature. If downstream consumers depend on CDF, keep the table as Delta and add UniForm for external Iceberg reads.
  • Do not install an Iceberg library into DBR — Databricks Runtime includes built-in Iceberg support. Adding an Iceberg JAR causes version conflicts.
  • v3 is irreversible — once you upgrade a table to format-version 3, you cannot downgrade. External engines must use Iceberg library 1.9.0+ to read v3 tables.
  • Enable Predictive Optimization — it’s recommended but not auto-enabled for managed Iceberg tables. Turn it on at the catalog or schema level to get automatic compaction, vacuum, and statistics collection.
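A sketch of enabling Predictive Optimization at the schema level — the same clause works with ALTER CATALOG for catalog-wide coverage:

```sql
-- Covers every eligible managed table in the schema:
-- automatic compaction, vacuum, and statistics collection.
ALTER SCHEMA analytics.sales ENABLE PREDICTIVE OPTIMIZATION;
```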