Iceberg Tables
Skill: databricks-iceberg
What You Can Build
You can create native Apache Iceberg tables that are fully readable and writable from both Databricks and external engines like Spark, PyIceberg, Snowflake, and Trino. You can also make existing Delta tables readable as Iceberg without migrating, and expose everything through the Iceberg REST Catalog so external tools connect with zero storage credential setup. Ask your AI coding assistant for the right pattern and it will pick between managed Iceberg, UniForm, or Compatibility Mode based on your requirements.
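As a rough sketch of that decision logic (a hypothetical helper, not a Databricks API), the choice between the three options hinges on whether external engines need write access, whether the table is a streaming table or materialized view, and whether you need to stay on Delta:

```python
def pick_iceberg_mode(external_writes: bool, streaming_or_mv: bool,
                      keep_delta_features: bool) -> str:
    """Sketch of the decision between the three interop options on this page."""
    if streaming_or_mv:
        # UniForm does not work on streaming tables or materialized views
        return "compatibility-mode"
    if external_writes:
        # Only managed Iceberg gives external engines full read/write
        return "managed-iceberg"
    if keep_delta_features:
        # UniForm keeps the table Delta; Iceberg readers get read-only access
        return "uniform"
    return "managed-iceberg"

print(pick_iceberg_mode(external_writes=True, streaming_or_mv=False,
                        keep_delta_features=False))  # → managed-iceberg
```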
In Action
“Create a managed Iceberg table for order events partitioned by order_date. I need external Spark clusters and PyIceberg clients to read and write this table via the Iceberg REST Catalog.”
```sql
-- Managed Iceberg table with cross-platform partitioning
-- PARTITIONED BY is standard Iceberg DDL — works on EMR, OSS Spark, Trino, Flink
CREATE TABLE analytics.silver.order_events (
  order_id BIGINT,
  customer_id STRING,
  amount DECIMAL(10,2),
  order_date DATE,
  region STRING
)
USING ICEBERG
PARTITIONED BY (order_date);

-- Grant external engine access via the Iceberg REST Catalog
GRANT EXTERNAL USE SCHEMA ON SCHEMA analytics.silver TO `data-platform`;

-- Load data
INSERT INTO analytics.silver.order_events
SELECT * FROM analytics.bronze.raw_orders
WHERE order_date >= '2025-01-01';

-- Verify with time travel
SELECT COUNT(*) FROM analytics.silver.order_events
TIMESTAMP AS OF '2025-06-01T00:00:00Z';
```

Key decisions:
- `PARTITIONED BY` over `CLUSTER BY` — standard Iceberg DDL that any engine can create and read. Both produce the same Liquid Clustering layout internally, but `PARTITIONED BY` auto-disables deletion vectors and row tracking so you skip manual `TBLPROPERTIES`.
- `USING ICEBERG` (not Delta + UniForm) — full read/write from external engines. UniForm is read-only externally.
- `EXTERNAL USE SCHEMA` grant — separate from `SELECT`/`MODIFY`. Without it, external engines get `403 Forbidden` even with valid credentials.
- No Iceberg library install — DBR includes built-in support. Adding one causes version conflicts.
- No `write.metadata.path` override — Databricks manages metadata locations. Overriding causes corruption.
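The REST Catalog endpoint that external clients connect to follows a fixed pattern per workspace (the same URI appears in the PyIceberg example in this page). A small sketch, with a hypothetical helper name:

```python
def iceberg_rest_uri(workspace_host: str) -> str:
    """Build the Unity Catalog Iceberg REST endpoint for a workspace (sketch)."""
    host = workspace_host.rstrip("/")
    if not host.startswith("https://"):
        host = "https://" + host
    return f"{host}/api/2.1/unity-catalog/iceberg-rest"

print(iceberg_rest_uri("my-workspace.cloud.databricks.com"))
# → https://my-workspace.cloud.databricks.com/api/2.1/unity-catalog/iceberg-rest
```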
More Patterns
Make an existing Delta table readable as Iceberg
“Enable UniForm on my customers Delta table so Snowflake can read it without migrating off Delta.”
```sql
-- Disable deletion vectors first (required for UniForm)
ALTER TABLE analytics.gold.customers
SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'false');

REORG TABLE analytics.gold.customers APPLY (PURGE);

-- Enable UniForm
ALTER TABLE analytics.gold.customers
SET TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg'
);
```

The table stays Delta internally — you keep CDF, streaming, and all Delta features. Iceberg metadata is generated asynchronously after each write, so external engines see new data within seconds. This is read-only from the Iceberg side; writes still go through Delta.
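A pre-flight check for that sequence can be sketched as a small function, assuming you have the table's TBLPROPERTIES as a plain dict (a hypothetical helper; the property names are the ones from the DDL above):

```python
def uniform_ready(tblproperties: dict) -> list:
    """Return the changes still needed before enabling UniForm (sketch)."""
    missing = []
    if tblproperties.get("delta.enableDeletionVectors", "false") != "false":
        missing.append("disable deletion vectors, then REORG ... APPLY (PURGE)")
    if tblproperties.get("delta.columnMapping.mode") != "name":
        missing.append("set delta.columnMapping.mode = name")
    if tblproperties.get("delta.enableIcebergCompatV2") != "true":
        missing.append("set delta.enableIcebergCompatV2 = true")
    return missing

# A table with DVs enabled and no column mapping needs all three changes
print(uniform_ready({"delta.enableDeletionVectors": "true"}))
```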
Connect PyIceberg from outside Databricks
Section titled “Connect PyIceberg from outside Databricks”“Read and append to my Iceberg table from a local Python script using PyIceberg.”
```python
from datetime import date
from decimal import Decimal

from pyiceberg.catalog import load_catalog
import pyarrow as pa

catalog = load_catalog(
    "uc",
    uri="https://my-workspace.cloud.databricks.com/api/2.1/unity-catalog/iceberg-rest",
    warehouse="analytics",
    token="dapi...",
)

# Read with pushdown filter
tbl = catalog.load_table("silver.order_events")
df = tbl.scan(
    row_filter="order_date >= '2025-06-01'",
    limit=1000,
).to_pandas()

# Append new rows — schema must match exactly
arrow_schema = pa.schema([
    pa.field("order_id", pa.int64()),
    pa.field("customer_id", pa.string()),
    pa.field("amount", pa.decimal128(10, 2)),
    pa.field("order_date", pa.date32()),
    pa.field("region", pa.string()),
])
new_rows = pa.Table.from_pylist(
    [{
        "order_id": 9001,
        "customer_id": "C-42",
        "amount": Decimal("149.99"),       # decimal128 wants Decimal, not float
        "order_date": date(2025, 6, 15),   # date32 wants a date, not a string
        "region": "us-west",
    }],
    schema=arrow_schema,
)
tbl.append(new_rows)
```

The `warehouse` parameter pins the UC catalog, so table identifiers are `schema.table` (not three-level). Cast Arrow types explicitly — PyArrow defaults to int64, but if the Iceberg schema uses int32, the append silently corrupts data.
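The explicit casting advised above can be pushed into a small stdlib-only conversion step that runs before rows reach PyArrow (a hypothetical helper; the column name is the one from the example schema):

```python
from datetime import date

def coerce_dates(rows, date_cols=("order_date",)):
    """Convert ISO date strings to datetime.date so date32 columns
    never receive raw strings (sketch, hypothetical helper)."""
    out = []
    for row in rows:
        row = dict(row)
        for col in date_cols:
            if isinstance(row.get(col), str):
                row[col] = date.fromisoformat(row[col])
        out.append(row)
    return out

rows = coerce_dates([{"order_id": 9001, "order_date": "2025-06-15"}])
print(rows[0]["order_date"])  # → 2025-06-15 (a datetime.date)
```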
Enable Compatibility Mode for streaming tables
Section titled “Enable Compatibility Mode for streaming tables”“My SDP pipeline has streaming tables that Snowflake needs to read as Iceberg.”
```sql
CREATE OR REFRESH STREAMING TABLE analytics.silver.click_events
TBLPROPERTIES (
  'delta.universalFormat.enabledFormats' = 'compatibility',
  'delta.universalFormat.compatibility.location' = 's3://my-bucket/compat/click_events'
)
AS SELECT * FROM STREAM read_files(
  '/Volumes/analytics/raw/clickstream/',
  format => 'json'
);
```

Compatibility Mode writes a separate Iceberg-format copy to the external location. Regular UniForm does not work on streaming tables or materialized views — this is the only option. Factor in the extra storage cost for large tables.
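The two properties that switch a table into Compatibility Mode can be captured in a tiny config builder, useful when generating DDL for many streaming tables (a hypothetical helper; the property keys are the ones from the DDL above):

```python
def compat_mode_props(location: str) -> dict:
    """TBLPROPERTIES for Compatibility Mode on a streaming table (sketch)."""
    if not location:
        raise ValueError("Compatibility Mode requires an external location")
    return {
        "delta.universalFormat.enabledFormats": "compatibility",
        "delta.universalFormat.compatibility.location": location,
    }

props = compat_mode_props("s3://my-bucket/compat/click_events")
print(props["delta.universalFormat.enabledFormats"])  # → compatibility
```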
Watch Out For
- Installing an Iceberg library on DBR — Databricks Runtime bundles Iceberg support. Adding a JAR causes class conflicts and cryptic errors. Remove any `iceberg-spark-runtime` dependencies from your cluster libraries.
- `CLUSTER BY` on Iceberg v2 without disabling DVs — you get `MANAGED_ICEBERG_ATTEMPTED_TO_ENABLE_CLUSTERING_WITHOUT_DISABLING_DVS_OR_ROW_TRACKING`. Set `'delta.enableDeletionVectors' = 'false'` and `'delta.enableRowTracking' = 'false'` in `TBLPROPERTIES`, or use `PARTITIONED BY`, which handles it automatically.
- Expression transforms in `PARTITIONED BY` — `bucket()`, `years()`, `months()`, `days()`, `hours()` are not supported on managed Iceberg tables. Only plain column references work.
- UniForm async delay — Iceberg metadata generation is asynchronous. External engines may not see the latest data for a few seconds after a Delta write. Check status with `DESCRIBE EXTENDED table_name` before blaming credentials.
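The transform limitation can be caught before DDL generation with a defensive check that rejects expression transforms in a partition spec (a hypothetical helper; the transform names are the unsupported ones listed above):

```python
import re

# Expression transforms that managed Iceberg tables reject
UNSUPPORTED_TRANSFORMS = ("bucket", "years", "months", "days", "hours")

def validate_partition_cols(cols):
    """Allow only plain column references in PARTITIONED BY (sketch)."""
    pattern = re.compile(r"^(%s)\s*\(" % "|".join(UNSUPPORTED_TRANSFORMS))
    bad = [c for c in cols if pattern.match(c.strip())]
    if bad:
        raise ValueError(f"unsupported partition transforms: {bad}")
    return cols

print(validate_partition_cols(["order_date", "region"]))  # → ['order_date', 'region']
```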