Materialized Views & Pipe Syntax

Skill: databricks-dbsql

Materialized views store precomputed query results as Delta tables that update on a schedule, on upstream data changes, or on demand. Pipe syntax chains SQL operations top-down using the |> operator, eliminating nested subqueries. Together they give you fast-read analytics with readable transformation logic.

“Write SQL that creates a materialized view of daily revenue by region with hourly auto-refresh.”

CREATE OR REPLACE MATERIALIZED VIEW catalog.schema.daily_revenue
CLUSTER BY (order_date)
SCHEDULE EVERY 1 HOUR
COMMENT 'Hourly-refreshed daily revenue by region'
AS SELECT
  order_date,
  region,
  SUM(amount) AS total_revenue,
  COUNT(DISTINCT customer_id) AS unique_customers
FROM catalog.schema.fact_orders
JOIN catalog.schema.dim_store USING (store_id)
GROUP BY order_date, region;

Key decisions:

  • SCHEDULE EVERY 1 HOUR creates a Databricks Job that refreshes automatically (valid intervals: 1-72 hours, 1-31 days, 1-8 weeks)
  • CLUSTER BY applies Liquid Clustering to the materialized output for fast reads
  • The MV physically stores results as a Delta table — queries read precomputed data instead of rerunning the aggregation
  • Requires a Unity Catalog-enabled Serverless SQL warehouse
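The intro also mentions on-demand refresh; a scheduled MV can additionally be refreshed manually at any time. A minimal sketch, reusing the view name from the example above:

```sql
-- Incremental refresh where possible; falls back to full recompute
REFRESH MATERIALIZED VIEW catalog.schema.daily_revenue;

-- Force a full recompute, discarding previously materialized data
REFRESH MATERIALIZED VIEW catalog.schema.daily_revenue FULL;
```

Use `FULL` when source data was rewritten in ways incremental refresh can't track (for example, after deleting and reloading history).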

“Write SQL that creates a materialized view which refreshes within 5 minutes of upstream table updates.”

CREATE OR REPLACE MATERIALIZED VIEW catalog.schema.customer_orders
TRIGGER ON UPDATE AT MOST EVERY INTERVAL 5 MINUTES
AS SELECT c.customer_id, c.name, COUNT(o.order_id) AS order_count
FROM catalog.schema.customers c
JOIN catalog.schema.orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;

TRIGGER ON UPDATE watches upstream tables for changes and refreshes the MV automatically. The AT MOST EVERY INTERVAL 5 MINUTES throttle prevents excessive refreshes from rapid upstream writes. Maximum 10 upstream source tables and 1,000 trigger-based MVs per workspace.

Use temporary tables and views for session work

“Write SQL that creates a temporary table for intermediate results and a temporary view as a lightweight query alias.”

-- Temporary table (physical storage, supports DML)
CREATE TEMP TABLE temp_active_users
AS SELECT user_id, username, last_login
FROM catalog.schema.users
WHERE last_login > current_date() - INTERVAL 30 DAYS;

-- Temporary view (logical alias, recomputed each access)
CREATE OR REPLACE TEMPORARY VIEW active_customers
AS SELECT customer_id, name, email
FROM catalog.schema.customers
WHERE status = 'active';

Temporary tables persist data for up to 7 days or until the session ends; use them when you need repeated reads or DML (INSERT, UPDATE, MERGE). Temporary views are zero-cost query aliases that recompute on every access. Temp tables and temp views share one namespace, so you can't create both with the same name.
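Because temporary tables support DML, you can build them up across several statements instead of recomputing one big query. A sketch reusing `temp_active_users` from above (the `vip_list` table is hypothetical):

```sql
-- Append rows from another source; a temp view would need re-creation instead
INSERT INTO temp_active_users
SELECT user_id, username, last_login
FROM catalog.schema.users
WHERE user_id IN (SELECT user_id FROM catalog.schema.vip_list);  -- hypothetical table

-- Normalize a column in place
UPDATE temp_active_users SET username = lower(username);
```

Subsequent queries in the session then read the accumulated result without recomputation.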

“Write a SQL query using pipe syntax that joins orders with products and customers, computes line totals, and aggregates revenue by region and category.”

FROM catalog.schema.orders
|> AS o
|> JOIN catalog.schema.products p ON o.product_id = p.product_id
|> JOIN catalog.schema.customers c ON o.customer_id = c.customer_id
|> WHERE o.order_date >= '2025-01-01'
|> EXTEND o.quantity * p.unit_price AS line_total
|> AGGREGATE
     SUM(line_total) AS total_revenue,
     COUNT(DISTINCT o.order_id) AS order_count
   GROUP BY c.region, p.category
|> ORDER BY total_revenue DESC
|> LIMIT 20;

Pipe syntax reads top-down instead of inside-out. EXTEND adds columns (like PySpark’s withColumn), SET modifies existing columns, DROP removes them. It compiles to the same execution plan as traditional SQL — zero performance difference. Available in DBR 16.2+.
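The SET and DROP operators mentioned above can be sketched in a short pipeline; the `events` table and its columns here are hypothetical:

```sql
FROM catalog.schema.events
|> EXTEND amount * 1.1 AS adjusted_amount  -- add a new column
|> SET amount = round(amount, 2)           -- modify an existing column in place
|> DROP raw_payload                        -- remove a column from the output
|> LIMIT 10;
```

Each operator consumes the full output of the step above it, so column changes are visible to every later step.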

Debug queries by commenting out tail operations

“Show how pipe syntax makes it easy to inspect intermediate results during development.”

FROM catalog.schema.sales
|> WHERE region = 'US'
|> EXTEND amount * tax_rate AS tax_amount
-- |> AGGREGATE SUM(tax_amount) AS total_tax GROUP BY state
-- |> ORDER BY total_tax DESC
;

Comment out the last pipe operations to inspect intermediate results at any stage of the pipeline. With traditional SQL, you’d need to restructure the entire query or wrap it in a CTE.

  • Using SELECT with aggregate functions in pipe syntax — pipe syntax separates projection from aggregation. Use AGGREGATE ... GROUP BY instead of SELECT SUM(x) ... GROUP BY. Putting aggregates in SELECT inside a pipe chain raises an error.
  • Expecting incremental refresh without row tracking — MVs fall back to full refresh unless source tables have delta.enableRowTracking = true. Enable it on source tables if refresh cost matters.
  • Creating MVs on Classic SQL warehouses — materialized views require Serverless SQL warehouses. Creation silently triggers a serverless pipeline regardless of your warehouse type, but the pipeline won’t start if serverless isn’t available.
  • Assuming OPTIMIZE and VACUUM work on MVs — they’re managed automatically. Running these commands on a materialized view raises an error.
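For the row-tracking pitfall above, enabling the property on a source table is a one-line ALTER (table name reused from the first example):

```sql
ALTER TABLE catalog.schema.fact_orders
SET TBLPROPERTIES ('delta.enableRowTracking' = 'true');
```

Enable it before the first MV refresh if possible; changes written while row tracking was off may still force a full recompute.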