Materialized Views & Pipe Syntax
Skill: databricks-dbsql
What You Can Build
Materialized views store precomputed query results as Delta tables that update on a schedule, on upstream data changes, or on demand. Pipe syntax chains SQL operations top-down using the `|>` operator, eliminating nested subqueries. Together they give you fast-read analytics with readable transformation logic.
In Action
“Write SQL that creates a materialized view of daily revenue by region with hourly auto-refresh.”
```sql
CREATE OR REPLACE MATERIALIZED VIEW catalog.schema.daily_revenue
CLUSTER BY (order_date)
SCHEDULE EVERY 1 HOUR
COMMENT 'Hourly-refreshed daily revenue by region'
AS SELECT
  order_date,
  region,
  SUM(amount) AS total_revenue,
  COUNT(DISTINCT customer_id) AS unique_customers
FROM catalog.schema.fact_orders
JOIN catalog.schema.dim_store USING (store_id)
GROUP BY order_date, region;
```

Key decisions:

- `SCHEDULE EVERY 1 HOUR` creates a Databricks Job that refreshes automatically (valid intervals: 1-72 hours, 1-31 days, 1-8 weeks)
- `CLUSTER BY` applies Liquid Clustering to the materialized output for fast reads
- The MV physically stores results as a Delta table — queries read precomputed data instead of rerunning the aggregation
- Requires a Unity Catalog-enabled Serverless SQL warehouse
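Besides scheduled and triggered refresh, the MV can also be refreshed on demand. A minimal sketch using the view name from the example above:

```sql
-- Trigger a refresh on demand (incremental where possible)
REFRESH MATERIALIZED VIEW catalog.schema.daily_revenue;

-- Force a full recomputation instead of an incremental update
REFRESH MATERIALIZED VIEW catalog.schema.daily_revenue FULL;
```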
More Patterns
Refresh on upstream data changes
“Write SQL that creates a materialized view which refreshes within 5 minutes of upstream table updates.”
```sql
CREATE OR REPLACE MATERIALIZED VIEW catalog.schema.customer_orders
TRIGGER ON UPDATE AT MOST EVERY INTERVAL 5 MINUTES
AS SELECT
  c.customer_id,
  c.name,
  COUNT(o.order_id) AS order_count
FROM catalog.schema.customers c
JOIN catalog.schema.orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;
```

`TRIGGER ON UPDATE` watches upstream tables for changes and refreshes the MV automatically. The `AT MOST EVERY INTERVAL 5 MINUTES` throttle prevents excessive refreshes from rapid upstream writes. Limits: maximum 10 upstream source tables per MV and 1,000 trigger-based MVs per workspace.
Use temporary tables and views for session work
“Write SQL that creates a temporary table for intermediate results and a temporary view as a lightweight query alias.”
```sql
-- Temporary table (physical storage, supports DML)
CREATE TEMP TABLE temp_active_users
AS SELECT user_id, username, last_login
FROM catalog.schema.users
WHERE last_login > current_date() - INTERVAL 30 DAYS;
```

```sql
-- Temporary view (logical alias, recomputed each access)
CREATE OR REPLACE TEMPORARY VIEW active_customers
AS SELECT customer_id, name, email
FROM catalog.schema.customers
WHERE status = 'active';
```

Temporary tables persist data for up to 7 days or until session end — use them when you need repeated reads or DML (`INSERT`, `UPDATE`, `MERGE`). Temporary views are zero-cost query aliases that recompute on every access. The two share a namespace: you can’t create both with the same name.
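Because temporary tables support DML, you can build up intermediate results across statements. A minimal sketch continuing from `temp_active_users` above; `catalog.schema.contractors` is a hypothetical second source:

```sql
-- Append rows from a second source (hypothetical table)
INSERT INTO temp_active_users
SELECT user_id, username, last_login
FROM catalog.schema.contractors
WHERE last_login > current_date() - INTERVAL 30 DAYS;

-- Normalize a column in place; a temporary view cannot do this
UPDATE temp_active_users SET username = lower(username);
```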
Chain transformations with pipe syntax
“Write a SQL query using pipe syntax that joins orders with products and customers, computes line totals, and aggregates revenue by region and category.”
```sql
FROM catalog.schema.orders
  |> AS o
  |> JOIN catalog.schema.products p ON o.product_id = p.product_id
  |> JOIN catalog.schema.customers c ON o.customer_id = c.customer_id
  |> WHERE o.order_date >= '2025-01-01'
  |> EXTEND o.quantity * p.unit_price AS line_total
  |> AGGREGATE SUM(line_total) AS total_revenue,
               COUNT(DISTINCT o.order_id) AS order_count
     GROUP BY c.region, p.category
  |> ORDER BY total_revenue DESC
  |> LIMIT 20;
```

Pipe syntax reads top-down instead of inside-out. `EXTEND` adds columns (like PySpark’s `withColumn`), `SET` modifies existing columns, `DROP` removes them. It compiles to the same execution plan as traditional SQL — zero performance difference. Available in DBR 16.2+.
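The example above uses only `EXTEND`; a minimal sketch of `SET` and `DROP` as well, where the table and column names (`amount`, `fee`) are illustrative:

```sql
FROM catalog.schema.sales
  |> EXTEND amount * 0.1 AS fee     -- add a computed column
  |> SET amount = amount - fee      -- overwrite an existing column
  |> DROP fee                       -- remove the helper column from the output
  |> LIMIT 10;
```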
Debug queries by commenting out tail operations
“Show how pipe syntax makes it easy to inspect intermediate results during development.”
```sql
FROM catalog.schema.sales
  |> WHERE region = 'US'
  |> EXTEND amount * tax_rate AS tax_amount
-- |> AGGREGATE SUM(tax_amount) AS total_tax GROUP BY state
-- |> ORDER BY total_tax DESC;
```

Comment out the last pipe operations to inspect intermediate results at any stage of the pipeline. With traditional SQL, you’d need to restructure the entire query or wrap it in a CTE.
Watch Out For
- Using `SELECT` with aggregate functions in pipe syntax — pipe syntax separates projection from aggregation. Use `AGGREGATE ... GROUP BY` instead of `SELECT SUM(x) ... GROUP BY`. Putting aggregates in `SELECT` inside a pipe chain raises an error.
- Expecting incremental refresh without row tracking — MVs fall back to full refresh unless source tables have `delta.enableRowTracking = true`. Enable it on source tables if refresh cost matters.
- Creating MVs on Classic SQL warehouses — materialized views require Serverless SQL warehouses. Creation silently triggers a serverless pipeline regardless of your warehouse type, but the pipeline won’t start if serverless isn’t available.
- Assuming `OPTIMIZE` and `VACUUM` work on MVs — they’re managed automatically. Running these commands on a materialized view raises an error.
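To make incremental refresh possible, enable row tracking on each source table. A sketch using the fact table from the first example:

```sql
ALTER TABLE catalog.schema.fact_orders
SET TBLPROPERTIES ('delta.enableRowTracking' = 'true');
```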