Databricks SQL
Skill: databricks-dbsql
What You Can Build
You can build production SQL pipelines using stored procedures with error handling, materialized views with scheduled refresh, geospatial queries with H3 indexing, and the pipe operator for readable transformations. Ask your AI coding assistant to scaffold any of these patterns and it will generate warehouse-ready SQL with proper three-level naming, Liquid Clustering, and collation support.
In Action
“Create a stored procedure that upserts staging customers into the dimension table with full error handling. It should take the source table name as input and return the row count.”
```sql
CREATE OR REPLACE PROCEDURE catalog.schema.upsert_customers(
  IN p_source STRING,
  OUT p_rows_affected INT
)
LANGUAGE SQL
SQL SECURITY INVOKER
BEGIN
  DECLARE EXIT HANDLER FOR SQLEXCEPTION
  BEGIN
    SET p_rows_affected = -1;
    SIGNAL SQLSTATE '45000'
      SET MESSAGE_TEXT = concat('Upsert failed for source: ', p_source);
  END;

  MERGE INTO catalog.schema.dim_customer AS t
  USING (SELECT * FROM identifier(p_source)) AS s
  ON t.customer_id = s.customer_id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *;

  SET p_rows_affected = (SELECT COUNT(*) FROM identifier(p_source));
END;

-- Invoke:
CALL catalog.schema.upsert_customers('catalog.schema.staging_customers', ?);
```

Key decisions:
- `SQL SECURITY INVOKER` — the procedure runs with the caller’s permissions, not the creator’s. This is the safe default for shared procedures in Unity Catalog.
- `identifier(p_source)` for dynamic table names — avoids string concatenation and SQL injection. The `identifier()` function resolves a string parameter to a table reference at execution time.
- `DECLARE EXIT HANDLER` — catches any SQL exception inside the block. Without it, a failed MERGE silently aborts and you never know it happened.
- `OUT` parameter for row count — lets the caller inspect results without parsing log output. The `?` placeholder in the CALL statement captures the output.
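In a pure SQL script, a session variable can stand in for the `?` placeholder and capture the `OUT` value directly. A minimal sketch, assuming session variables are available on your warehouse (the variable name `rows_affected` is illustrative):

```sql
-- Sketch: capture the OUT parameter in a session variable
DECLARE OR REPLACE VARIABLE rows_affected INT DEFAULT 0;

CALL catalog.schema.upsert_customers(
  'catalog.schema.staging_customers',
  rows_affected
);

-- Inspect the result; -1 signals the handler fired
SELECT rows_affected;
```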
More Patterns
Materialized view with scheduled refresh
“Create an hourly-refreshed materialized view for daily revenue by region, clustered by order date.”
```sql
CREATE OR REPLACE MATERIALIZED VIEW catalog.schema.daily_revenue
CLUSTER BY (order_date)
SCHEDULE EVERY 1 HOUR
COMMENT 'Hourly-refreshed daily revenue by region'
AS SELECT
  order_date,
  region,
  SUM(amount) AS total_revenue,
  COUNT(DISTINCT customer_id) AS unique_customers
FROM catalog.schema.fact_orders
JOIN catalog.schema.dim_store USING (store_id)
GROUP BY order_date, region;
```

Materialized views require a Pro or Serverless warehouse. `SCHEDULE EVERY 1 HOUR` runs incremental refresh automatically. `CLUSTER BY` applies Liquid Clustering to the materialized output, which speeds up downstream BI queries that filter on date.
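Between scheduled runs you can also trigger a refresh on demand, which is useful after a backfill. A sketch, assuming the standard refresh and schedule-management statements:

```sql
-- Force an immediate refresh outside the hourly schedule
REFRESH MATERIALIZED VIEW catalog.schema.daily_revenue;

-- Remove the schedule if the view no longer needs automatic refresh
ALTER MATERIALIZED VIEW catalog.schema.daily_revenue DROP SCHEDULE;
```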
Pipe syntax for readable transformations
“Rewrite my 30-day sales analysis query using pipe syntax so it reads top to bottom.”
```sql
FROM catalog.schema.fact_orders
|> WHERE order_date >= current_date() - INTERVAL 30 DAYS
|> AGGREGATE SUM(amount) AS total, COUNT(*) AS cnt
   GROUP BY region, product_category
|> WHERE total > 10000
|> ORDER BY total DESC
|> LIMIT 20;
```

Pipe syntax (DBR 16.1+) inverts the traditional SQL reading order. Each `|>` stage feeds into the next, making complex transformations scannable without nesting subqueries. The second `WHERE` filters the aggregated results — equivalent to `HAVING`, but clearer because it appears after the aggregation.
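For comparison, the same query in traditional syntax needs a `HAVING` clause and reads inside-out relative to execution order. A sketch of the equivalent:

```sql
-- Equivalent traditional form: filtering on the aggregate
-- moves into HAVING, before the aggregation appears in the text
SELECT region, product_category,
       SUM(amount) AS total,
       COUNT(*) AS cnt
FROM catalog.schema.fact_orders
WHERE order_date >= current_date() - INTERVAL 30 DAYS
GROUP BY region, product_category
HAVING SUM(amount) > 10000
ORDER BY total DESC
LIMIT 20;
```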
Geospatial proximity search with H3
“Find all stores within 5km of each customer using H3 spatial indexing.”
```sql
WITH customer_h3 AS (
  SELECT *, h3_longlatash3(longitude, latitude, 7) AS h3_cell
  FROM catalog.schema.customers
),
store_h3 AS (
  SELECT *, h3_longlatash3(longitude, latitude, 7) AS h3_cell
  FROM catalog.schema.stores
)
SELECT
  c.customer_id,
  s.store_id,
  ST_Distance(
    ST_Point(c.longitude, c.latitude),
    ST_Point(s.longitude, s.latitude)
  ) AS distance_m
FROM customer_h3 c
JOIN store_h3 s
  ON h3_ischildof(c.h3_cell, h3_toparent(s.h3_cell, 5))
WHERE ST_Distance(
  ST_Point(c.longitude, c.latitude),
  ST_Point(s.longitude, s.latitude)
) < 5000;
```

H3 indexing at resolution 7 gives roughly 1.2km hex cells. The join on `h3_ischildof` prunes the search space before the expensive `ST_Distance` calculation, turning an O(n*m) cross join into a fast spatial lookup.
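One caveat with parent-cell pruning: a customer and a nearby store can fall in cells with different parents, so pairs near a hex boundary may be missed. If completeness matters, a k-ring expansion of the store cells closes the gap; a sketch assuming the built-in `h3_kring` function, reusing the `customer_h3` and `store_h3` CTEs above (the value of `k` is an assumption — it must be large enough to cover the 5km radius at your resolution):

```sql
-- Sketch: expand each store cell to its neighbors so
-- near-boundary matches are not pruned away
SELECT c.customer_id, s.store_id
FROM customer_h3 c
JOIN (
  SELECT store_id, explode(h3_kring(h3_cell, 3)) AS nearby_cell
  FROM store_h3
) s ON c.h3_cell = s.nearby_cell;
```

The `ST_Distance` filter still applies afterwards; the k-ring join only widens the candidate set before the exact distance check.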
Watch Out For
- Classic warehouses for AI and MV features — materialized views, `http_request`, and AI functions all require a Pro or Serverless SQL warehouse. Classic warehouses silently fail or throw cryptic errors.
- `PARTITION BY` on new tables — use `CLUSTER BY` (Liquid Clustering) instead. Partitioning is the legacy pattern. Liquid Clustering auto-optimizes file layout and supports up to 4 clustering columns with no partition pruning overhead.
- Missing `LIMIT` with AI functions in DBSQL — same rule as anywhere: each `ai_classify()` or `ai_extract()` call hits a foundation model. Cap results during development.
- Case sensitivity in string columns — if your users search for “MacBook” and the data stores “macbook”, add `COLLATE UTF8_LCASE` to the column definition. It makes all comparisons case-insensitive without wrapping every query in `LOWER()`.
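The collation fix can be applied at table-creation time. A minimal sketch (the table and column names are illustrative):

```sql
-- Illustrative table: product_name compares case-insensitively
CREATE TABLE catalog.schema.products (
  product_id BIGINT,
  product_name STRING COLLATE UTF8_LCASE
);

-- Matches 'macbook', 'MacBook', and 'MACBOOK' without LOWER()
SELECT * FROM catalog.schema.products
WHERE product_name = 'MacBook';
```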