Skip to content

SQL Scripting & Stored Procedures

Skill: databricks-dbsql

SQL scripting brings procedural logic to Databricks SQL — variables, control flow, exception handling, and stored procedures that persist in Unity Catalog. You can write conditional ETL pipelines, reusable data processing routines, and recursive queries for hierarchy traversal, all without leaving SQL. Available in DBR 16.3+ and Databricks SQL.

“Write a SQL script that checks for new orders, processes them into a target table with a timestamp, and returns the status and row count.”

BEGIN
DECLARE v_count INT;
DECLARE v_status STRING DEFAULT 'pending';
SET v_count = (SELECT COUNT(*) FROM catalog.schema.raw_orders WHERE status = 'new');
IF v_count > 0 THEN
INSERT INTO catalog.schema.processed_orders
SELECT *, current_timestamp() AS processed_at
FROM catalog.schema.raw_orders
WHERE status = 'new';
SET v_status = 'completed';
ELSE
SET v_status = 'skipped';
END IF;
SELECT v_status AS result, v_count AS rows_processed;
END

Key decisions:

  • Every SQL script starts with a BEGIN...END compound statement
  • Declarations must appear before executable statements — variables first, then conditions, then handlers
  • SET assigns values from scalar subqueries or expressions
  • The compound statement must be the sole statement in a notebook cell
  • Variables are scoped to their enclosing BEGIN...END block

Create a reusable stored procedure with error handling

Section titled “Create a reusable stored procedure with error handling”

“Write a stored procedure that upserts customer data with MERGE, returns the affected row count, and handles failures gracefully.”

CREATE OR REPLACE PROCEDURE catalog.schema.upsert_customers(
IN p_source STRING,
OUT p_rows_affected INT
)
LANGUAGE SQL
SQL SECURITY INVOKER
BEGIN
DECLARE EXIT HANDLER FOR SQLEXCEPTION
BEGIN
SET p_rows_affected = -1;
SIGNAL SQLSTATE '45000'
SET MESSAGE_TEXT = concat('Upsert failed for source: ', p_source);
END;
MERGE INTO catalog.schema.dim_customer AS t
USING (SELECT * FROM identifier(p_source)) AS s
ON t.customer_id = s.customer_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
SET p_rows_affected = (SELECT COUNT(*) FROM identifier(p_source));
END;
-- Invoke:
CALL catalog.schema.upsert_customers('catalog.schema.staging_customers', ?);

Stored procedures persist in Unity Catalog and are invoked with CALL. The EXIT HANDLER FOR SQLEXCEPTION catches any error inside the procedure body, sets the output to -1, and raises a custom error message. Available in DBR 17.0+ (Public Preview).

Traverse an organizational hierarchy with recursive CTEs

Section titled “Traverse an organizational hierarchy with recursive CTEs”

“Write a SQL query that walks an employee hierarchy from the CEO down, tracking depth and the full path.”

WITH RECURSIVE org_chart AS (
SELECT employee_id, name, manager_id, 0 AS depth, ARRAY(name) AS path
FROM catalog.schema.employees
WHERE manager_id IS NULL
UNION ALL
SELECT e.employee_id, e.name, e.manager_id, o.depth + 1, array_append(o.path, e.name)
FROM catalog.schema.employees e
JOIN org_chart o ON e.manager_id = o.employee_id
WHERE o.depth < 10
)
SELECT * FROM org_chart ORDER BY depth, name;

Recursive CTEs need an anchor query (seed rows) joined to a recursive step via UNION ALL. The WHERE o.depth < 10 acts as a safety valve. Default recursion depth is 100 (override with MAX RECURSION LEVEL), and the default row limit is 1,000,000. Available in DBR 17.0+ and DBSQL 2025.20+.

Execute atomic multi-statement transactions

Section titled “Execute atomic multi-statement transactions”

“Write a SQL script that inserts a customer, their first order, and an audit log entry as a single atomic operation.”

BEGIN ATOMIC
INSERT INTO customers (id, name) VALUES (1, 'Alice');
INSERT INTO orders (id, customer_id, amount) VALUES (1, 1, 250.00);
INSERT INTO audit_log (action, ts) VALUES ('new_customer_order', current_timestamp());
END;

If any statement fails, all changes roll back. Tables used in BEGIN ATOMIC blocks must have the catalogManaged table feature enabled. This is a Preview feature.

Build dynamic SQL for parameterized operations

Section titled “Build dynamic SQL for parameterized operations”

“Write a SQL script that uses EXECUTE IMMEDIATE to run a parameterized query and capture the result.”

BEGIN
DECLARE total INT;
EXECUTE IMMEDIATE 'SELECT SUM(c1) FROM VALUES(:a), (:b) AS t(c1)'
INTO total USING (5 AS a, 6 AS b);
VALUES (total); -- Returns 11
END;

EXECUTE IMMEDIATE constructs SQL from strings at runtime. Use named parameters (:param) for clarity, and INTO to capture single-row results into declared variables. Available since DBR 14.3.

  • Mixing declarations with executable statements — declarations must come first in a BEGIN...END block. Variable declarations, then condition declarations, then handler declarations, then executable statements. Breaking this order causes syntax errors.
  • Forgetting WHERE guards on recursive CTEs — without a depth limit or termination condition, recursive queries hit the 100-level default and error out. Always add a WHERE depth < N clause.
  • Using BEGIN ATOMIC without the catalogManaged table feature — existing tables can’t be upgraded in place. You need to recreate them with TBLPROPERTIES ('delta.feature.catalogManaged' = 'supported').
  • Expecting CREATE OR REPLACE TEMP TABLE — it’s not supported yet. Drop the temp table first, then recreate it.