AI Functions in DBSQL

Skill: databricks-dbsql

Databricks SQL exposes Foundation Model APIs, external HTTP endpoints, federated database queries, and file ingestion as built-in SQL functions. You can classify, extract, summarize, and enrich data in the same query that joins federated sources and calls external APIs — all without leaving your SQL warehouse.

“Write a SQL query that classifies support tickets, runs sentiment analysis, and extracts structured entities in a single pass.”

SELECT
  ticket_id,
  ai_classify(description, ARRAY('billing', 'technical', 'account')) AS category,
  ai_analyze_sentiment(description) AS sentiment,
  ai_extract(description, ARRAY('person_name', 'company', 'dollar_amount')) AS entities
FROM catalog.schema.support_tickets
LIMIT 100;

Key decisions:

  • AI functions work on table columns as naturally as UPPER() or LENGTH()
  • Each function calls a pre-configured Foundation Model API — no endpoint selection needed
  • LIMIT 100 during development controls token costs on Foundation Model APIs
  • Requires a serverless SQL warehouse or DBR 15.1+ (Classic warehouses are not supported)
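Because these functions return ordinary STRING and STRUCT columns, their output composes with regular SQL. A hedged sketch of one way to aggregate the classified tickets (assuming the support_tickets table above):

SELECT category, count(*) AS ticket_count
FROM (
  SELECT ai_classify(description, ARRAY('billing', 'technical', 'account')) AS category
  FROM catalog.schema.support_tickets
  LIMIT 100
)
GROUP BY category
ORDER BY ticket_count DESC;

Keeping the LIMIT inside the subquery bounds how many rows hit the Foundation Model API before the aggregation runs.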

“Write a SQL query that uses ai_query to extract structured feedback analysis with topic, sentiment, and action items.”

SELECT ai_query(
  'databricks-meta-llama-3-3-70b-instruct',
  concat('Summarize as JSON with keys: topic, sentiment, action_items. Feedback: ', feedback),
  returnType => 'STRUCT<topic STRING, sentiment STRING, action_items ARRAY<STRING>>'
) AS analysis
FROM catalog.schema.customer_feedback
LIMIT 50;

returnType tells ai_query to parse the model response into a typed Spark struct. This gives you strongly typed columns you can filter, join, and aggregate downstream without manual JSON parsing.
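For example, the typed struct can be filtered and projected directly with dot notation. A sketch, reusing the query above in a CTE:

WITH analyzed AS (
  SELECT ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    concat('Summarize as JSON with keys: topic, sentiment, action_items. Feedback: ', feedback),
    returnType => 'STRUCT<topic STRING, sentiment STRING, action_items ARRAY<STRING>>'
  ) AS analysis
  FROM catalog.schema.customer_feedback
  LIMIT 50
)
SELECT analysis.topic, analysis.sentiment, size(analysis.action_items) AS n_actions
FROM analyzed
WHERE analysis.sentiment = 'negative';

No from_json or get_json_object calls are needed; the struct fields are real columns with real types.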

“Write a SQL query that validates orders against an external API using an HTTP connection.”

CREATE CONNECTION my_api_conn TYPE HTTP
OPTIONS (host 'https://api.example.com', bearer_token secret('scope', 'token'));

SELECT
  order_id,
  http_request(
    conn => 'my_api_conn', method => 'POST', path => '/v1/validate',
    json => to_json(named_struct('order_id', order_id, 'amount', amount))
  ).text AS api_response
FROM catalog.schema.orders
WHERE needs_validation = true;

http_request calls external services through Unity Catalog HTTP connections. The connection stores credentials securely via secret(). The function returns a STRUCT<status_code: INT, text: STRING> so you can parse responses with from_json.
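A sketch of parsing that response with from_json. The valid and reason fields here are hypothetical; substitute the schema your API actually returns:

SELECT
  order_id,
  resp.status_code,
  from_json(resp.text, 'STRUCT<valid BOOLEAN, reason STRING>') AS validation
FROM (
  SELECT order_id, http_request(
    conn => 'my_api_conn', method => 'POST', path => '/v1/validate',
    json => to_json(named_struct('order_id', order_id, 'amount', amount))
  ) AS resp
  FROM catalog.schema.orders
  WHERE needs_validation = true
)
WHERE resp.status_code = 200;

Filtering on status_code before parsing avoids feeding error bodies to from_json.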

Run federated queries against external databases

“Write a SQL query that reads customer data from a PostgreSQL database using Lakehouse Federation.”

SELECT * FROM remote_query(
  'my_postgres_connection',
  database => 'my_database',
  query => 'SELECT customer_id, email FROM customers WHERE active = true'
);

remote_query pushes SQL to an external database and returns results as a table. Supported databases include PostgreSQL, MySQL, SQL Server, Oracle, Snowflake, BigQuery, and Redshift. Queries are read-only and pushed down for efficient execution.
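Because the result comes back as an ordinary table, it can be joined against local Unity Catalog tables. A sketch, assuming a local orders table with a customer_id column:

SELECT o.order_id, c.email
FROM catalog.schema.orders AS o
JOIN remote_query(
  'my_postgres_connection',
  database => 'my_database',
  query => 'SELECT customer_id, email FROM customers WHERE active = true'
) AS c
  ON o.customer_id = c.customer_id;

Filtering inside the pushed-down query string (here, active = true) keeps the data transferred over the network small; the join itself runs on the warehouse.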

“Write a SQL query that reads JSON event files from a Unity Catalog volume with schema hints.”

SELECT * FROM read_files(
  '/Volumes/catalog/schema/raw/events/',
  format => 'json',
  schemaHints => 'event_id STRING, timestamp TIMESTAMP, payload MAP<STRING, STRING>',
  pathGlobFilter => '*.json',
  recursiveFileLookup => true
);

read_files handles CSV, JSON, Parquet, Avro, ORC, XML, and binary formats. Use schemaHints to override specific inferred column types while letting the rest auto-detect.
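Since read_files returns a table, one common pattern is materializing the ingested files with CREATE TABLE AS SELECT. A sketch; the events_bronze target name is hypothetical:

CREATE TABLE catalog.schema.events_bronze AS
SELECT * FROM read_files(
  '/Volumes/catalog/schema/raw/events/',
  format => 'json',
  schemaHints => 'event_id STRING, timestamp TIMESTAMP, payload MAP<STRING, STRING>',
  pathGlobFilter => '*.json'
);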

Common pitfalls:

  • Using AI functions on a Classic SQL warehouse — they require a Serverless SQL Warehouse. Classic warehouses return “function not found” errors.
  • Forgetting to test http_request connections — network connectivity, secret scopes, and OAuth configuration all need to work before the query runs. Test with a simple GET request first.
  • Running remote_query without understanding pushdown — filters, limits, and aggregations are pushed to the remote database by default. But complex expressions may not push down, causing full table scans over the network. Check the query plan.
  • Skipping LIMIT during AI function development — every row incurs Foundation Model API token costs. Start small, validate output, then scale to full tables.
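Following the second pitfall above, a minimal connectivity check can be a single GET before wiring up the full POST pipeline. A sketch; the /health path is hypothetical and depends on your API:

SELECT http_request(
  conn => 'my_api_conn',
  method => 'GET',
  path => '/health'
).status_code AS status;

A 200 here confirms the connection host, credentials, and network path work before you debug payload or parsing issues.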