AI Functions in DBSQL

Skill: databricks-dbsql

Databricks SQL exposes Foundation Model APIs, external HTTP endpoints, federated database queries, and file ingestion as built-in SQL functions. You can classify, extract, summarize, and enrich data in the same query that joins federated sources and calls external APIs — all without leaving your SQL warehouse.

“Write a SQL query that classifies support tickets, runs sentiment analysis, and extracts structured entities in a single pass.”

SELECT
  ticket_id,
  ai_classify(description, ARRAY('billing', 'technical', 'account')) AS category,
  ai_analyze_sentiment(description) AS sentiment,
  ai_extract(description, ARRAY('person_name', 'company', 'dollar_amount')) AS entities
FROM catalog.schema.support_tickets
LIMIT 100;

Key decisions:

  • AI functions work on table columns as naturally as UPPER() or LENGTH()
  • Each function calls a pre-configured Foundation Model API — no endpoint selection needed
  • LIMIT 100 during development controls token costs on Foundation Model APIs
  • Requires a serverless SQL warehouse or DBR 15.1+ (Classic warehouses are not supported)
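Because these functions return ordinary STRING and STRUCT columns, their output composes with regular SQL. A hedged sketch of one way to aggregate the classified tickets (assuming the support_tickets table above):

SELECT category, count(*) AS ticket_count
FROM (
  SELECT ai_classify(description, ARRAY('billing', 'technical', 'account')) AS category
  FROM catalog.schema.support_tickets
  LIMIT 100
)
GROUP BY category
ORDER BY ticket_count DESC;

Keeping the LIMIT inside the subquery bounds how many rows hit the Foundation Model API before the aggregation runs.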

“Write a SQL query that uses ai_query to extract structured feedback analysis with topic, sentiment, and action items.”

SELECT ai_query(
  'databricks-meta-llama-3-3-70b-instruct',
  concat('Summarize as JSON with keys: topic, sentiment, action_items. Feedback: ', feedback),
  returnType => 'STRUCT<topic STRING, sentiment STRING, action_items ARRAY<STRING>>'
) AS analysis
FROM catalog.schema.customer_feedback
LIMIT 50;

returnType tells ai_query to parse the model response into a typed Spark struct. This gives you strongly typed columns you can filter, join, and aggregate downstream without manual JSON parsing.
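For example, the typed struct can be filtered and projected directly with dot notation. A sketch, reusing the query above in a CTE:

WITH analyzed AS (
  SELECT ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    concat('Summarize as JSON with keys: topic, sentiment, action_items. Feedback: ', feedback),
    returnType => 'STRUCT<topic STRING, sentiment STRING, action_items ARRAY<STRING>>'
  ) AS analysis
  FROM catalog.schema.customer_feedback
  LIMIT 50
)
SELECT analysis.topic, analysis.sentiment, size(analysis.action_items) AS n_actions
FROM analyzed
WHERE analysis.sentiment = 'negative';

No from_json or get_json_object calls are needed; the struct fields are real columns with real types.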

“Write a SQL query that validates orders against an external API using an HTTP connection.”

CREATE CONNECTION my_api_conn TYPE HTTP
OPTIONS (host 'https://api.example.com', bearer_token secret('scope', 'token'));

SELECT
  order_id,
  http_request(
    conn => 'my_api_conn', method => 'POST', path => '/v1/validate',
    json => to_json(named_struct('order_id', order_id, 'amount', amount))
  ).text AS api_response
FROM catalog.schema.orders
WHERE needs_validation = true;

http_request calls external services through Unity Catalog HTTP connections. The connection stores credentials securely via secret(). The function returns a STRUCT<status_code: INT, text: STRING> so you can parse responses with from_json.
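A sketch of parsing that response with from_json. The valid and reason fields here are hypothetical; substitute the schema your API actually returns:

SELECT
  order_id,
  resp.status_code,
  from_json(resp.text, 'STRUCT<valid BOOLEAN, reason STRING>') AS validation
FROM (
  SELECT order_id, http_request(
    conn => 'my_api_conn', method => 'POST', path => '/v1/validate',
    json => to_json(named_struct('order_id', order_id, 'amount', amount))
  ) AS resp
  FROM catalog.schema.orders
  WHERE needs_validation = true
)
WHERE resp.status_code = 200;

Filtering on status_code before parsing avoids feeding error bodies to from_json.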

Run federated queries against external databases

“Write a SQL query that reads customer data from a PostgreSQL database using Lakehouse Federation.”

SELECT * FROM remote_query(
  'my_postgres_connection',
  database => 'my_database',
  query => 'SELECT customer_id, email FROM customers WHERE active = true'
);

remote_query pushes SQL to an external database and returns results as a table. Supported databases include PostgreSQL, MySQL, SQL Server, Oracle, Snowflake, BigQuery, and Redshift. Queries are read-only and pushed down for efficient execution.
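Because the result comes back as an ordinary table, it can be joined against local Unity Catalog tables. A sketch, assuming a local orders table with a customer_id column:

SELECT o.order_id, c.email
FROM catalog.schema.orders AS o
JOIN remote_query(
  'my_postgres_connection',
  database => 'my_database',
  query => 'SELECT customer_id, email FROM customers WHERE active = true'
) AS c
  ON o.customer_id = c.customer_id;

Filtering inside the pushed-down query string (here, active = true) keeps the data transferred over the network small; the join itself runs on the warehouse.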

“Write a SQL query that reads JSON event files from a Unity Catalog volume with schema hints.”

SELECT * FROM read_files(
  '/Volumes/catalog/schema/raw/events/',
  format => 'json',
  schemaHints => 'event_id STRING, timestamp TIMESTAMP, payload MAP<STRING, STRING>',
  pathGlobFilter => '*.json',
  recursiveFileLookup => true
);

read_files handles CSV, JSON, Parquet, Avro, ORC, XML, and binary formats. Use schemaHints to override specific inferred column types while letting the rest auto-detect.
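Since read_files returns a table, one common pattern is materializing the ingested files with CREATE TABLE AS SELECT. A sketch; the events_bronze target name is hypothetical:

CREATE TABLE catalog.schema.events_bronze AS
SELECT * FROM read_files(
  '/Volumes/catalog/schema/raw/events/',
  format => 'json',
  schemaHints => 'event_id STRING, timestamp TIMESTAMP, payload MAP<STRING, STRING>',
  pathGlobFilter => '*.json'
);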

Common pitfalls:

  • Using AI functions on a Classic SQL warehouse — they require a Serverless SQL Warehouse. Classic warehouses return “function not found” errors.
  • Forgetting to test http_request connections — network connectivity, secret scopes, and OAuth configuration all need to work before the query runs. Test with a simple GET request first.
  • Running remote_query without understanding pushdown — filters, limits, and aggregations are pushed to the remote database by default. But complex expressions may not push down, causing full table scans over the network. Check the query plan.
  • Skipping LIMIT during AI function development — every row incurs Foundation Model API token costs. Start small, validate output, then scale to full tables.
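Following the second pitfall above, a minimal connectivity check can be a single GET before wiring up the full POST pipeline. A sketch; the /health path is hypothetical and depends on your API:

SELECT http_request(
  conn => 'my_api_conn',
  method => 'GET',
  path => '/health'
).status_code AS status;

A 200 here confirms the connection host, credentials, and network path work before you debug payload or parsing issues.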