AI Functions in DBSQL
Skill: databricks-dbsql
What You Can Build
Databricks SQL exposes Foundation Model APIs, external HTTP endpoints, federated database queries, and file ingestion as built-in SQL functions. You can classify, extract, summarize, and enrich data in the same query that joins federated sources and calls external APIs, all without leaving your SQL warehouse.
In Action
“Write a SQL query that classifies support tickets, runs sentiment analysis, and extracts structured entities in a single pass.”
```sql
SELECT
  ticket_id,
  ai_classify(description, ARRAY('billing', 'technical', 'account')) AS category,
  ai_analyze_sentiment(description) AS sentiment,
  ai_extract(description, ARRAY('person_name', 'company', 'dollar_amount')) AS entities
FROM catalog.schema.support_tickets
LIMIT 100;
```

Key decisions:

- AI functions work on table columns as naturally as `UPPER()` or `LENGTH()`
- Each function calls a pre-configured Foundation Model API; no endpoint selection needed
- `LIMIT 100` during development controls token costs on Foundation Model APIs
- Requires a serverless SQL warehouse (not Classic) or DBR 15.1+
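The outputs of these functions behave like ordinary columns, so they can feed filters and joins directly. A minimal sketch, wrapping the query above as a subquery to route negative-sentiment billing tickets (the routing criteria are illustrative, not from the original):

```sql
-- Filter on AI function output like any other column
SELECT ticket_id, entities
FROM (
  SELECT
    ticket_id,
    ai_classify(description, ARRAY('billing', 'technical', 'account')) AS category,
    ai_analyze_sentiment(description) AS sentiment,
    ai_extract(description, ARRAY('person_name', 'company', 'dollar_amount')) AS entities
  FROM catalog.schema.support_tickets
  LIMIT 100
)
WHERE category = 'billing' AND sentiment = 'negative';
```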
More Patterns
Section titled “More Patterns”Get structured JSON output with ai_query
“Write a SQL query that uses ai_query to extract structured feedback analysis with topic, sentiment, and action items.”
```sql
SELECT ai_query(
  'databricks-meta-llama-3-3-70b-instruct',
  concat('Summarize as JSON with keys: topic, sentiment, action_items. Feedback: ', feedback),
  returnType => 'STRUCT<topic STRING, sentiment STRING, action_items ARRAY<STRING>>'
) AS analysis
FROM catalog.schema.customer_feedback
LIMIT 50;
```

`returnType` tells `ai_query` to parse the model response into a typed Spark struct. This gives you strongly typed columns you can filter, join, and aggregate downstream without manual JSON parsing.
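Because `analysis` is a typed struct, its fields are addressable with dot notation downstream. A sketch of such aggregation over the same query (the grouping logic is illustrative, not from the original):

```sql
-- Aggregate over typed struct fields without any JSON parsing
WITH analyzed AS (
  SELECT ai_query(
    'databricks-meta-llama-3-3-70b-instruct',
    concat('Summarize as JSON with keys: topic, sentiment, action_items. Feedback: ', feedback),
    returnType => 'STRUCT<topic STRING, sentiment STRING, action_items ARRAY<STRING>>'
  ) AS analysis
  FROM catalog.schema.customer_feedback
  LIMIT 50
)
SELECT analysis.topic, count(*) AS mentions
FROM analyzed
WHERE analysis.sentiment = 'negative'
GROUP BY analysis.topic
ORDER BY mentions DESC;
```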
Call external APIs with http_request
“Write a SQL query that validates orders against an external API using an HTTP connection.”
```sql
CREATE CONNECTION my_api_conn TYPE HTTP
OPTIONS (
  host 'https://api.example.com',
  bearer_token secret('scope', 'token')
);

SELECT
  order_id,
  http_request(
    conn => 'my_api_conn',
    method => 'POST',
    path => '/v1/validate',
    json => to_json(named_struct('order_id', order_id, 'amount', amount))
  ).text AS api_response
FROM catalog.schema.orders
WHERE needs_validation = true;
```

`http_request` calls external services through Unity Catalog HTTP connections. The connection stores credentials securely via `secret()`. The function returns a `STRUCT<status_code: INT, text: STRING>`, so you can parse responses with `from_json`.
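The `from_json` parsing mentioned above can be sketched like this, assuming a hypothetical response shape of `{"valid": true, "reason": "..."}` (the actual schema depends on your API):

```sql
-- Parse the .text payload into typed columns
-- (the STRUCT schema matches a hypothetical response shape)
SELECT
  order_id,
  from_json(
    http_request(
      conn => 'my_api_conn',
      method => 'POST',
      path => '/v1/validate',
      json => to_json(named_struct('order_id', order_id, 'amount', amount))
    ).text,
    'STRUCT<valid BOOLEAN, reason STRING>'
  ) AS validation
FROM catalog.schema.orders
WHERE needs_validation = true;
```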
Run federated queries against external databases
“Write a SQL query that reads customer data from a PostgreSQL database using Lakehouse Federation.”
```sql
SELECT * FROM remote_query(
  'my_postgres_connection',
  database => 'my_database',
  query => 'SELECT customer_id, email FROM customers WHERE active = true'
);
```

`remote_query` pushes SQL to an external database and returns results as a table. Supported databases include PostgreSQL, MySQL, SQL Server, Oracle, Snowflake, BigQuery, and Redshift. Queries are read-only and pushed down for efficient execution.
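Since `remote_query` returns an ordinary table, its output joins directly against local tables. A sketch, assuming a hypothetical local `orders` table with a `customer_id` column:

```sql
-- Join federated PostgreSQL customers against a local table
SELECT o.order_id, c.email
FROM catalog.schema.orders AS o
JOIN remote_query(
  'my_postgres_connection',
  database => 'my_database',
  query => 'SELECT customer_id, email FROM customers WHERE active = true'
) AS c
  ON o.customer_id = c.customer_id;
```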
Ingest raw files from volumes
“Write a SQL query that reads JSON event files from a Unity Catalog volume with schema hints.”
```sql
SELECT * FROM read_files(
  '/Volumes/catalog/schema/raw/events/',
  format => 'json',
  schemaHints => 'event_id STRING, timestamp TIMESTAMP, payload MAP<STRING, STRING>',
  pathGlobFilter => '*.json',
  recursiveFileLookup => true
);
```

`read_files` handles CSV, JSON, Parquet, Avro, ORC, XML, and binary formats. Use `schemaHints` to override specific inferred column types while letting the rest auto-detect.
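`read_files` composes with CREATE TABLE AS SELECT, so the same call can materialize the raw files as a table. A sketch under the same volume path (the target table name `events_bronze` is an assumption):

```sql
-- Materialize the JSON events into a table in one statement
CREATE OR REPLACE TABLE catalog.schema.events_bronze AS
SELECT * FROM read_files(
  '/Volumes/catalog/schema/raw/events/',
  format => 'json',
  schemaHints => 'event_id STRING, timestamp TIMESTAMP, payload MAP<STRING, STRING>',
  pathGlobFilter => '*.json',
  recursiveFileLookup => true
);
```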
Watch Out For
- Using AI functions on a Classic SQL warehouse: they require a serverless SQL warehouse. Classic warehouses return “function not found” errors.
- Forgetting to test `http_request` connections: network connectivity, secret scopes, and OAuth configuration all need to work before the query runs. Test with a simple GET request first.
- Running `remote_query` without understanding pushdown: filters, limits, and aggregations are pushed to the remote database by default, but complex expressions may not push down, causing full table scans over the network. Check the query plan.
- Skipping `LIMIT` during AI function development: every row incurs Foundation Model API token costs. Start small, validate output, then scale to full tables.
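The “simple GET request first” advice above can be sketched as a one-row smoke test against the connection (the `/health` path is a hypothetical endpoint; substitute one your API actually serves):

```sql
-- Smoke test: exercises the connection, secret, and network path
-- without scanning any table
SELECT http_request(
  conn => 'my_api_conn',
  method => 'GET',
  path => '/health'
).status_code AS status;
```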