ai_forecast
Skill: databricks-ai-functions
What You Can Build
ai_forecast is a table-valued function that projects time series data into the future with confidence intervals. You pass it a table of historical observations and a horizon date, and it returns point forecasts plus upper and lower bounds. It handles single metrics, multiple groups (forecast per region), and multiple metrics (revenue and units in one call). Requires a Pro or Serverless SQL warehouse.
In Action
“Write a SQL query that forecasts daily revenue through end of year based on historical order data.”
```sql
SELECT * FROM ai_forecast(
  observed => TABLE(SELECT order_date, revenue FROM catalog.schema.daily_revenue),
  horizon => '2026-12-31',
  time_col => 'order_date',
  value_col => 'revenue'
);
-- Returns: order_date, revenue_forecast, revenue_upper, revenue_lower
```

Key decisions:
- `observed` takes a TABLE expression containing your historical data — filter it to relevant rows before passing it in
- `horizon` is the right-exclusive end date for the forecast window
- Output columns follow the pattern `{value_col}_forecast`, `{value_col}_upper`, `{value_col}_lower`
- The underlying model is prophet-like (piecewise linear with seasonality detection)
- The time column type is preserved in output — DATE stays DATE, TIMESTAMP stays TIMESTAMP
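Because the output columns follow this naming pattern, downstream queries can reference them directly. A sketch reusing the daily revenue example above (the derived `interval_width` column is illustrative, not part of the function's output):

```sql
-- value_col => 'revenue' produces revenue_forecast / revenue_upper / revenue_lower
SELECT order_date,
       revenue_forecast,
       revenue_upper - revenue_lower AS interval_width
FROM ai_forecast(
  observed => TABLE(SELECT order_date, revenue FROM catalog.schema.daily_revenue),
  horizon => '2026-12-31',
  time_col => 'order_date',
  value_col => 'revenue'
)
ORDER BY order_date;
```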
More Patterns
Forecast by group
“Write a SQL query that produces independent sales forecasts for each region.”
```sql
SELECT * FROM ai_forecast(
  observed => TABLE(SELECT date, region, sales FROM catalog.schema.regional_sales),
  horizon => '2026-12-31',
  time_col => 'date',
  value_col => 'sales',
  group_col => 'region'
);
```

Each distinct value of `group_col` gets its own independent forecast model. The output includes the group column so you can filter or join results by region downstream.
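For instance, a downstream consumer can keep just one group from the combined output. A sketch assuming the data contains a region value of `'EMEA'`:

```sql
-- Filter the combined per-group output down to a single region
SELECT date, sales_forecast, sales_upper, sales_lower
FROM ai_forecast(
  observed => TABLE(SELECT date, region, sales FROM catalog.schema.regional_sales),
  horizon => '2026-12-31',
  time_col => 'date',
  value_col => 'sales',
  group_col => 'region'
)
WHERE region = 'EMEA';
```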
Forecast multiple metrics simultaneously
“Write a SQL query that forecasts both units sold and revenue in a single call.”
```sql
SELECT * FROM ai_forecast(
  observed => TABLE(SELECT date, units, revenue FROM catalog.schema.daily_kpis),
  horizon => '2026-06-30',
  time_col => 'date',
  value_col => 'units,revenue'
);
```

Pass a comma-separated list to `value_col` to forecast up to 100 metrics per group in one call. The output contains `units_forecast`, `units_upper`, `units_lower`, `revenue_forecast`, `revenue_upper`, `revenue_lower`.
Narrow the confidence interval for anomaly detection
“Write a SQL query that forecasts sensor readings with an 80% prediction interval for tighter anomaly bounds.”
```sql
SELECT * FROM ai_forecast(
  observed => TABLE(SELECT ts, sensor_value FROM catalog.schema.iot_readings),
  horizon => '2026-03-31',
  time_col => 'ts',
  value_col => 'sensor_value',
  prediction_interval_width => 0.80
);
```

The default prediction interval is 0.95. Narrowing it to 0.80 produces tighter upper/lower bounds, which is useful when you want to flag values outside the forecast range as anomalies.
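The flagging step itself is plain comparison logic once actuals are joined against the forecast bounds. A minimal Python sketch with hypothetical rows standing in for that join (this is not the ai_forecast API, just the downstream check):

```python
# Each row is (timestamp, actual, lower_bound, upper_bound), hypothetical
# values standing in for actuals joined against ai_forecast output.
rows = [
    ("2026-01-01", 10.2, 9.0, 11.0),
    ("2026-01-02", 13.5, 9.2, 11.3),  # above the upper bound
    ("2026-01-03", 8.1, 9.1, 11.5),   # below the lower bound
]

def flag_anomalies(rows):
    """Return (timestamp, actual) pairs whose value lies outside [lower, upper]."""
    return [(ts, actual) for ts, actual, lower, upper in rows
            if not (lower <= actual <= upper)]

anomalies = flag_anomalies(rows)
# → [("2026-01-02", 13.5), ("2026-01-03", 8.1)]
```

A narrower interval (0.80 instead of 0.95) simply makes this check stricter: more points fall outside the bounds and get flagged.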
Save forecasts to a Delta table
“Write PySpark code that runs a forecast and persists the results as a Delta table for downstream dashboards.”
```python
result = spark.sql("""
  SELECT * FROM ai_forecast(
    observed => TABLE(SELECT date, sales FROM catalog.schema.daily_sales),
    horizon => '2026-12-31',
    time_col => 'date',
    value_col => 'sales'
  )
""")
result.write.format("delta").mode("overwrite").saveAsTable("catalog.schema.sales_forecast")
```

Materializing forecast output to Delta lets dashboards and downstream queries read precomputed predictions without re-running the model on every access.
Watch Out For
- Using `'M'` for monthly frequency with DATE columns — DATE columns need `frequency => 'month'`, not `'M'`. The `'M'` shorthand is for TIMESTAMP columns only. Mismatching this produces incorrect intervals.
- Passing unfiltered data as the observed table — the model trains on whatever you give it. Outliers, nulls, and gaps in the time series degrade forecast quality. Clean and filter the input TABLE expression before calling `ai_forecast`.
- Expecting causal explanations — `ai_forecast` fits a statistical trend model. It doesn’t know about promotions, holidays, or business events. For those, you need to add regressors or post-process the output.
- Forecasting high-cardinality groups without limits — forecasting 10,000 ZIP codes in one call is expensive. Aggregate to a coarser level or limit groups to the top N by volume.
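One way to cap group cardinality before forecasting is to restrict the observed TABLE expression to the top N groups by volume. A sketch with illustrative table and column names:

```sql
-- Restrict the observed data to the 50 highest-volume ZIP codes
WITH top_groups AS (
  SELECT zip_code
  FROM catalog.schema.zip_sales
  GROUP BY zip_code
  ORDER BY SUM(sales) DESC
  LIMIT 50
)
SELECT * FROM ai_forecast(
  observed => TABLE(
    SELECT s.date, s.zip_code, s.sales
    FROM catalog.schema.zip_sales s
    JOIN top_groups t ON s.zip_code = t.zip_code
  ),
  horizon => '2026-12-31',
  time_col => 'date',
  value_col => 'sales',
  group_col => 'zip_code'
);
```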