SQL Alerts

Skill: databricks-bundles

You can define SQL alerts as bundle resources that monitor query results and fire notifications when thresholds are breached. This turns your data quality checks into version-controlled, deployable infrastructure — same repo, same databricks bundle deploy, same promotion path from dev to prod.

“Create a bundle resource for a SQL alert that monitors data freshness. It should fire when a table hasn’t been updated in more than 24 hours and send notifications on both alert and recovery.”

```yaml
resources:
  queries:
    freshness_check:
      display_name: "Data Freshness Check"
      query_text: |
        SELECT ROUND(
          (UNIX_TIMESTAMP(CURRENT_TIMESTAMP()) -
           UNIX_TIMESTAMP(MAX(updated_at))) / 3600, 1
        ) AS hours_since_update
        FROM prod_catalog.bronze.raw_events
      warehouse_id: ${var.warehouse_id}

  alerts:
    data_freshness_alert:
      display_name: "Data Freshness Alert"
      query_id: ${resources.queries.freshness_check.id}
      condition:
        op: GREATER_THAN
        operand:
          column:
            name: hours_since_update
        threshold:
          value:
            double_value: 24.0
      custom_subject: "Data Freshness Alert: {{ALERT_STATUS}}"
      custom_body: |
        Hours since last update: {{QUERY_RESULT_VALUE}}
        Threshold: 24 hours
        Status: {{ALERT_STATUS}}
      notify_on_ok: true
```

Key decisions:

  • The alert references its query via ${resources.queries.freshness_check.id}, keeping both resources portable across environments
  • notify_on_ok: true sends a recovery notification when the alert clears — without it, you only hear about problems, never resolutions
  • GREATER_THAN with a double_value threshold works for numeric comparisons; the alert evaluates on each scheduled query execution
  • Custom subject and body use {{ALERT_STATUS}} and {{QUERY_RESULT_VALUE}} template variables to include runtime context in notifications
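With the YAML in place, the standard bundle workflow applies. A quick sketch of the deploy loop — the target name `dev` is an assumption; substitute whatever targets your bundle defines:

```shell
# Validate the resolved configuration, deploy, then confirm the
# query and alert resources landed in the workspace.
databricks bundle validate -t dev
databricks bundle deploy -t dev
databricks bundle summary -t dev
```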

“Check the current alert schema before writing YAML to make sure my fields match the API surface.”

```shell
databricks bundle schema | jq '.properties.resources.properties.alerts'
```

The alert API surface evolves between CLI versions. Running this command before you write YAML prevents deployment failures from stale field names or missing required properties.
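To go one step further, you can list just the accepted field names for an individual alert resource. The exact JSON path below is an assumption — CLI versions nest the schema differently, so adjust the path if `jq` returns null:

```shell
# Enumerate the per-alert field names (path may vary by CLI version).
databricks bundle schema \
  | jq '.properties.resources.properties.alerts.additionalProperties.properties | keys'
```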

“Create an alert that fires when a table’s row count drops below a minimum threshold, indicating a potential ingestion failure.”

```yaml
resources:
  queries:
    row_count_check:
      display_name: "Orders Row Count Check"
      query_text: |
        SELECT COUNT(*) AS row_count
        FROM prod_catalog.silver.orders
        WHERE order_date = CURRENT_DATE()
      warehouse_id: ${var.warehouse_id}

  alerts:
    low_row_count:
      display_name: "Low Order Count Alert"
      query_id: ${resources.queries.row_count_check.id}
      condition:
        op: LESS_THAN
        operand:
          column:
            name: row_count
        threshold:
          value:
            double_value: 1000.0
      custom_subject: "Low Row Count: {{ALERT_STATUS}}"
      custom_body: "Today's order count: {{QUERY_RESULT_VALUE}} (expected >= 1000)"
      notify_on_ok: true
```

This catches silent ingestion failures — the pipeline runs successfully but processes zero or very few records. Set the threshold based on your historical minimum daily volume.
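One way to pick that number empirically is a one-off query over recent history. This is a sketch against the same table as the alert above; the 30-day window and the 20% safety margin are assumptions to tune for your data:

```sql
-- Derive a floor from the trailing 30 days of daily order volume,
-- with a 20% margin below the observed minimum.
SELECT CAST(MIN(daily_count) * 0.8 AS INT) AS suggested_threshold
FROM (
  SELECT order_date, COUNT(*) AS daily_count
  FROM prod_catalog.silver.orders
  WHERE order_date >= DATE_SUB(CURRENT_DATE(), 30)
  GROUP BY order_date
) daily_volumes
```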

“Configure different freshness thresholds for dev and prod so dev alerts don’t wake anyone up.”

```yaml
variables:
  freshness_threshold:
    default: "48.0"

targets:
  dev:
    variables:
      freshness_threshold: "48.0"
  prod:
    variables:
      freshness_threshold: "24.0"

resources:
  alerts:
    data_freshness_alert:
      display_name: "[${bundle.target}] Data Freshness"
      query_id: ${resources.queries.freshness_check.id}
      condition:
        op: GREATER_THAN
        operand:
          column:
            name: hours_since_update
        threshold:
          value:
            double_value: ${var.freshness_threshold}
```

Dev environments get a relaxed 48-hour threshold so test data staleness doesn’t generate noise. Prod gets the tight 24-hour window that matters.
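You can confirm the per-target substitution before deploying by inspecting the resolved configuration. A sketch — the `jq` path mirrors the resource names above, and the JSON layout of validate output may shift between CLI versions:

```shell
# Print the resolved threshold for each target to verify the
# variable substitution did what you expect.
databricks bundle validate -t dev --output json \
  | jq '.resources.alerts.data_freshness_alert.condition.threshold'
databricks bundle validate -t prod --output json \
  | jq '.resources.alerts.data_freshness_alert.condition.threshold'
```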

  • Skipping schema validation — The alert resource schema changes between CLI versions. Run databricks bundle schema | jq before writing YAML to avoid cryptic deployment errors from renamed or removed fields.
  • Hardcoding query_id instead of using resource references — A hardcoded query ID breaks when you deploy to a different workspace. Use ${resources.queries.name.id} to keep alerts portable.
  • Forgetting notify_on_ok — Without recovery notifications, you get paged when things break but never learn when they heal. Your on-call team ends up manually checking resolved alerts.
  • Setting thresholds too tight in dev — Dev data is often stale or incomplete. Use bundle variables to set environment-appropriate thresholds, or your dev alerts become background noise that everyone ignores.