
Triggers and Schedules

Skill: databricks-jobs

Every production pipeline needs an execution cadence. You can configure Databricks jobs to run on cron schedules, at fixed intervals, or in response to events like new files landing in cloud storage or Delta tables being updated. Each trigger type solves a different timing problem — scheduled ETL, periodic syncs, event-driven ingestion, or always-on streaming.

“Create a DABs job that runs daily at 6 AM Eastern, with a cron schedule that auto-unpauses in production but stays paused in dev.”

variables:
  schedule_pause_status:
    description: "Whether the schedule is active in this target"
    default: PAUSED

resources:
  jobs:
    daily_etl:
      name: "[${bundle.target}] Daily ETL Pipeline"
      schedule:
        quartz_cron_expression: "0 0 6 * * ?"
        timezone_id: "America/New_York"
        pause_status: ${var.schedule_pause_status}
      tasks:
        - task_key: etl
          notebook_task:
            notebook_path: ../src/etl.py

targets:
  prod:
    variables:
      schedule_pause_status: UNPAUSED

Key decisions:

  • Quartz cron format is seconds minutes hours day-of-month month day-of-week — the leading seconds field is what trips people up compared to standard Unix cron
  • timezone_id uses IANA time zones (e.g., America/New_York), so your schedule respects daylight saving time transitions automatically
  • DABs variable interpolation has no conditional expressions, so the pause status lives in a variable: PAUSED by default, overridden to UNPAUSED only in the prod target. Dev deployments stay paused, preventing accidental scheduled runs against dev data
  • pause_status: UNPAUSED activates the schedule immediately on deployment — no manual step needed
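To see the daylight-saving point concretely, a quick stdlib check (illustrative only) of which UTC instant "6 AM America/New_York" maps to across the year:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

eastern = ZoneInfo("America/New_York")
utc = ZoneInfo("UTC")

# The same 6 AM wall-clock time lands on different UTC instants
# depending on whether daylight saving time is in effect.
winter = datetime(2024, 1, 15, 6, 0, tzinfo=eastern).astimezone(utc)
summer = datetime(2024, 7, 15, 6, 0, tzinfo=eastern).astimezone(utc)

print(winter.hour)  # 11 — EST is UTC-5
print(summer.hour)  # 10 — EDT is UTC-4
```

Because the schedule is defined in wall-clock Eastern time, the job fires at 6 AM local year-round; a schedule pinned to UTC would drift by an hour twice a year.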

“Set up a job that runs every 4 hours using a periodic trigger instead of cron, in Python.”

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import (
    TriggerSettings, PeriodicTriggerConfiguration,
    PeriodicTriggerConfigurationTimeUnit, PauseStatus,
)

w = WorkspaceClient()
job = w.jobs.create(
    name="four-hour-sync",
    trigger=TriggerSettings(
        pause_status=PauseStatus.UNPAUSED,
        periodic=PeriodicTriggerConfiguration(
            interval=4,
            unit=PeriodicTriggerConfigurationTimeUnit.HOURS,
        ),
    ),
    tasks=[...],
)

Periodic triggers are simpler than cron when you just need “every N hours/days/weeks.” The minimum interval is 1 hour — for anything more frequent, use cron.
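One way to internalize that cutoff — a hypothetical helper (the name and return shape are mine, not the SDK's) that picks a trigger style from a desired interval:

```python
def choose_trigger(interval_minutes: int):
    """Pick a trigger style for a desired run interval.

    Periodic triggers bottom out at 1 hour, so anything finer falls
    back to a Quartz cron expression. Hypothetical helper, not part
    of the Databricks SDK; the cron branch assumes the interval
    divides evenly into an hour.
    """
    if interval_minutes >= 60 and interval_minutes % 60 == 0:
        return ("periodic", interval_minutes // 60)  # whole hours
    # Quartz fields: seconds minutes hours day-of-month month day-of-week
    return ("cron", f"0 0/{interval_minutes} * * * ?")

print(choose_trigger(240))  # ('periodic', 4)
print(choose_trigger(30))   # ('cron', '0 0/30 * * * ?')
```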

“Create a job that processes new uploads whenever files land in an S3 bucket, with a 5-minute cooldown between runs.”

resources:
  jobs:
    process_uploads:
      name: "[${bundle.target}] Process Uploads"
      trigger:
        pause_status: UNPAUSED
        file_arrival:
          url: "s3://data-lake/incoming/orders/"
          min_time_between_triggers_seconds: 300
          wait_after_last_change_seconds: 60
      tasks:
        - task_key: process
          notebook_task:
            notebook_path: ../src/process_files.py

min_time_between_triggers_seconds prevents rapid-fire triggering when many files arrive in bursts. wait_after_last_change_seconds adds a settling period — the job waits 60 seconds after the last file change before triggering, so you process complete batches rather than partial uploads.
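A simplified mental model of how the two knobs interact — pure Python, not the service's actual algorithm (real evaluation is periodic and may coalesce batches differently):

```python
def simulate_fires(arrivals, settle=60, cooldown=300):
    """Model file-arrival triggering on sorted arrival times (seconds):
    group arrivals separated by less than `settle` seconds into one batch,
    fire once per batch, and keep fires at least `cooldown` seconds apart.
    Illustrative sketch only."""
    fires = []
    i, n = 0, len(arrivals)
    while i < n:
        j = i
        while j + 1 < n and arrivals[j + 1] - arrivals[j] < settle:
            j += 1                          # extend the batch while files keep landing
        earliest = arrivals[j] + settle     # wait_after_last_change_seconds
        if fires:
            earliest = max(earliest, fires[-1] + cooldown)  # min_time_between_triggers_seconds
        fires.append(earliest)
        i = j + 1
    return fires

# A burst of three files yields one run, not three:
print(simulate_fires([0, 10, 20]))       # [80]
# A later straggler waits out the 5-minute cooldown:
print(simulate_fires([0, 10, 20, 200]))  # [80, 380]
```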

“Trigger a downstream job automatically when two source tables in Unity Catalog are updated.”

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import (
    TriggerSettings, TableUpdateTriggerConfiguration,
    Condition, PauseStatus,
)

w = WorkspaceClient()
job = w.jobs.create(
    name="table-change-handler",
    trigger=TriggerSettings(
        pause_status=PauseStatus.UNPAUSED,
        table_update=TableUpdateTriggerConfiguration(
            table_names=[
                "main.bronze.raw_orders",
                "main.bronze.raw_inventory",
            ],
            condition=Condition.ANY_UPDATED,
            min_time_between_triggers_seconds=600,
            wait_after_last_change_seconds=120,
        ),
    ),
    tasks=[...],
)

Table update triggers watch Unity Catalog Delta tables for changes. ANY_UPDATED fires when any monitored table changes — use this when your transform depends on at least one source being fresh. The job identity needs SELECT permission on the monitored tables.
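The same trigger can be expressed in DABs YAML — a sketch assuming the config keys mirror the Jobs API field names, as they do for the other trigger types above (the notebook path and task key here are placeholders):

```yaml
resources:
  jobs:
    table_change_handler:
      name: "[${bundle.target}] Table Change Handler"
      trigger:
        pause_status: UNPAUSED
        table_update:
          table_names:
            - main.bronze.raw_orders
            - main.bronze.raw_inventory
          condition: ANY_UPDATED
          min_time_between_triggers_seconds: 600
          wait_after_last_change_seconds: 120
      tasks:
        - task_key: handle
          notebook_task:
            notebook_path: ../src/handle_table_change.py
```

If you instead need every source to be fresh before the transform runs, ALL_UPDATED waits until all monitored tables have changed since the last run.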

“Configure a streaming processor that runs continuously and auto-restarts after failures.”

resources:
  jobs:
    streaming_processor:
      name: "[${bundle.target}] Streaming Processor"
      continuous:
        pause_status: UNPAUSED
      tasks:
        - task_key: stream
          notebook_task:
            notebook_path: ../src/streaming_processor.py

Continuous jobs start immediately and restart automatically after completion or failure. They maintain exactly one active run at a time. Pause with pause_status: PAUSED to stop the loop. This is the right choice for Structured Streaming workloads that must stay up around the clock, where the gap between scheduled runs would add unacceptable latency.
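Conceptually, the scheduler behaves like this supervisor loop — a toy pure-Python model of the semantics, with hypothetical names, not Databricks code:

```python
def supervise(task, paused):
    """Toy model of a continuous job: keep exactly one run alive,
    restarting after every completion or failure, until paused.
    Hypothetical sketch — not how the service is implemented."""
    runs = 0
    while not paused():
        try:
            task()              # one "run" of the job
        except Exception:
            pass                # a failed run is restarted like a finished one
        runs += 1
    return runs

# A task that fails twice, then succeeds; we pause after three runs total.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")

runs = supervise(flaky, paused=lambda: calls["n"] >= 3)
print(runs)  # 3 — two failed runs restarted, one success, then paused
```

The sketch restarts instantly; the real service spaces out restarts after repeated consecutive failures rather than spinning in a tight loop.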

  • Using Unix cron format instead of Quartz — Quartz cron has six fields (starting with seconds), not five. 0 6 * * * is valid Unix cron but invalid here. The correct equivalent is 0 0 6 * * ?.
  • Forgetting the ? in day-of-week — Quartz cron requires either day-of-month or day-of-week to be ? (unspecified). Using * in both positions causes a validation error.
  • Setting min_time_between_triggers_seconds too low on file arrival — A zero-second cooldown means every single file triggers a new run. For bursty uploads, set this to at least 60-300 seconds to batch files into a single run.
  • Skipping wait_after_last_change_seconds — Without a settling period, the job triggers mid-upload when a multi-file batch is still arriving. The result: partial processing and a confusing incomplete dataset.
  • Running continuous jobs for batch workloads — Continuous mode keeps your cluster running 24/7. If your data arrives on a schedule, use a cron trigger instead and save the idle compute cost.
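For the first two pitfalls, a small hypothetical helper (not part of any Databricks tooling) that converts a five-field Unix cron expression into Quartz form:

```python
def unix_to_quartz(expr: str) -> str:
    """Convert five-field Unix cron (min hour dom month dow) into
    six-field Quartz cron. Quartz adds a leading seconds field and
    requires day-of-month or day-of-week to be '?'. Illustrative
    helper only. Caveat: Quartz numbers days-of-week 1-7 starting
    Sunday (Unix uses 0-6); numeric dow values are not remapped here."""
    minute, hour, dom, month, dow = expr.split()
    # Quartz rejects '*' in both day fields: blank one out with '?'.
    if dow == "*":
        dow = "?"
    elif dom == "*":
        dom = "?"
    return f"0 {minute} {hour} {dom} {month} {dow}"

print(unix_to_quartz("0 6 * * *"))  # '0 0 6 * * ?'
```

Run the output through your scheduler's validation anyway — the day-of-week numbering caveat above is exactly the kind of mismatch that passes syntax checks but fires on the wrong day.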