Triggers and Schedules
Skill: databricks-jobs
What You Can Build
Every production pipeline needs an execution cadence. You can configure Databricks jobs to run on cron schedules, at fixed intervals, or in response to events like new files landing in cloud storage or Delta tables being updated. Each trigger type solves a different timing problem: scheduled ETL, periodic syncs, event-driven ingestion, or always-on streaming.
In Action
“Create a DABs job that runs daily at 6 AM Eastern, with a cron schedule that auto-unpauses in production but stays paused in dev.”
```yaml
resources:
  jobs:
    daily_etl:
      name: "[${bundle.target}] Daily ETL Pipeline"
      schedule:
        quartz_cron_expression: "0 0 6 * * ?"
        timezone_id: "America/New_York"
        pause_status: ${if(bundle.target == "prod", "UNPAUSED", "PAUSED")}
      tasks:
        - task_key: etl
          notebook_task:
            notebook_path: ../src/etl.py
```

Key decisions:
- Quartz cron format is `seconds minutes hours day-of-month month day-of-week`; the leading seconds field is what trips people up compared to standard Unix cron.
- `timezone_id` uses IANA time zones (e.g., `America/New_York`), so your schedule respects daylight saving time transitions automatically.
- The `${if(...)}` expression keeps dev jobs paused by default, preventing accidental scheduled runs against dev data.
- `pause_status: UNPAUSED` activates the schedule immediately on deployment; no manual step needed.
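The `timezone_id` choice matters most at daylight saving boundaries. A quick check with Python's standard-library `zoneinfo` (an illustration, not Databricks code) shows that 6 AM `America/New_York` maps to different UTC instants in winter and summer, which is exactly what a fixed-UTC schedule would get wrong:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")

# 6 AM local on a winter date (EST, UTC-5) and a summer date (EDT, UTC-4)
for d in (datetime(2024, 1, 15, 6, tzinfo=NY), datetime(2024, 7, 15, 6, tzinfo=NY)):
    print(d.isoformat(), "->", d.astimezone(ZoneInfo("UTC")).isoformat())
# 2024-01-15T06:00:00-05:00 -> 2024-01-15T11:00:00+00:00
# 2024-07-15T06:00:00-04:00 -> 2024-07-15T10:00:00+00:00
```

Because the job stores the IANA zone rather than a UTC offset, the 6 AM local fire time holds year-round.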
More Patterns
Periodic Trigger
“Set up a job that runs every 4 hours using a periodic trigger instead of cron, in Python.”
```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import (
    TriggerSettings,
    Periodic,
    PeriodicTriggerConfigurationTimeUnit,
    PauseStatus,
)

w = WorkspaceClient()

job = w.jobs.create(
    name="four-hour-sync",
    trigger=TriggerSettings(
        pause_status=PauseStatus.UNPAUSED,
        periodic=Periodic(
            interval=4,
            unit=PeriodicTriggerConfigurationTimeUnit.HOURS,
        ),
    ),
    tasks=[...],
)
```

Periodic triggers are simpler than cron when you just need “every N hours/days/weeks.” The minimum interval is 1 hour; for anything more frequent, use cron.
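The cadence of a fixed-interval trigger can be sketched in plain Python (`next_runs` is a hypothetical helper for illustration; real fire times are anchored to when the trigger is armed, not to a clean wall-clock boundary like cron):

```python
from datetime import datetime, timedelta

def next_runs(start: datetime, interval_hours: int, n: int) -> list[datetime]:
    """First n fire times of a fixed-interval trigger armed at `start`."""
    return [start + timedelta(hours=interval_hours * i) for i in range(n)]

runs = next_runs(datetime(2024, 1, 1, 0, 0), 4, 3)
print([r.hour for r in runs])  # [0, 4, 8]
```

This is the practical difference from cron: an every-4-hours periodic trigger armed at 1:37 PM fires at 5:37, 9:37, and so on, whereas `0 0 */4 * * ?` always fires on the hour.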
File Arrival Trigger
“Create a job that processes new uploads whenever files land in an S3 bucket, with a 5-minute cooldown between runs.”
```yaml
resources:
  jobs:
    process_uploads:
      name: "[${bundle.target}] Process Uploads"
      trigger:
        pause_status: UNPAUSED
        file_arrival:
          url: "s3://data-lake/incoming/orders/"
          min_time_between_triggers_seconds: 300
          wait_after_last_change_seconds: 60
      tasks:
        - task_key: process
          notebook_task:
            notebook_path: ../src/process_files.py
```

`min_time_between_triggers_seconds` prevents rapid-fire triggering when many files arrive in bursts. `wait_after_last_change_seconds` adds a settling period: the job waits 60 seconds after the last file change before triggering, so you process complete batches rather than partial uploads.
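The interaction of the two knobs behaves like a debounce. The sketch below (`fire_times` is a hypothetical helper, not part of any SDK; an approximation of the semantics, not Databricks' implementation) replays a list of file-arrival timestamps and reports roughly when runs would fire under a 300 s cooldown and 60 s settling window:

```python
def fire_times(arrivals, cooldown=300, settle=60):
    """Given sorted file-arrival times (seconds), return approximate run times.

    A run fires `settle` seconds after the last change in a burst, and at
    least `cooldown` seconds after the previous run.
    """
    fires, last_fire = [], float("-inf")
    i = 0
    while i < len(arrivals):
        # Group a burst: files arriving closer together than `settle`
        # keep pushing the settling deadline and share one run.
        end = arrivals[i]
        i += 1
        while i < len(arrivals) and arrivals[i] - end < settle:
            end = arrivals[i]
            i += 1
        t = max(end + settle, last_fire + cooldown)
        fires.append(t)
        last_fire = t
    return fires

# Ten files uploaded 5 s apart -> one run, 60 s after the last file
print(fire_times([5 * k for k in range(10)]))  # [105]
```

With the settling window alone, two files 100 s apart would mean two runs; the cooldown then pushes the second run out to the 300 s mark, which is the batching behavior you usually want for bursty uploads.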
Table Update Trigger
“Trigger a downstream job automatically when two source tables in Unity Catalog are updated.”
```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import (
    TriggerSettings,
    TableUpdateTriggerConfiguration,
    Condition,
    PauseStatus,
)

w = WorkspaceClient()

job = w.jobs.create(
    name="table-change-handler",
    trigger=TriggerSettings(
        pause_status=PauseStatus.UNPAUSED,
        table_update=TableUpdateTriggerConfiguration(
            table_names=[
                "main.bronze.raw_orders",
                "main.bronze.raw_inventory",
            ],
            condition=Condition.ANY_UPDATED,
            min_time_between_triggers_seconds=600,
            wait_after_last_change_seconds=120,
        ),
    ),
    tasks=[...],
)
```

Table update triggers watch Unity Catalog Delta tables for changes. `ANY_UPDATED` fires when any monitored table changes; use this when your transform depends on at least one source being fresh. The job identity needs `SELECT` permission on the monitored tables.
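The difference between `ANY_UPDATED` and the stricter `ALL_UPDATED` condition (which waits until every monitored table has changed) can be sketched in plain Python; `should_fire` is a hypothetical helper illustrating the decision, not SDK code:

```python
def should_fire(last_updates, last_trigger, condition="ANY_UPDATED"):
    """Decide whether a table-update trigger fires.

    last_updates: mapping of table name -> last commit timestamp (seconds).
    last_trigger: timestamp of the previous triggered run.
    """
    fresh = [t > last_trigger for t in last_updates.values()]
    return any(fresh) if condition == "ANY_UPDATED" else all(fresh)

updates = {"main.bronze.raw_orders": 120, "main.bronze.raw_inventory": 80}
print(should_fire(updates, last_trigger=100))                           # True
print(should_fire(updates, last_trigger=100, condition="ALL_UPDATED"))  # False
```

Prefer `ALL_UPDATED` when the downstream transform joins all sources and would produce a stale join if any one of them lags behind.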
Continuous Job (Always-On)
“Configure a streaming processor that runs continuously and auto-restarts after failures.”
```yaml
resources:
  jobs:
    streaming_processor:
      name: "[${bundle.target}] Streaming Processor"
      continuous:
        pause_status: UNPAUSED
      tasks:
        - task_key: stream
          notebook_task:
            notebook_path: ../src/streaming_processor.py
```

Continuous jobs start immediately and restart automatically after completion or failure. They maintain exactly one active run at a time. Pause with `pause_status: PAUSED` to stop the loop. This is the right choice for Structured Streaming workloads that need sub-second latency.
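Those run semantics (one active run, automatic restart after completion or failure, stop only when paused) amount to a supervisor loop. The sketch below illustrates the behavior, not how Databricks implements it; `continuous_supervisor` and its parameters are hypothetical:

```python
def continuous_supervisor(task, is_paused, max_cycles=100):
    """Re-run `task` until the job is paused; failures don't stop the loop."""
    runs = 0
    for _ in range(max_cycles):
        if is_paused():
            break
        try:
            task()
        except Exception:
            pass  # a failed run is simply restarted, like a continuous job
        runs += 1
    return runs

cycles = iter([False, False, False, True])  # pause check flips after three runs
n = continuous_supervisor(lambda: None, lambda: next(cycles))
print(n)  # 3
```

The key property is that there is no terminal "failed" state: the only way the loop ends is an explicit pause, which is why pausing is the documented way to stop a continuous job.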
Watch Out For
- Using Unix cron format instead of Quartz: Quartz cron has six fields (starting with seconds), not five. `0 6 * * *` is valid Unix cron but invalid here. The correct equivalent is `0 0 6 * * ?`.
- Forgetting the `?` in day-of-week: Quartz cron requires either day-of-month or day-of-week to be `?` (unspecified). Using `*` in both positions causes a validation error.
- Setting `min_time_between_triggers_seconds` too low on file arrival: a zero-second cooldown means every single file triggers a new run. For bursty uploads, set this to at least 60-300 seconds to batch files into a single run.
- Skipping `wait_after_last_change_seconds`: without a settling period, the job triggers mid-upload while a multi-file batch is still arriving. The result: partial processing and a confusing, incomplete dataset.
- Running continuous jobs for batch workloads: continuous mode keeps your cluster running 24/7. If your data arrives on a schedule, use a cron trigger instead and save the idle compute cost.
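A small helper can mechanize the first two fixes. `unix_to_quartz` is a hypothetical name, and this sketch only handles the common case where the day fields are plain values or `*` (it prepends a seconds field and substitutes `?` so the two day fields don't conflict); note that Quartz also numbers day-of-week differently from Unix (1 = Sunday), which this sketch does not translate:

```python
def unix_to_quartz(expr: str) -> str:
    """Convert a five-field Unix cron expression to six-field Quartz (common cases)."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    minute, hour, dom, month, dow = fields
    # Quartz needs a leading seconds field, and exactly one of the two
    # day fields should be '?'; keep day-of-month when it carries a value.
    if dow == "*":
        dow = "?"
    elif dom == "*":
        dom = "?"
    return f"0 {minute} {hour} {dom} {month} {dow}"

print(unix_to_quartz("0 6 * * *"))  # 0 0 6 * * ?
```

Anything fancier (named days, ranges, hash expressions) deserves a real Quartz parser rather than string surgery.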