# Execution & Compute

Skill: `databricks-execution-compute`
## What You Can Build

You can execute code on Databricks directly from your local editor — no browser, no notebook UI. Run Python on serverless with zero cluster management, iterate interactively on a running cluster with preserved state, or push a local file to execute remotely. Your AI coding assistant picks the right compute automatically and manages the full lifecycle: execute, capture output, and optionally persist the code as a workspace notebook.
## In Action

“Run this ETL script on serverless compute and save it as a notebook in my workspace for future scheduling.”
```python
# Your AI coding assistant calls execute_code under the hood
execute_code(
    code="""
import pyspark.sql.functions as F

df = (
    spark.read.table("catalog.schema.raw_events")
    .filter(F.col("event_date") >= "2025-01-01")
    .groupBy("user_id", "event_type")
    .agg(F.count("*").alias("event_count"))
)

df.write.mode("overwrite").saveAsTable("catalog.schema.user_event_summary")
dbutils.notebook.exit(f"Wrote {df.count()} rows")
""",
    compute_type="serverless",
    workspace_path="/Workspace/Users/user@example.com/etl/event_summary",
    run_name="event-summary-v1",
)
```

Key decisions:
- `compute_type="serverless"` — no cluster to provision or wait for. Serverless spins up dedicated compute in 25-50 seconds and tears it down after execution. Best for Python and SQL workloads that do not need persistent state.
- `workspace_path` for persistence — saves the code as a notebook in the workspace. Without it, the execution is ephemeral: results are returned but nothing is saved. Use persistence when you want to schedule the notebook as a job later.
- `dbutils.notebook.exit()` for output — on serverless, `print()` output is unreliable. Always use `dbutils.notebook.exit()` to return a result string your assistant can display.
- `run_name` for traceability — names the serverless run so you can find it in the Jobs UI later. Without it, runs get auto-generated names that are hard to identify.
## More Patterns

### Interactive cluster session with state

“Set up an interactive session on my cluster. Run some setup code, then query the results in a follow-up.”
```python
# First call -- creates an execution context
result = execute_code(
    code="""
import pandas as pd

df = pd.DataFrame({
    "region": ["US", "EU", "APAC", "US", "EU"],
    "revenue": [1200, 950, 800, 1100, 1050],
})
spark_df = spark.createDataFrame(df)
spark_df.createOrReplaceTempView("regional_revenue")
print("View created")
""",
    compute_type="cluster",
)

# Second call -- reuses the same context, variables persist
execute_code(
    code="spark.sql('SELECT region, SUM(revenue) FROM regional_revenue GROUP BY region').show()",
    context_id=result["context_id"],
    cluster_id=result["cluster_id"],
)
```

The `context_id` preserves variables, temp views, and imports between calls. This is the cluster equivalent of running cells in a notebook. Drop the context when done by passing `destroy_context_on_completion=True` on the last call.
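One way to keep the context bookkeeping out of your head is a small session wrapper that threads `context_id`/`cluster_id` through each call and flags the final one for destruction. This is a sketch against a stubbed `execute_code` (the `ClusterSession` class and the stub are hypothetical; only the parameter names come from the examples above):

```python
class ClusterSession:
    """Sketch: reuse one execution context across calls, then destroy it.

    Assumes execute_code returns a dict containing context_id and
    cluster_id, as in the examples above; the stub below stands in
    for the real tool.
    """

    def __init__(self, execute_code):
        self._execute = execute_code
        self.context_id = None
        self.cluster_id = None

    def run(self, code: str, last: bool = False) -> dict:
        kwargs = {"code": code, "compute_type": "cluster"}
        if self.context_id:  # reuse the existing context
            kwargs["context_id"] = self.context_id
            kwargs["cluster_id"] = self.cluster_id
        if last:  # free cluster memory once iteration is done
            kwargs["destroy_context_on_completion"] = True
        result = self._execute(**kwargs)
        self.context_id = result.get("context_id")
        self.cluster_id = result.get("cluster_id")
        return result

# Stub standing in for the real execute_code tool
def fake_execute_code(**kwargs):
    return {"context_id": "ctx-1", "cluster_id": "0123", **kwargs}

session = ClusterSession(fake_execute_code)
session.run("spark_df.createOrReplaceTempView('t')")
final = session.run("spark.sql('SELECT * FROM t').show()", last=True)
```

The wrapper makes the notebook-cell analogy explicit: every `run` is a cell, and `last=True` closes the notebook.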
### Run a local file on a cluster

“Execute my local transform.py file on the dev cluster.”

```python
execute_code(
    file_path="/Users/me/project/src/transform.py",
    compute_type="cluster",
)
```

The tool detects the language from the file extension (`.py`, `.scala`, `.sql`, `.r`) and uploads the file for execution. This is the fastest path from local development to remote testing — no manual upload, no workspace notebook creation.
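The extension detection amounts to a simple lookup. A minimal sketch, assuming the four extensions listed above map to the obvious language names (the tool's internal table may differ):

```python
from pathlib import Path

# Assumed mapping, mirroring the extensions listed above
LANGUAGE_BY_EXTENSION = {
    ".py": "python",
    ".scala": "scala",
    ".sql": "sql",
    ".r": "r",
}

def detect_language(file_path: str) -> str:
    """Map a file extension to an execution language, case-insensitively."""
    ext = Path(file_path).suffix.lower()
    try:
        return LANGUAGE_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {ext!r}")

# e.g. detect_language("/Users/me/project/src/transform.py") -> "python"
```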
### Create and manage clusters

“Spin up a 4-worker autoscaling cluster with ML Runtime for model training.”

```python
# Create an autoscaling cluster
manage_cluster(
    action="create",
    name="ml-training-cluster",
    spark_version="15.4.x-ml-scala2.12",
    node_type_id="i3.xlarge",
    autoscale_min_workers=2,
    autoscale_max_workers=8,
    autotermination_minutes=60,
)

# Later: terminate when done (does not delete)
manage_cluster(action="terminate", cluster_id="0123-456789-abcdef")

# Check status while it stops
list_compute(resource="clusters", cluster_id="0123-456789-abcdef")
```

`terminate` stops the cluster but preserves its configuration for restarting later. `delete` is permanent and irreversible. Always confirm before deleting.
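Because `delete` is irreversible, a guard that refuses to delete without explicit confirmation is cheap insurance. A hedged sketch around a stubbed `manage_cluster` (the guard function and stub are illustrative, not part of the tool):

```python
def safe_cluster_action(manage_cluster, action: str, cluster_id: str,
                        confirm: bool = False) -> dict:
    """Allow terminate freely, but require confirm=True for delete.

    Sketch only: manage_cluster is passed in (stubbed below), since
    the real tool is invoked by the assistant, not imported here.
    """
    if action == "delete" and not confirm:
        raise PermissionError("delete is permanent; pass confirm=True")
    return manage_cluster(action=action, cluster_id=cluster_id)

# Stub standing in for the real manage_cluster tool
def fake_manage_cluster(**kwargs):
    return kwargs

# terminate is safe: the cluster config survives for a later restart
safe_cluster_action(fake_manage_cluster, "terminate", "0123-456789-abcdef")
```

An unconfirmed `delete` raises before the tool is ever called, which matches the "always confirm before deleting" rule above.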
## Watch Out For

- `print()` on serverless is unreliable — serverless compute does not guarantee stdout capture. Use `dbutils.notebook.exit("result string")` to return data from serverless runs. This catches everyone at least once.
- Scala and R require cluster compute — serverless only supports Python and SQL. If you pass `language="scala"` with `compute_type="serverless"`, the tool will error. The `auto` compute mode handles this by falling back to cluster, but only if a running cluster exists.
- No cluster available, no clear error path — if you request cluster execution and no cluster is running, the tool returns `startable_clusters` in the error response. Either start one (3-8 minute wait) or switch to `compute_type="serverless"` for Python workloads.
- Context leaks on clusters — execution contexts consume memory on the cluster. If you create many contexts without destroying them, the cluster can run out of memory. Pass `destroy_context_on_completion=True` when you are done iterating.
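The first three pitfalls suggest a simple compute-selection rule. This is a sketch of what an `auto`-style choice could look like under the constraints stated above (the function is hypothetical and deliberately simplified; the real `auto` mode may weigh other factors, such as preferring an already-running cluster for Python):

```python
def pick_compute(language: str, running_clusters: list[str]) -> str:
    """Sketch of an 'auto' compute choice.

    Python and SQL can run on serverless; Scala and R need a cluster,
    and if none is running there is nothing to fall back to.
    """
    if language in ("python", "sql"):
        return "serverless"  # no provisioning, 25-50 s startup
    if running_clusters:
        return "cluster"  # Scala/R must run on a cluster
    raise RuntimeError(
        f"{language} requires cluster compute, but no cluster is running; "
        "start one from startable_clusters (3-8 minute wait)"
    )
```

The error message mirrors the documented failure mode: surface the wait-time tradeoff instead of failing silently.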