Skill: databricks-lakebase-autoscale

You can size your Lakebase database compute from 0.5 CU (1 GB RAM) to 112 CU (224 GB RAM), enable autoscaling within a range, and configure scale-to-zero to suspend idle databases automatically. Each branch gets one primary read-write compute and optional read replicas for distributing query load.

“Using Python, resize a Lakebase compute to autoscale between 2 and 8 CU.”

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.postgres import Endpoint, EndpointSpec, FieldMask

w = WorkspaceClient()
w.postgres.update_endpoint(
    name="projects/my-app/branches/production/endpoints/ep-primary",
    endpoint=Endpoint(
        name="projects/my-app/branches/production/endpoints/ep-primary",
        spec=EndpointSpec(
            autoscaling_limit_min_cu=2.0,
            autoscaling_limit_max_cu=8.0,
        ),
    ),
    update_mask=FieldMask(
        field_mask=["spec.autoscaling_limit_min_cu", "spec.autoscaling_limit_max_cu"]
    ),
).wait()

Key decisions:

  • Each Compute Unit (CU) allocates approximately 2 GB of RAM — different from Lakebase Provisioned, which uses 16 GB per CU
  • Autoscaling range: 0.5-32 CU, with the constraint that max minus min cannot exceed 8 CU (e.g., 4-8 CU is valid, 0.5-32 CU is not)
  • Large fixed-size computes (36-112 CU) do not autoscale
  • All update operations require an update_mask — the API rejects requests without it
  • Resource names follow hierarchical paths: projects/{id}/branches/{id}/endpoints/{id}
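The range constraint above can be checked client-side before calling the API. A minimal sketch; the helper name is illustrative and the limits are taken from the rules listed here, not from the SDK:

```python
def valid_autoscaling_range(min_cu: float, max_cu: float) -> bool:
    """Check a Lakebase autoscaling range against the documented rules:
    both endpoints between 0.5 and 32 CU, and max minus min at most 8 CU."""
    return 0.5 <= min_cu <= max_cu <= 32 and (max_cu - min_cu) <= 8

print(valid_autoscaling_range(4, 8))     # True: within limits, spread of 4
print(valid_autoscaling_range(0.5, 32))  # False: spread of 31.5 exceeds 8
```

Validating locally gives a clearer error message than waiting for the API to reject the request.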

“How much RAM and how many connections do I get at each size?”

# Representative sizing -- each CU provides ~2 GB RAM
#
# 0.5 CU -> ~1 GB RAM, 104 max connections
# 4 CU -> ~8 GB RAM, 839 max connections
# 8 CU -> ~16 GB RAM, 1,678 max connections
# 32 CU -> ~64 GB RAM, 4,000 max connections
# 112 CU -> ~224 GB RAM, 4,000 max connections
#
# Connection limits are based on the maximum CU in the autoscaling range.
# Set minimum CU large enough to cache your working set in memory.
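Since each CU maps to roughly 2 GB of RAM, a quick sizing check can be scripted. This is a sketch based on the figures above; connection limits do not follow a simple formula, so they are looked up from the representative table rather than computed:

```python
CU_RAM_GB = 2  # approximate RAM per CU on Lakebase autoscaling computes

# Representative max-connection limits keyed by the maximum CU in the range
MAX_CONNECTIONS = {0.5: 104, 4: 839, 8: 1678, 32: 4000, 112: 4000}

def approx_ram_gb(cu: float) -> float:
    """Rough memory available at a given compute size."""
    return cu * CU_RAM_GB

print(approx_ram_gb(8))    # 16 (GB)
print(MAX_CONNECTIONS[8])  # 1678
```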

Performance may degrade after a scale-up event until the compute re-caches data in memory. If your workload is latency-sensitive, set a higher minimum to avoid cold-cache periods.

“Using Python, check the current scale-to-zero configuration for a compute.”

endpoint = w.postgres.get_endpoint(
    name="projects/my-app/branches/production/endpoints/ep-primary"
)
print(f"State: {endpoint.status.current_state}")
print(f"Min CU: {endpoint.status.autoscaling_limit_min_cu}")
print(f"Max CU: {endpoint.status.autoscaling_limit_max_cu}")

Scale-to-zero suspends the compute after a configurable inactivity timeout (default: 5 minutes, minimum: 60 seconds). The production branch defaults to always-active (scale-to-zero disabled). Other branches can be configured to suspend.

Handling scale-to-zero wake-up in application code


“Using Python, implement retry logic for the brief reactivation period after scale-to-zero.”

import psycopg
import time

def connect_with_retry(conn_string, max_retries=3, delay=2):
    """Handle scale-to-zero wake-up with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return psycopg.connect(conn_string)
        except psycopg.OperationalError:
            if attempt < max_retries - 1:
                time.sleep(delay * (2 ** attempt))
            else:
                raise

When a connection arrives on a suspended compute, it starts automatically in a few hundred milliseconds. The first connection attempt may fail during reactivation. Session context (temp tables, prepared statements, statistics) resets after reactivation.
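With the defaults above (delay=2, max_retries=3), the sleep between failed attempts grows exponentially. A quick sketch of the schedule, mirroring the retry logic rather than replacing it:

```python
def backoff_schedule(max_retries: int = 3, delay: float = 2) -> list:
    """Sleep durations between attempts, matching connect_with_retry:
    delay * 2**attempt after every failed attempt except the last."""
    return [delay * (2 ** attempt) for attempt in range(max_retries - 1)]

print(backoff_schedule())  # [2, 4] -> wait 2 s, then 4 s before the final try
```

Since reactivation typically completes within a few hundred milliseconds, even the first retry usually succeeds; the later, longer waits only matter under unusual delays.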

  • Setting an autoscaling range wider than 8 CU — the platform enforces a max-minus-min constraint of 8 CU. A range like 0.5-32 CU is rejected. Use 4-8, 8-16, or 16-24.
  • Expecting session state to survive scale-to-zero — when a compute suspends and reactivates, temporary tables, prepared statements, cached statistics, and active transactions are all lost. Design your app to recreate session state on reconnect, or disable scale-to-zero for stateful workloads.
  • Forgetting retry logic for suspended computes — the first connection after an idle period may fail during the brief reactivation window. Without retry logic, your app surfaces a connection error to the user instead of waiting a few hundred milliseconds.
  • Sizing too small for your working set — autoscaling handles demand spikes, but the minimum CU determines baseline cache capacity. If your minimum is too low, every scale event triggers a cold-cache penalty.