
Connection Patterns

Skill: databricks-lakebase-autoscale

You can connect applications and notebooks to Lakebase Autoscaling using standard PostgreSQL drivers. The key difference from a vanilla Postgres setup is authentication: Lakebase uses OAuth tokens that expire after one hour, so production apps must implement token refresh. This page covers the full spectrum — from a quick notebook connection to a production-grade SQLAlchemy pool with background refresh.

“Using Python, connect to Lakebase Autoscaling from a notebook using the Databricks SDK for token generation.”

import psycopg
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

endpoint = w.postgres.get_endpoint(
    name="projects/my-app/branches/production/endpoints/ep-primary"
)
cred = w.postgres.generate_database_credential(
    endpoint="projects/my-app/branches/production/endpoints/ep-primary"
)

conn_string = (
    f"host={endpoint.status.hosts.host} "
    f"dbname=databricks_postgres "
    f"user={w.current_user.me().user_name} "
    f"password={cred.token} sslmode=require"
)

with psycopg.connect(conn_string) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT version()")
        print(cur.fetchone())

Key decisions:

  • OAuth tokens expire after 1 hour. For scripts and notebooks, generate a fresh token before each session. For long-running apps, implement a background refresh loop.
  • Native Postgres passwords (no expiry) are available for tools that cannot rotate tokens, but OAuth is preferred for security.
  • Always use sslmode=require — Lakebase rejects unencrypted connections.
  • All connections have a 24-hour idle timeout and 3-day maximum lifetime, regardless of auth method.
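
For scripts that outlive a few minutes, a small cache that regenerates the token before the 1-hour expiry avoids mid-session failures. A minimal sketch, reusing the endpoint name and SDK call from the example above (the 50-minute threshold is an assumption mirroring the refresh guidance later on this page):

import time
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
ENDPOINT = "projects/my-app/branches/production/endpoints/ep-primary"

_token = None
_token_born = 0.0

def get_token(max_age_s: float = 50 * 60) -> str:
    """Return a cached OAuth token, regenerating it before the 1-hour expiry."""
    global _token, _token_born
    if _token is None or time.time() - _token_born > max_age_s:
        _token = w.postgres.generate_database_credential(endpoint=ENDPOINT).token
        _token_born = time.time()
    return _token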

“Using Python, set up a SQLAlchemy async engine that refreshes OAuth tokens every 50 minutes.”

from sqlalchemy import event
from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine(
    f"postgresql+psycopg://{username}@{host}:5432/{database}",
    pool_size=5,
    max_overflow=10,
    pool_recycle=3600,
    connect_args={"sslmode": "require"},
)

@event.listens_for(engine.sync_engine, "do_connect")
def inject_token(dialect, conn_rec, cargs, cparams):
    cparams["password"] = current_token  # Refreshed in background task

The do_connect event fires on every new physical connection. By injecting the token at connect time (rather than baking it into the URL), you ensure each connection gets the latest credential. Refresh tokens every 50 minutes — well before the 1-hour expiry.
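
The background refresh itself can be a small asyncio task that replaces current_token on a timer. A minimal sketch, assuming the same endpoint name as the notebook example; asyncio.to_thread keeps the blocking SDK call off the event loop:

import asyncio
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
ENDPOINT = "projects/my-app/branches/production/endpoints/ep-primary"

# Seed the token once at startup; the do_connect listener reads this variable.
current_token = w.postgres.generate_database_credential(endpoint=ENDPOINT).token

async def refresh_token_loop(interval_s: int = 50 * 60) -> None:
    """Regenerate the OAuth token every 50 minutes, well before the 1-hour expiry."""
    global current_token
    while True:
        await asyncio.sleep(interval_s)
        cred = await asyncio.to_thread(
            w.postgres.generate_database_credential, endpoint=ENDPOINT
        )
        current_token = cred.token

# Start alongside your app, e.g. asyncio.create_task(refresh_token_loop())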

“Using Python, resolve long Lakebase hostnames that fail with Python’s socket module on macOS.”

import socket
import subprocess

def resolve_hostname(hostname: str) -> str:
    """Use dig as fallback when Python's resolver fails on long hostnames."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        pass
    result = subprocess.run(
        ["dig", "+short", hostname],
        capture_output=True, text=True, timeout=5,
    )
    ips = result.stdout.strip().split("\n")
    for ip in ips:
        if ip and not ip.startswith(";"):
            return ip
    raise RuntimeError(f"Could not resolve hostname: {hostname}")

# Pass both host (for TLS SNI) and hostaddr (resolved IP) to psycopg
conn_params = {
    "host": hostname,
    "hostaddr": resolve_hostname(hostname),
    "sslmode": "require",
}

Python’s socket.getaddrinfo() truncates hostnames longer than 255 bytes on macOS, causing DNS resolution failures. The dig fallback resolves the IP directly, and passing hostaddr alongside host lets psycopg use the IP for the connection while preserving the hostname for TLS Server Name Indication.
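
To use the resolved parameters, unpack them into psycopg.connect() alongside the credentials. A minimal sketch; username and token here are assumptions standing in for the SDK values from the first example:

import psycopg

# host keeps the original name for TLS SNI; hostaddr bypasses Python's resolver.
with psycopg.connect(
    **conn_params,
    dbname="databricks_postgres",
    user=username,    # assumption: from w.current_user.me().user_name
    password=token,   # assumption: from generate_database_credential()
) as conn:
    print(conn.execute("SELECT 1").fetchone())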

Choosing between OAuth and native passwords


“When should I use OAuth tokens versus native Postgres passwords?”

# OAuth tokens (preferred)
# - 1-hour expiry, must refresh
# - Best for: interactive sessions, workspace-integrated apps
# - SDK: w.postgres.generate_database_credential(endpoint=...)

# Native Postgres passwords (alternative)
# - No expiry
# - Best for: long-running processes, external tools without token rotation
# - Created via Lakebase UI or SDK role management

Use OAuth for anything running inside the Databricks ecosystem. Use native passwords only when the client application cannot implement token rotation (e.g., third-party BI tools with static connection strings).
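
For reference, a native-password connection string for such a tool might look like the sketch below. The host and password are placeholders, and bi_reader is a hypothetical role created through the Lakebase UI:

# Hypothetical: 'bi_reader' is a Postgres role with a native password
# created in the Lakebase UI. sslmode=require is still mandatory.
conn_string = (
    "host=<endpoint-host> dbname=databricks_postgres "
    "user=bi_reader password=<native-password> sslmode=require"
)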

Common pitfalls

  • Letting tokens expire during long queries — a 1-hour token can expire mid-transaction if your workload runs long. Implement a background refresh task that generates a new token every 50 minutes, and use SQLAlchemy’s do_connect event to inject it.
  • Skipping sslmode=require — Lakebase requires SSL. Connections without sslmode=require are rejected, producing a confusing “connection refused” error that looks like a network issue.
  • Creating new connections per request — without connection pooling, each HTTP request opens and closes a Postgres connection. Use SQLAlchemy’s pool or psycopg’s ConnectionPool to reuse connections (see the sketch after this list).
  • Ignoring the 24-hour idle timeout — connections idle for 24 hours are closed by the server. Your application should handle reconnection gracefully, either through pool recycling (pool_recycle=3600) or explicit retry logic.
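
Pooling and recycling together address the last three pitfalls. A minimal sketch using psycopg’s ConnectionPool, assuming conn_string is built as in the notebook example; the lifetimes are assumptions chosen to stay under the token and idle limits:

from psycopg_pool import ConnectionPool

pool = ConnectionPool(
    conninfo=conn_string,   # built as in the notebook example above
    min_size=1,
    max_size=10,
    max_lifetime=3600,      # retire connections hourly, matching token expiry
    max_idle=600,           # close idle connections long before the 24h timeout
)

with pool.connection() as conn:  # borrows a connection; returned on exit
    conn.execute("SELECT 1")

Because the token baked into conninfo still expires after an hour, long-lived services should combine pooling with the token-refresh pattern shown earlier.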