# Lakebase (PostgreSQL) Connectivity

Skill: `databricks-app-python`
## What You Can Build

Databricks Apps can connect to Lakebase for low-latency transactional storage — user sessions, application state, CRUD operations — all backed by a managed PostgreSQL interface. The runtime injects connection credentials as environment variables, so your app connects the same way it would to any Postgres database. No secret management, no token rotation.
## In Action

“Using Python and psycopg2, build a Databricks App that connects to Lakebase using the injected environment variables and queries a `users` table.”
```python
import os

import psycopg2

conn = psycopg2.connect(
    host=os.getenv("PGHOST"),
    database=os.getenv("PGDATABASE"),
    user=os.getenv("PGUSER"),
    password=os.getenv("PGPASSWORD"),
    port=os.getenv("PGPORT", "5432"),
)

with conn.cursor() as cur:
    cur.execute(
        "SELECT user_id, email, created_at FROM users "
        "ORDER BY created_at DESC LIMIT 10"
    )
    rows = cur.fetchall()

conn.close()
```

Key decisions:
- The runtime injects `PGHOST`, `PGDATABASE`, `PGUSER`, `PGPASSWORD`, and `PGPORT` automatically when your app has a Lakebase resource configured — no manual secret setup
- Wire the database resource in `app.yaml` so the runtime knows to inject credentials at deploy time
- Use `psycopg2-binary` for synchronous apps (Streamlit, Dash, Flask) and `asyncpg` for async frameworks (FastAPI)
- Always close connections explicitly or use context managers — the Lakebase connection pool has a finite limit
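The injected `PG*` variables can be gathered in one place so every framework example reads them the same way. A minimal stdlib-only sketch (the `pg_connection_kwargs` helper is ours, not a runtime API):

```python
import os

def pg_connection_kwargs() -> dict:
    """Collect the runtime-injected PG* variables into connect() keyword arguments."""
    return {
        "host": os.getenv("PGHOST"),
        "database": os.getenv("PGDATABASE"),
        "user": os.getenv("PGUSER"),
        "password": os.getenv("PGPASSWORD"),
        "port": os.getenv("PGPORT", "5432"),  # runtime may omit PGPORT; default to 5432
    }

# Usage: conn = psycopg2.connect(**pg_connection_kwargs())
```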
### app.yaml resource binding

```yaml
env:
  - name: DB_CONNECTION_STRING
    valueFrom:
      resource: database
```

This tells the Databricks Apps runtime to provision a Lakebase database and inject the PG environment variables into your app’s process. Without this, the `PG*` environment variables will be empty.
## More Patterns

### Async connections with FastAPI

“Using Python and asyncpg, connect a FastAPI app to Lakebase for non-blocking database queries.”
```python
import os

import asyncpg
from fastapi import FastAPI

app = FastAPI()

async def get_connection():
    return await asyncpg.connect(
        host=os.getenv("PGHOST"),
        database=os.getenv("PGDATABASE"),
        user=os.getenv("PGUSER"),
        password=os.getenv("PGPASSWORD"),
        port=int(os.getenv("PGPORT", "5432")),
    )

@app.get("/api/orders")
async def list_orders():
    conn = await get_connection()
    try:
        rows = await conn.fetch(
            "SELECT * FROM orders ORDER BY created_at DESC LIMIT 50"
        )
    finally:
        # Close even when the query raises, so failed requests don't leak connections.
        await conn.close()
    return [dict(row) for row in rows]
```

FastAPI is async by default, so asyncpg avoids blocking the event loop during database calls. For production, replace the per-request connection with an `asyncpg.create_pool()` to reuse connections across requests.
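The `create_pool()` advice boils down to three steps: acquire a connection, run the query, then return the connection to the pool instead of closing it. A framework-free sketch of that pattern using `asyncio.Queue` (`FakeConnection` and `SimplePool` are illustrative stand-ins, not asyncpg APIs):

```python
import asyncio

class FakeConnection:
    """Stand-in for an asyncpg connection; only illustrates the pooling pattern."""
    async def fetch(self, query: str):
        return [{"query": query}]

class SimplePool:
    def __init__(self, size: int):
        self._free: asyncio.Queue = asyncio.Queue()
        for _ in range(size):
            self._free.put_nowait(FakeConnection())

    async def fetch(self, query: str):
        conn = await self._free.get()       # acquire: waits if all connections are busy
        try:
            return await conn.fetch(query)  # run the query on a reused connection
        finally:
            self._free.put_nowait(conn)     # release back to the pool, never closed

async def main():
    pool = SimplePool(size=2)
    # Five concurrent requests share two connections instead of opening five.
    return await asyncio.gather(*(pool.fetch("SELECT 1") for _ in range(5)))

results = asyncio.run(main())
```

A real pool also handles connection health checks and timeouts; that is what `asyncpg.create_pool()` gives you over this sketch.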
### SQLAlchemy engine for ORM-based apps

“Using Python and SQLAlchemy, set up an engine pointed at Lakebase for a Databricks App with models.”

```python
import os

from sqlalchemy import create_engine

DATABASE_URL = (
    f"postgresql://{os.getenv('PGUSER')}:{os.getenv('PGPASSWORD')}"
    f"@{os.getenv('PGHOST')}:{os.getenv('PGPORT', '5432')}"
    f"/{os.getenv('PGDATABASE')}"
)

engine = create_engine(DATABASE_URL, pool_size=5, pool_recycle=3600)
```

SQLAlchemy gives you an ORM layer and connection pooling out of the box. The `pool_recycle=3600` setting replaces connections older than an hour, so the pool never hands back a connection the server has already timed out during an idle period.
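One caveat with the f-string URL: a password containing `@`, `:`, or `/` produces an unparseable URL. Percent-encoding the credentials avoids that; a minimal sketch using only the standard library (the `lakebase_url` helper is hypothetical):

```python
import os
from urllib.parse import quote_plus

def lakebase_url() -> str:
    """Build a postgresql:// URL with user and password safely percent-encoded."""
    user = quote_plus(os.getenv("PGUSER", ""))
    password = quote_plus(os.getenv("PGPASSWORD", ""))  # '@' becomes %40, ':' becomes %3A
    host = os.getenv("PGHOST", "localhost")
    port = os.getenv("PGPORT", "5432")
    database = os.getenv("PGDATABASE", "")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

SQLAlchemy’s `URL.create()` performs the same escaping if you prefer to stay inside the library.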
### Cached connection in Streamlit

“Using Python and Streamlit, cache the Lakebase connection so it survives reruns.”

```python
import os

import psycopg2
import streamlit as st

@st.cache_resource
def get_db_connection():
    return psycopg2.connect(
        host=os.getenv("PGHOST"),
        database=os.getenv("PGDATABASE"),
        user=os.getenv("PGUSER"),
        password=os.getenv("PGPASSWORD"),
    )

conn = get_db_connection()
with conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM events")
    st.metric("Total Events", cur.fetchone()[0])
```

Without `@st.cache_resource`, Streamlit creates a new database connection on every user interaction. That exhausts the connection pool within minutes on a busy app. The cached connection persists across reruns and is shared by all users.
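`@st.cache_resource` behaves roughly like a process-wide memoized factory: the first run creates the object, later reruns get the same one back. A framework-free sketch of that semantics (an illustration, not Streamlit’s implementation; `object()` stands in for a real connection):

```python
import functools

creation_count = 0

@functools.cache  # plays the role of @st.cache_resource in this sketch
def get_db_connection():
    global creation_count
    creation_count += 1          # count how many "connections" are actually opened
    return object()              # stand-in for psycopg2.connect(...)

# Two "reruns" of the script body:
first = get_db_connection()
second = get_db_connection()
```

Here `first is second` and `creation_count` stays at 1: one shared connection across reruns. Remove the decorator and every rerun would open a fresh one.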
## Watch Out For

- Missing the `app.yaml` resource binding — without the `valueFrom: resource: database` entry, the runtime does not inject `PG*` environment variables. Your app starts with empty connection strings and fails on the first query.
- Using psycopg2 with FastAPI — psycopg2 is synchronous and blocks the event loop. Use `asyncpg` or `psycopg[async]` for async frameworks. Reserve psycopg2 for synchronous apps like Streamlit and Dash.
- Forgetting `@st.cache_resource` in Streamlit — Streamlit reruns the entire script on every interaction. Without caching, each click opens a new database connection and never closes the old one.
- Adding psycopg2 to `requirements.txt` without the binary suffix — use `psycopg2-binary` to avoid needing a C compiler in the runtime. Plain `psycopg2` requires the `libpq-dev` headers, which are not available.
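The first pitfall can be surfaced at startup rather than on the first query with an explicit guard. A minimal sketch (the helper name is ours, not part of the runtime):

```python
import os

REQUIRED_PG_VARS = ("PGHOST", "PGDATABASE", "PGUSER", "PGPASSWORD")

def assert_lakebase_configured() -> None:
    """Raise at startup if the app.yaml resource binding did not inject credentials."""
    missing = [name for name in REQUIRED_PG_VARS if not os.getenv(name)]
    if missing:
        raise RuntimeError(
            f"Lakebase credentials not injected: {missing}. "
            "Check the database resource binding in app.yaml."
        )
```

Call it once at import time; a clear `RuntimeError` in the deploy logs is much easier to diagnose than an empty connection string at query time.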