System Tables
Skill: databricks-unity-catalog
What You Can Build
System tables are read-only tables in the system catalog that expose operational metadata across your Databricks account — audit logs, billing records, lineage graphs, compute usage, query history. Instead of building custom logging infrastructure, you query these tables with standard SQL to answer questions like “who accessed this table last week,” “which jobs are failing,” and “where is our DBU spend going.”
In Action
“Using SQL, find all users who accessed a specific sensitive table in the last 7 days, including their IP addresses.”
```sql
SELECT
  event_time,
  user_identity.email AS user_email,
  source_ip_address,
  action_name,
  response.status_code
FROM system.access.audit
WHERE event_date >= current_date() - 7
  AND request_params.full_name_arg = 'analytics.gold.customer_pii'
  AND action_name IN ('getTable', 'commandSubmit')
ORDER BY event_time DESC;
```

Key decisions:
- Always filter on `event_date` (the partition key) — filtering on `event_time` alone scans all partitions and runs slowly
- System tables must be explicitly enabled per schema by an account admin or metastore admin
- Audit log retention is 365 days; query history is 30 days — plan aggregation jobs if you need longer retention
- Grant access per schema: `system.access` for audit, `system.billing` for cost, `system.query` for performance
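If you run this audit pattern against many tables, a small helper keeps the partition filter consistent everywhere. A minimal sketch, assuming the column and table names from the query above; the `audit_access_sql` helper itself is hypothetical:

```python
from datetime import date, timedelta

def audit_access_sql(table_full_name: str, days: int = 7) -> str:
    """Build an audit query that always filters on event_date (the partition
    key) so the engine prunes partitions instead of scanning the full table.

    Illustrative helper; columns follow system.access.audit.
    """
    cutoff = date.today() - timedelta(days=days)
    return (
        "SELECT event_time, user_identity.email AS user_email, "
        "source_ip_address, action_name "
        "FROM system.access.audit "
        f"WHERE event_date >= '{cutoff.isoformat()}' "
        f"AND request_params.full_name_arg = '{table_full_name}' "
        "ORDER BY event_time DESC"
    )
```

Generating the filter from one place means nobody forgets the `event_date` predicate and silently triggers a full scan.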
More Patterns
Monthly billing breakdown by compute type
“Using SQL, break down last month’s DBU consumption by compute type with estimated dollar costs.”
```sql
SELECT
  CASE
    WHEN sku_name LIKE '%ALL_PURPOSE%' THEN 'All-Purpose Compute'
    WHEN sku_name LIKE '%JOBS%' THEN 'Jobs Compute'
    WHEN sku_name LIKE '%SQL%' THEN 'SQL Warehouse'
    WHEN sku_name LIKE '%SERVERLESS%' THEN 'Serverless'
    ELSE 'Other'
  END AS compute_type,
  SUM(u.usage_quantity) AS total_dbus,
  SUM(u.usage_quantity * p.pricing.default) AS estimated_cost
FROM system.billing.usage u
LEFT JOIN system.billing.list_prices p
  ON u.sku_name = p.sku_name AND u.cloud = p.cloud
WHERE u.usage_date >= current_date() - 30
  AND p.price_end_time IS NULL
GROUP BY 1
ORDER BY estimated_cost DESC;
```

The `list_prices` join gives you estimated dollar costs. Filter `price_end_time IS NULL` to get current prices only — historical prices have an end date.
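The same bucketing logic is easy to sanity-check outside the warehouse if you post-process a usage export. A sketch, where `compute_type` mirrors the CASE expression above (including its match order) and the `(sku_name, usage_quantity, unit_price)` row shape is an assumption:

```python
def compute_type(sku_name: str) -> str:
    # Mirrors the SQL CASE expression; order matters, since a SKU can
    # contain more than one marker (first match wins, as in SQL).
    buckets = [
        ("ALL_PURPOSE", "All-Purpose Compute"),
        ("JOBS", "Jobs Compute"),
        ("SQL", "SQL Warehouse"),
        ("SERVERLESS", "Serverless"),
    ]
    for marker, label in buckets:
        if marker in sku_name:
            return label
    return "Other"

def estimated_cost_by_type(rows):
    """Aggregate (sku_name, usage_quantity, unit_price) rows by bucket."""
    totals = {}
    for sku, qty, price in rows:
        key = compute_type(sku)
        totals[key] = totals.get(key, 0.0) + qty * price
    return totals
```

Keeping the bucket list in one table-like structure makes it trivial to add a new SKU family later without touching the aggregation.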
Trace upstream lineage for a gold table
“Using SQL, find all upstream tables that feed into a gold-layer table, recursively.”
```sql
WITH RECURSIVE lineage AS (
  SELECT source_table_full_name, target_table_full_name, 1 AS depth
  FROM system.access.table_lineage
  WHERE target_table_full_name = 'analytics.gold.customer_360'

  UNION ALL

  SELECT t.source_table_full_name, t.target_table_full_name, l.depth + 1
  FROM system.access.table_lineage t
  JOIN lineage l ON t.target_table_full_name = l.source_table_full_name
  WHERE l.depth < 10
)
SELECT DISTINCT source_table_full_name, target_table_full_name, depth
FROM lineage
ORDER BY depth;
```

Recursive CTEs walk the lineage graph from gold back to bronze. The `depth < 10` guard prevents infinite loops from circular references.
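The same traversal can run client-side over exported lineage edges, which is handy for unit-testing the walk before trusting it in a dashboard. A sketch with a hypothetical edge list; the dedup via `seen` also breaks cycles, serving the same purpose as the depth guard in the CTE:

```python
from collections import deque

def upstream_lineage(edges, target, max_depth=10):
    """Walk (source, target) lineage edges upstream from a target table.

    edges: list of (source_table, target_table) pairs, e.g. exported
    from system.access.table_lineage. Returns (source, target, depth)
    triples; max_depth mirrors the depth < 10 guard in the CTE.
    """
    results, seen = [], {target}
    queue = deque([(target, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for src, tgt in edges:
            if tgt == node and src not in seen:
                seen.add(src)  # visit each table once; also breaks cycles
                results.append((src, tgt, depth + 1))
                queue.append((src, depth + 1))
    return results
```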
Job failure monitoring
“Using SQL, find failing jobs in the last 24 hours with success rate trends over the past month.”
```sql
-- Failed jobs in last 24 hours
SELECT job_id, run_id, period_start_time, result_state, termination_code
FROM system.lakeflow.job_run_timeline
WHERE period_start_time >= current_timestamp() - INTERVAL 24 HOURS
  AND result_state IN ('FAILED', 'TIMEDOUT', 'CANCELED')
ORDER BY period_start_time DESC;

-- Job success rates over 30 days
SELECT
  job_id,
  COUNT(*) AS total_runs,
  ROUND(100.0 * SUM(CASE WHEN result_state = 'SUCCESS' THEN 1 ELSE 0 END) / COUNT(*), 2) AS success_rate_pct
FROM system.lakeflow.job_run_timeline
WHERE period_start_time >= current_date() - 30
GROUP BY job_id
HAVING COUNT(*) >= 5
ORDER BY success_rate_pct ASC;
```

The `lakeflow` schema covers both jobs and SDP pipeline events. The `HAVING COUNT(*) >= 5` clause avoids flagging jobs with too few runs to be statistically meaningful.
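The success-rate aggregation is worth sanity-checking locally before wiring it to alerts. A sketch over `(job_id, result_state)` pairs, applying the same minimum-run threshold as the `HAVING` clause:

```python
def success_rates(runs, min_runs=5):
    """Compute per-job success rate (percent) from (job_id, result_state)
    pairs, skipping jobs with fewer than min_runs runs, like the SQL's
    HAVING COUNT(*) >= 5."""
    counts = {}
    for job_id, state in runs:
        total, ok = counts.get(job_id, (0, 0))
        counts[job_id] = (total + 1, ok + (state == "SUCCESS"))
    return {
        job_id: round(100.0 * ok / total, 2)
        for job_id, (total, ok) in counts.items()
        if total >= min_runs
    }
```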
Enable system schemas with the Python SDK
“Using Python, list available system schemas and enable the ones you need.”
```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# List system schemas and their state
for schema in w.system_schemas.list(metastore_id="your-metastore-id"):
    print(f"{schema.schema}: {schema.state}")

# Enable access and billing schemas
w.system_schemas.enable(metastore_id="your-metastore-id", schema_name="access")
w.system_schemas.enable(metastore_id="your-metastore-id", schema_name="billing")
```

System schemas must be enabled before they’re queryable. This is a one-time setup step per metastore.
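If you script this setup, you can skip schemas that are already enabled by checking the state returned by `list` first. A sketch of that decision as a pure function; the `ENABLE_COMPLETED` state string is an assumption about the SDK's enum values, so verify it against your SDK version:

```python
def schemas_to_enable(current_states, wanted):
    """Return the schemas from `wanted` that still need enabling.

    current_states maps schema name -> state string, e.g. built as
    {s.schema: str(s.state) for s in w.system_schemas.list(...)}.
    'ENABLE_COMPLETED' is an assumed state name; check your SDK.
    """
    return [s for s in wanted if current_states.get(s) != "ENABLE_COMPLETED"]
```

Keeping the check pure (no API calls) makes the setup script idempotent and easy to test.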
Watch Out For
- Always filter on the partition key — `event_date` for audit, `usage_date` for billing, `period_start_time` for jobs. Filtering only on timestamp columns without the date partition triggers full table scans on tables that can be hundreds of GB.
- System tables must be explicitly enabled — they’re not on by default. An account admin or metastore admin needs to enable each schema (`access`, `billing`, `compute`, `lakeflow`, `query`) before anyone can query it.
- Retention limits vary — audit and billing data retain 365 days; query history and compute events retain 30 days. If compliance requires longer retention, schedule a job to copy system table data into your own Delta tables.
- Column lineage tracks PII flow — use `system.access.column_lineage` to trace where sensitive columns like `email` or `ssn` propagate. This is critical for GDPR/CCPA compliance audits.
- Create views for repeated queries — system table queries tend to be complex. Wrap common patterns in views (e.g., `analytics.governance.daily_audit_summary`) to avoid duplicating logic across dashboards and alerts.
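The retention point above usually becomes a small scheduled job that appends yesterday's partition into a table you own before it rolls out of the 365-day window. A sketch that builds the daily append statement; the archive table name is a placeholder:

```python
def archive_audit_sql(archive_table="analytics.governance.audit_archive"):
    """Build a daily append from system.access.audit into your own Delta
    table, copying yesterday's event_date partition so history survives
    past the built-in retention. The archive table name is illustrative.
    """
    return (
        f"INSERT INTO {archive_table} "
        "SELECT * FROM system.access.audit "
        "WHERE event_date = current_date() - 1"
    )
```

Running this once per day (e.g. as a Databricks job) keeps the copy incremental and cheap, since it reads exactly one partition.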