Views

Skill: databricks-spark-declarative-pipelines

Views in Spark Declarative Pipelines come in two forms. Temporary views exist only during pipeline execution — use them for intermediate filtering, shared logic, or preprocessing before Auto CDC. Persistent views get published to Unity Catalog so dashboards and external queries can reference them directly.

“Create a Python temporary view that filters test events out of the raw stream before downstream tables consume it”

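from pyspark import pipelines as dp

# Temporary view: exists only while the pipeline runs; nothing is published.
@dp.temporary_view()
def filtered_events():
    # `spark` is the session provided by the pipeline runtime.
    return (
        spark.readStream.table("raw_events")
        .filter("event_type != 'test'")
    )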

Key decisions:

  • Temporary view over a streaming table — this intermediate filter doesn’t need to be persisted or queried outside the pipeline. A temporary view avoids storage costs and keeps the catalog clean.
  • Streaming read inside a view — temporary views can wrap either batch or streaming reads. Downstream tables inherit the read mode, so a streaming table reading from this view still gets incremental processing.

Publish a query to Unity Catalog for external consumers

“Create a persistent SQL view of active customers that dashboards can query directly”

CREATE VIEW catalog.schema.active_customers
AS SELECT * FROM customers WHERE status = 'active';

Persistent views run the query on each read — nothing is stored physically. Use them when external consumers need a stable interface to filtered or transformed data without materializing the result.
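
The "runs on each read" behavior can be seen in a small local analogy. The sketch below uses SQLite from the Python standard library rather than Unity Catalog, so it only illustrates the general view-versus-stored-copy distinction; the table contents and the `active_customers_copy` name are made up for the illustration.

```python
import sqlite3


def compare_view_and_copy():
    """Show that a view reflects later changes while a stored copy does not."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, status TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "active"), (2, "inactive")],
    )

    # A view stores only the query text, not the rows.
    conn.execute(
        "CREATE VIEW active_customers AS "
        "SELECT * FROM customers WHERE status = 'active'"
    )
    # A stored copy (hypothetical name) captures the rows as of creation time.
    conn.execute(
        "CREATE TABLE active_customers_copy AS "
        "SELECT * FROM customers WHERE status = 'active'"
    )

    def ids(relation):
        query = f"SELECT id FROM {relation} ORDER BY id"
        return [row[0] for row in conn.execute(query)]

    before = ids("active_customers")

    # Change the base table after both objects exist.
    conn.execute("UPDATE customers SET status = 'active' WHERE id = 2")

    result = {
        "view_before": before,
        "view_after": ids("active_customers"),
        "copy_after": ids("active_customers_copy"),
    }
    conn.close()
    return result


if __name__ == "__main__":
    print(compare_view_and_copy())
```

The view picks up customer 2 as soon as the base table changes, while the stored copy stays at its creation-time snapshot. That is the trade-off a persistent view makes: always-current results with no storage, paid for by running the query on every read.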

“Use a temporary view to clean and enrich raw orders before applying SCD Type 1 deduplication, in Python”

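from pyspark import pipelines as dp
from pyspark.sql.functions import col

# Clean and enrich raw orders; this view is not persisted.
@dp.temporary_view()
def orders_cleaned():
    return (
        spark.readStream.table("bronze.orders")
        .filter(col("order_id").isNotNull())
        .withColumn("order_total", col("quantity") * col("unit_price"))
    )

# Target table that the Auto CDC flow keeps up to date.
dp.create_streaming_table(name="orders_current")

# SCD Type 1: keep only the latest row per order_id, ordered by order_date.
dp.create_auto_cdc_flow(
    target="orders_current",
    source="orders_cleaned",
    keys=["order_id"],
    sequence_by="order_date",
    stored_as_scd_type="1",
)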

Temporary views are the standard way to insert transformation logic before an Auto CDC flow. The view runs during pipeline execution and feeds directly into the CDC target — no intermediate table to manage or pay storage for.
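
To make the expected result concrete, here is a plain-Python sketch of what the view plus an SCD Type 1 flow should produce for a handful of rows. It models the semantics only and is not how Auto CDC is implemented; the helper names `clean_orders` and `apply_scd_type_1`, the sample rows, and the tie rule are assumptions made for the illustration.

```python
def clean_orders(rows):
    """Mirror the temporary view: drop rows without order_id, add order_total."""
    for row in rows:
        if row.get("order_id") is None:
            continue
        yield {**row, "order_total": row["quantity"] * row["unit_price"]}


def apply_scd_type_1(rows, key, sequence_by):
    """Keep one row per key: the row with the highest sequence value.

    On equal sequence values the later-arriving row wins. That tie rule is
    a choice of this sketch, not documented Auto CDC behavior.
    """
    current = {}
    for row in rows:
        existing = current.get(row[key])
        if existing is None or row[sequence_by] >= existing[sequence_by]:
            current[row[key]] = row
    return current


# Hypothetical sample data. ISO date strings sort correctly as text.
raw_orders = [
    {"order_id": 1, "order_date": "2024-01-01", "quantity": 2, "unit_price": 5.0},
    {"order_id": None, "order_date": "2024-01-02", "quantity": 1, "unit_price": 9.0},
    {"order_id": 1, "order_date": "2024-01-03", "quantity": 3, "unit_price": 5.0},
    {"order_id": 2, "order_date": "2024-01-02", "quantity": 1, "unit_price": 4.0},
    # Late-arriving, older version of order 1: must not overwrite the newer one.
    {"order_id": 1, "order_date": "2024-01-02", "quantity": 9, "unit_price": 5.0},
]

orders_current = apply_scd_type_1(
    clean_orders(raw_orders), key="order_id", sequence_by="order_date"
)

if __name__ == "__main__":
    for order_id in sorted(orders_current):
        print(order_id, orders_current[order_id])
```

The row with a null `order_id` never reaches the target, and the out-of-order update for order 1 is ignored because its `order_date` is older than the row already held. Doing the cleaning in the view keeps that logic out of the CDC flow itself.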

  • Temporary views are invisible outside the pipeline — they don’t appear in Unity Catalog and can’t be queried by notebooks, dashboards, or other pipelines. If something else needs the result, use a streaming table or materialized view instead.
  • Persistent views are SQL only: CREATE VIEW catalog.schema.name works in SQL pipeline files. Python pipelines don’t have an equivalent decorator for persistent views.
  • Don’t materialize what should be a view — if the logic is lightweight filtering or renaming and nothing outside the pipeline reads it, a temporary view is the right choice. A streaming table or materialized view for the same purpose wastes storage and adds unnecessary refresh overhead.