Views
Skill: databricks-spark-declarative-pipelines
What You Can Build
Views in Spark Declarative Pipelines come in two forms. Temporary views exist only during pipeline execution; use them for intermediate filtering, shared logic, or preprocessing before Auto CDC. Persistent views are published to Unity Catalog so dashboards and external queries can reference them directly.
In Action
“Create a Python temporary view that filters test events out of the raw stream before downstream tables consume it”

    from pyspark import pipelines as dp

    @dp.temporary_view()
    def filtered_events():
        # `spark` is provided by the pipeline runtime.
        return (
            spark.readStream.table("raw_events")
            .filter("event_type != 'test'")
        )

Key decisions:
- Temporary view over a streaming table — this intermediate filter doesn’t need to be persisted or queried outside the pipeline. A temporary view avoids storage costs and keeps the catalog clean.
- Streaming read inside a view — temporary views can wrap either batch or streaming reads. Downstream tables inherit the read mode, so a streaming table reading from this view still gets incremental processing.
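The two decisions above can be pictured with a plain-Python analogy. This is only a sketch of the idea, not the pipeline API: `filtered_events` here is an ordinary function standing in for the temporary view, and `downstream_table` and `process_batch` are hypothetical names standing in for a streaming table and its incremental update.

```python
def filtered_events(raw_events):
    """Stand-in for the temporary view: a lazy filter that stores nothing."""
    return (event for event in raw_events if event["event_type"] != "test")


# Stand-in for a downstream streaming table; this is the only place rows persist.
downstream_table = []


def process_batch(new_events):
    """Pass only the newly arrived events through the view and append the result."""
    downstream_table.extend(filtered_events(new_events))


# Two separate arrivals of data; each one is processed once, on arrival.
process_batch([{"id": 1, "event_type": "click"}, {"id": 2, "event_type": "test"}])
process_batch([{"id": 3, "event_type": "purchase"}])

ids = [event["id"] for event in downstream_table]
print(ids)
```

The filter itself holds no data, so the only stored result is what the downstream consumer writes.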
More Patterns
Publish a query to Unity Catalog for external consumers
“Create a persistent SQL view of active customers that dashboards can query directly”

    CREATE VIEW catalog.schema.active_customers
    AS SELECT * FROM customers WHERE status = 'active';

Persistent views run the query on each read; nothing is stored physically. Use them when external consumers need a stable interface to filtered or transformed data without materializing the result.
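The "query runs on each read" behavior can be tried locally. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Unity Catalog, which is an assumption made purely for illustration; the table contents and the `active_ids` helper are hypothetical. It shows the general property of a non-materialized view, not Databricks-specific behavior.

```python
import sqlite3

# In-memory SQLite database used as a local stand-in for a catalog.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "active"), (2, "inactive")],
)

# The view stores only the query definition, not a copy of the rows.
conn.execute(
    "CREATE VIEW active_customers AS "
    "SELECT * FROM customers WHERE status = 'active'"
)


def active_ids(connection):
    """Read the view; the underlying query is evaluated at this point."""
    rows = connection.execute("SELECT id FROM active_customers ORDER BY id")
    return [row[0] for row in rows]


before = active_ids(conn)

# A row added after the view was created shows up on the next read.
conn.execute("INSERT INTO customers VALUES (3, 'active')")
after = active_ids(conn)

print(before, after)
```

Because the view is re-evaluated on every read, consumers always see the current state of the underlying table, at the cost of running the query each time.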
Preprocess data before Auto CDC
“Use a temporary view to clean and enrich raw orders before applying SCD Type 1 deduplication, in Python”

    from pyspark import pipelines as dp
    from pyspark.sql.functions import col

    @dp.temporary_view()
    def orders_cleaned():
        # Drop rows without a key and derive the order total.
        return (
            spark.readStream.table("bronze.orders")
            .filter(col("order_id").isNotNull())
            .withColumn("order_total", col("quantity") * col("unit_price"))
        )

    # Target table that the Auto CDC flow keeps up to date.
    dp.create_streaming_table(name="orders_current")

    dp.create_auto_cdc_flow(
        target="orders_current",
        source="orders_cleaned",
        keys=["order_id"],
        sequence_by="order_date",
        stored_as_scd_type="1",
    )

Temporary views are the standard way to insert transformation logic before an Auto CDC flow. The view runs during pipeline execution and feeds directly into the CDC target, with no intermediate table to manage or pay storage for.
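To make the intent of this pattern concrete, here is a plain-Python model of what the cleaned view plus an SCD Type 1 target are meant to produce: one current row per key, chosen by the sequencing column. It is a sketch of the semantics only, not how the engine implements Auto CDC; `clean_orders`, `apply_scd_type_1`, and the sample rows are hypothetical.

```python
def clean_orders(rows):
    """Mirror the view: drop rows without order_id and add order_total."""
    for row in rows:
        if row.get("order_id") is None:
            continue
        yield {**row, "order_total": row["quantity"] * row["unit_price"]}


def apply_scd_type_1(rows, key, sequence_by):
    """Keep only the latest row per key, as ordered by the sequence column."""
    current = {}
    for row in rows:
        existing = current.get(row[key])
        if existing is None or row[sequence_by] > existing[sequence_by]:
            current[row[key]] = row
    return current


# Sample input: one row lacks a key, and order 1 arrives out of order.
raw_orders = [
    {"order_id": 1, "order_date": "2024-01-03", "quantity": 3, "unit_price": 5.0},
    {"order_id": None, "order_date": "2024-01-02", "quantity": 1, "unit_price": 9.0},
    {"order_id": 1, "order_date": "2024-01-01", "quantity": 2, "unit_price": 5.0},
    {"order_id": 2, "order_date": "2024-01-02", "quantity": 1, "unit_price": 4.0},
]

orders_current = apply_scd_type_1(
    clean_orders(raw_orders), key="order_id", sequence_by="order_date"
)
print(orders_current)
```

The late-arriving 2024-01-01 row for order 1 does not overwrite the 2024-01-03 row, which is the role `sequence_by` plays in the flow above.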
Watch Out For
- Temporary views are invisible outside the pipeline: they don’t appear in Unity Catalog and can’t be queried by notebooks, dashboards, or other pipelines. If something else needs the result, use a streaming table or materialized view instead.
- Persistent views are SQL only: CREATE VIEW catalog.schema.name works in SQL pipeline files. Python pipelines don’t have an equivalent decorator for persistent views.
- Don’t materialize what should be a view: if the logic is lightweight filtering or renaming and nothing outside the pipeline reads it, a temporary view is the right choice. A streaming table or materialized view for the same purpose wastes storage and adds unnecessary refresh overhead.