Real-Time Analytics
Overview
Build a real-time analytics system that ingests events via Zerobus gRPC, processes them through Spark Structured Streaming, materializes aggregates in Lakebase for low-latency serving, and exposes the data through a Genie Space for natural-language exploration.
Skills used: databricks-zerobus-ingest, databricks-spark-structured-streaming, databricks-lakebase-autoscale, databricks-genie
MCP tools used: execute_sql, get_table_details, execute_databricks_command, create_or_update_lakebase_branch, create_or_update_lakebase_sync, create_or_update_genie, ask_genie
Prerequisites
- A Databricks workspace with Unity Catalog enabled
- A target table for raw event ingestion (e.g. main.streaming.raw_events)
- A running cluster for Spark Structured Streaming jobs
- A SQL warehouse for the Genie Space
Set up the Zerobus Ingest producer
Create a gRPC client that streams events directly into a Delta table in near real-time.
Build a Python Zerobus Ingest producer that streams click events directly into the Delta table main.streaming.raw_events using gRPC. Include fields for event_id, user_id, event_type, page_url, and timestamp. Add proper error handling and retry logic.
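The retry logic the prompt asks for can be sketched independently of the Zerobus SDK. In this sketch, the `send` callable stands in for the real Zerobus gRPC client's ingest call (a hypothetical placeholder, not the SDK's actual method name); the event schema matches the fields listed above.

```python
# Sketch of a producer-side retry/backoff pattern for event ingestion.
# `send` is a placeholder for the real Zerobus client call -- substitute
# the actual SDK method when wiring this up.
import time
import uuid
from datetime import datetime, timezone


def make_event(user_id: str, event_type: str, page_url: str) -> dict:
    """Build a click event matching the raw_events schema."""
    return {
        "event_id": str(uuid.uuid4()),
        "user_id": user_id,
        "event_type": event_type,
        "page_url": page_url,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def send_with_retry(send, event: dict, max_attempts: int = 4,
                    base_delay: float = 0.01) -> int:
    """Call send(event), retrying transient failures with exponential
    backoff. Returns the number of attempts used; re-raises after the
    final attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            send(event)
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * (2 ** (attempt - 1)))
```

In production you would also cap the backoff and distinguish retryable gRPC status codes from permanent failures.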
Build the streaming processing pipeline
Process raw events with windowed aggregations and stateful transformations.
Write a Spark Structured Streaming job that:
1. Reads from main.streaming.raw_events using Change Data Feed
2. Applies a 10-minute watermark on the timestamp column
3. Computes event counts per 5-minute tumbling window grouped by event_type
4. Writes the windowed aggregates to main.streaming.event_aggregates
Use an availableNow trigger for cost-efficient micro-batch processing.
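The steps above can be sketched as a PySpark job. This is a minimal sketch assuming a cluster with PySpark available and an append-only raw table; the checkpoint location is an assumption, so adjust it for your workspace.

```python
# Sketch of the streaming aggregation job. Table names follow the prompt;
# CHECKPOINT is an assumed path -- point it at a location you control.
WATERMARK_DELAY = "10 minutes"
WINDOW_DURATION = "5 minutes"
SOURCE_TABLE = "main.streaming.raw_events"
TARGET_TABLE = "main.streaming.event_aggregates"
CHECKPOINT = "/Volumes/main/streaming/checkpoints/event_aggregates"  # assumption


def start_event_aggregates(spark):
    """Read the Change Data Feed, aggregate counts per 5-minute tumbling
    window and event_type, and write to the target table. Assumes the raw
    table is append-only (CDF update/delete rows would need filtering)."""
    from pyspark.sql import functions as F

    events = (
        spark.readStream
        .option("readChangeFeed", "true")
        .table(SOURCE_TABLE)
    )
    aggregates = (
        events
        .withWatermark("timestamp", WATERMARK_DELAY)
        .groupBy(F.window("timestamp", WINDOW_DURATION), "event_type")
        .count()
    )
    return (
        aggregates.writeStream
        .trigger(availableNow=True)            # process backlog, then stop
        .option("checkpointLocation", CHECKPOINT)
        .toTable(TARGET_TABLE)
    )
```

The availableNow trigger drains all pending micro-batches and shuts down, which pairs well with a scheduled job rather than an always-on cluster.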
Create a Lakebase database for low-latency serving
Set up Lakebase Autoscale and sync aggregated data for application access.
Create a new Lakebase Autoscaling project called "realtime-analytics" with scale-to-zero enabled. Then set up a reverse ETL sync from main.streaming.event_aggregates to the Lakebase database so downstream apps always have the latest aggregated data.
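Once the sync is running, a downstream app can read the aggregates over the Postgres wire protocol that Lakebase speaks. This is a sketch under assumptions: the synced table's name and flattened column names (window_start, event_type, count) may differ in your project, so check them before use.

```python
# Sketch of a low-latency read from the Lakebase-synced aggregates table.
# Column and table names are assumptions -- verify against your sync config.
LATEST_WINDOWS_SQL = """
SELECT window_start, event_type, count
FROM event_aggregates
WHERE window_start >= now() - interval '1 hour'
ORDER BY window_start DESC, count DESC;
"""


def fetch_latest_windows(conn):
    """Run the query on any DB-API Postgres connection (psycopg2/psycopg)."""
    with conn.cursor() as cur:
        cur.execute(LATEST_WINDOWS_SQL)
        return cur.fetchall()
```

Because Lakebase scales to zero, the first query after idle may see a short cold-start delay; keep client-side timeouts generous.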
Create a Genie Space for business exploration
Let business users ask questions about the streaming data in natural language.
Create a Genie Space called "Event Analytics" that connects to main.streaming.raw_events and main.streaming.event_aggregates with sample questions: "How many events happened in the last hour?", "What are the top event types today?", "Show me the event trend for the last 24 hours by 15-minute intervals".
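Besides the UI and the ask_genie MCP tool, a Genie Space can be queried programmatically over the Genie conversation REST API. This sketch only builds the request; the endpoint path reflects the API as documented at the time of writing, and host, token, and space ID are placeholders to fill in.

```python
# Sketch of asking a Genie Space a question via its REST API.
# host/token/space_id are placeholders; verify the endpoint against the
# current Databricks REST API reference before relying on it.
import json
import urllib.request


def build_genie_request(host: str, token: str, space_id: str,
                        question: str) -> urllib.request.Request:
    """Build the POST request that starts a Genie conversation."""
    url = f"{host}/api/2.0/genie/spaces/{space_id}/start-conversation"
    return urllib.request.Request(
        url,
        data=json.dumps({"content": question}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask_genie_space(req: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The response contains a conversation and message ID; fetching the generated SQL and result set requires follow-up polling calls, which are omitted here.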
Test the end-to-end flow
Verify events flow from ingestion through to the Genie Space.
Ask my Genie Space "Event Analytics": Show me the event count trend for the last 6 hours broken down by event_type. Then ask: Which pages had the most click events today?
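When verifying the flow end to end, it helps to compute the expected 5-minute window counts for a known batch of synthetic events locally, then compare against what Genie reports. A pure-Python sketch of the tumbling-window semantics, no cluster required:

```python
# Local sanity check: count events per (window_start, event_type) using the
# same 5-minute tumbling windows as the streaming job.
from collections import Counter
from datetime import datetime, timedelta


def window_start(ts: datetime, minutes: int = 5) -> datetime:
    """Floor a timestamp to the start of its tumbling window."""
    return ts - timedelta(minutes=ts.minute % minutes,
                          seconds=ts.second,
                          microseconds=ts.microsecond)


def expected_aggregates(events, minutes: int = 5) -> Counter:
    """Count events per (window_start, event_type), mirroring the Spark job."""
    return Counter((window_start(e["timestamp"], minutes), e["event_type"])
                   for e in events)
```

If the counts in main.streaming.event_aggregates (and Genie's answers) diverge from this local computation for the same input, check watermark settings and late-arriving events first.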
What You Get
- Zerobus Ingest producer streaming events into Delta tables via gRPC
- Spark Structured Streaming job with windowed aggregations and checkpointing
- Lakebase Autoscale database with reverse ETL sync for low-latency app queries
- Genie Space for business users to explore event data in natural language
Next Steps
- Add a Streamlit app that reads from Lakebase for a real-time dashboard
- Build an AI/BI Dashboard with auto-refreshing visualizations
- Set up Data Quality Monitoring to alert on event schema drift
- Use SDP pipelines for a managed alternative to raw Structured Streaming