Data Engineering

Build production data pipelines — from streaming ingestion to custom Spark connectors to near real-time delivery.

Spark Declarative Pipelines: Medallion-architecture pipelines with streaming tables, materialized views, and Auto Loader.

Spark Structured Streaming: Production streaming with Kafka, stateful operations, watermarks, and multi-sink writes.

Custom Spark Data Sources: Python data sources for connecting Spark to external systems via the PySpark DataSource API.

Zerobus Ingest: Near real-time ingestion into Delta tables via gRPC, with no message bus required.