ML Ops Pipeline
Overview
Build a complete ML Ops pipeline from data generation through production monitoring. Generate synthetic training data, train and log a model with MLflow, evaluate it with GenAI scorers, deploy the best version to a serving endpoint, and track prediction quality with a monitoring dashboard.
Skills used: databricks-synthetic-data-gen, databricks-mlflow-evaluation, databricks-model-serving, databricks-aibi-dashboards
MCP tools used: execute_sql, execute_databricks_command, query_serving_endpoint, list_serving_endpoints, get_serving_endpoint_status, create_or_update_dashboard
Prerequisites
- A Databricks workspace with Unity Catalog and Model Serving enabled
- A catalog and schema for ML artifacts (e.g. main.ml_demo)
- A cluster with MLflow and scikit-learn available
- A SQL warehouse for the monitoring dashboard
Generate synthetic training data
Create a realistic dataset for model training and evaluation.
Generate 100K rows of synthetic e-commerce transaction data with columns for customer_id, product_category, quantity, price, discount_pct, day_of_week, is_returning_customer, and a binary label "is_high_value" (1 if total > $200). Write it to main.ml_demo.training_data as a Delta table. Include realistic distributions and correlations between features.
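The generation step above can be sketched locally with NumPy and pandas. Column names and the label rule match the prompt; the category list, distributions, and correlation strengths below are illustrative assumptions, and the Delta write is shown as a comment since it requires a Spark session.

```python
import numpy as np
import pandas as pd

def generate_transactions(n_rows: int, seed: int = 42) -> pd.DataFrame:
    rng = np.random.default_rng(seed)
    categories = ["electronics", "clothing", "home", "toys", "grocery"]  # assumed
    df = pd.DataFrame({
        "customer_id": rng.integers(1, 20_000, n_rows),
        "product_category": rng.choice(categories, n_rows),
        "quantity": rng.integers(1, 10, n_rows),
        "price": rng.gamma(shape=2.0, scale=40.0, size=n_rows).round(2),
        "discount_pct": rng.choice([0, 5, 10, 20], n_rows, p=[0.6, 0.2, 0.15, 0.05]),
        "day_of_week": rng.integers(0, 7, n_rows),
        "is_returning_customer": rng.integers(0, 2, n_rows),
    })
    # A simple feature correlation: returning customers buy slightly more.
    df.loc[df["is_returning_customer"] == 1, "quantity"] += 1
    # Label per the prompt: high value if the discounted order total exceeds $200.
    total = df["quantity"] * df["price"] * (1 - df["discount_pct"] / 100)
    df["is_high_value"] = (total > 200).astype(int)
    return df

df = generate_transactions(100_000)
# On Databricks, write it as a Delta table:
# spark.createDataFrame(df).write.format("delta") \
#     .saveAsTable("main.ml_demo.training_data")
```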
Train and log the model with MLflow
Build a classifier, log it to MLflow, and register in Unity Catalog.
Write a Python script that:
1. Reads main.ml_demo.training_data
2. Splits into train/test sets (80/20)
3. Trains a scikit-learn gradient boosting classifier to predict is_high_value
4. Logs the model, parameters, and metrics to MLflow
5. Registers the model as main.ml_demo.high_value_predictor in Unity Catalog
Run it on my Databricks cluster.
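A minimal sketch of that script, assuming the training data is already in a pandas DataFrame. The MLflow logging and Unity Catalog registration are placed behind a flag so the core training logic runs anywhere; on Databricks you would call it with log_to_mlflow=True. Hyperparameters are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

FEATURES = ["quantity", "price", "discount_pct", "day_of_week", "is_returning_customer"]

def train_high_value_model(df: pd.DataFrame, log_to_mlflow: bool = False):
    # 80/20 split per the prompt.
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df["is_high_value"], test_size=0.2, random_state=42
    )
    model = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=42)
    model.fit(X_train, y_train)
    metrics = {"test_f1": f1_score(y_test, model.predict(X_test))}
    if log_to_mlflow:
        import mlflow
        mlflow.set_registry_uri("databricks-uc")  # register in Unity Catalog
        with mlflow.start_run():
            mlflow.log_params(model.get_params())
            mlflow.log_metrics(metrics)
            mlflow.sklearn.log_model(
                model, "model",
                registered_model_name="main.ml_demo.high_value_predictor",
            )
    return model, metrics

# Smoke test on small random data (replace with the real training table):
rng = np.random.default_rng(0)
demo = pd.DataFrame({c: rng.integers(0, 10, 500) for c in FEATURES})
demo["is_high_value"] = (demo["quantity"] * demo["price"] > 30).astype(int)
model, metrics = train_high_value_model(demo)
```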
Evaluate model quality
Score the model with MLflow evaluation to validate performance before deployment.
Run mlflow.genai.evaluate() on my model using a test dataset of 500 samples. Score with built-in Correctness and Guidelines scorers. Log results to an MLflow experiment called "high_value_predictor_eval" and show me the precision, recall, and F1 metrics.
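The precision, recall, and F1 numbers the prompt asks to surface can be sanity-checked locally with scikit-learn before (or alongside) the MLflow evaluation run. The labels and predictions below are stand-ins for the 500-sample eval set, not real scorer output.

```python
from sklearn.metrics import precision_recall_fscore_support

# Stand-in labels and predictions for a 500-sample eval set (assumed values).
y_true = [1, 0, 1, 1, 0] * 100
y_pred = [1, 0, 0, 1, 0] * 100

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# → precision=1.000 recall=0.667 f1=0.800
```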
Deploy to a serving endpoint
Put the model into production with auto-scaling.
Deploy my MLflow model from Unity Catalog (models:/main.ml_demo.high_value_predictor/1) to a serving endpoint called "high-value-predictor" with auto-scaling from 0 to 4 instances. Wait for it to be ready and test with a sample prediction.
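Once the endpoint is ready, the sample prediction is a REST call. This sketch builds the standard "dataframe_split" payload that Databricks Model Serving accepts; the workspace host and token in the commented request are placeholders you must supply.

```python
import json

def build_payload(rows: list[dict]) -> dict:
    """Convert feature rows into the dataframe_split format serving expects."""
    columns = list(rows[0].keys())
    return {"dataframe_split": {
        "columns": columns,
        "data": [[row[c] for c in columns] for row in rows],
    }}

payload = build_payload([{
    "quantity": 3, "price": 99.5, "discount_pct": 10,
    "day_of_week": 2, "is_returning_customer": 1,
}])
print(json.dumps(payload))

# Against a real workspace (placeholder host and token):
# import requests
# resp = requests.post(
#     "https://<workspace-host>/serving-endpoints/high-value-predictor/invocations",
#     headers={"Authorization": f"Bearer {token}"},
#     json=payload,
# )
# print(resp.json())
```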
Build a monitoring dashboard
Track prediction quality and volume over time.
Create an AI/BI dashboard called "ML Model Monitor" with:
- A counter showing total predictions served today
- A line chart of prediction volume over the last 30 days
- A bar chart of prediction distribution (high_value vs not) by day
- A table showing the latest 50 predictions with their confidence scores
Source the data from the serving endpoint's inference logs.
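Each widget in the dashboard boils down to a SQL query over the endpoint's inference log. The table and column names below are assumptions (inference tables land in a Unity Catalog location you configure on the endpoint), so treat this as a query shape to adapt, not a working reference.

```python
# Assumed inference log table and timestamp column -- adjust to your endpoint's
# configured inference table location.
INFERENCE_TABLE = "main.ml_demo.high_value_predictor_inference_log"

daily_volume_sql = f"""
SELECT DATE(request_time) AS day, COUNT(*) AS predictions
FROM {INFERENCE_TABLE}
WHERE request_time >= CURRENT_DATE - INTERVAL 30 DAYS
GROUP BY DATE(request_time)
ORDER BY day
"""
print(daily_volume_sql)
```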
What You Get
- Training dataset with 100K synthetic transactions in main.ml_demo.training_data
- Trained model logged in MLflow and registered in Unity Catalog
- Evaluation results with precision, recall, F1, and custom scorer outputs
- Serving endpoint with auto-scaling and test predictions verified
- Monitoring dashboard tracking prediction volume, distribution, and recent results
Next Steps
- Add Data Quality Monitoring on the training data table to detect drift
- Set up a Databricks Job for automated retraining on a weekly schedule
- Use Vector Search to add feature lookup for real-time enrichment
- Package the full pipeline as a Databricks Asset Bundle for CI/CD deployment
- Add Lakebase to store prediction results for low-latency application access