
Development & Testing Workflow

Skill: databricks-model-serving

You can go from a local agent.py file to a tested, working agent on Databricks in minutes. The workflow is simple: write locally, upload, install packages, test, and iterate — each step driven by a single prompt to your AI coding assistant. Catching errors at this stage saves you the 15-minute deployment cycle for every bug.

“Upload my agent folder to Databricks and test it on a cluster. Use the agent files in ./my_agent/.”

agent.py
import mlflow
from mlflow.pyfunc import ResponsesAgent
from mlflow.types.responses import ResponsesAgentRequest, ResponsesAgentResponse
from databricks_langchain import ChatDatabricks

LLM_ENDPOINT = "databricks-meta-llama-3-3-70b-instruct"

class MyAgent(ResponsesAgent):
    def __init__(self):
        self.llm = ChatDatabricks(endpoint=LLM_ENDPOINT)

    def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
        messages = [{"role": m.role, "content": m.content} for m in request.input]
        response = self.llm.invoke(messages)
        return ResponsesAgentResponse(
            output=[self.create_text_output_item(text=response.content, id="msg_1")]
        )

AGENT = MyAgent()
mlflow.models.set_model(AGENT)

Key decisions:

  • Write locally, run remotely — your AI coding assistant edits agent.py on your machine and pushes it to the workspace for execution. No notebook editing required.
  • Test with real endpoints — local unit tests catch syntax errors but miss auth issues, missing packages, and endpoint connectivity. Remote testing on a cluster catches all of these.
  • Keep the project structure flat — agent.py, test_agent.py, log_model.py in one folder. The upload and execution tools work best with a single directory.
  • Re-upload after every change — workspace files do not auto-sync. Each iteration requires upload then run.
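Because the workspace never auto-syncs, it helps to know exactly which files changed since the last upload. A minimal local sketch — the manifest file and the `changed_files` helper are illustrative, not part of any Databricks tooling:

```python
import hashlib
import json
from pathlib import Path

def changed_files(folder: str, manifest_path: str = "upload_manifest.json") -> list[str]:
    """Return the .py files whose content changed since the last recorded upload."""
    root = Path(folder)
    try:
        old = json.loads(Path(manifest_path).read_text())
    except FileNotFoundError:
        old = {}  # first run: everything counts as changed
    new, changed = {}, []
    for f in sorted(root.glob("*.py")):
        digest = hashlib.sha256(f.read_bytes()).hexdigest()
        new[f.name] = digest
        if old.get(f.name) != digest:
            changed.append(f.name)
    Path(manifest_path).write_text(json.dumps(new, indent=2))
    return changed
```

Run it before each upload: an empty list means the workspace copy is already current, assuming nothing edits the workspace files directly.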

“Create a test script that validates my agent handles basic requests. Use Python.”

test_agent.py
from agent import AGENT
from mlflow.types.responses import ResponsesAgentRequest, ChatContext

request = ResponsesAgentRequest(
    input=[{"role": "user", "content": "What is Databricks?"}],
    context=ChatContext(user_id="test@example.com"),
)

# Non-streaming
result = AGENT.predict(request)
print("Response:", result.model_dump(exclude_none=True))

# Streaming
for event in AGENT.predict_stream(request):
    print(event)

Run this on the cluster after uploading. It imports your agent directly and calls predict, exercising the same code path the serving endpoint will use. If this works, deployment will work.
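Printing is fine for eyeballing output, but assertion-based checks fail loudly in cluster logs. A minimal sketch; the nested `content` shape is assumed from what `create_text_output_item` produces and may differ across MLflow versions:

```python
def assert_valid_response(result: dict) -> None:
    """Structural sanity checks on a dumped ResponsesAgent response."""
    assert result.get("output"), "agent returned no output items"
    first = result["output"][0]
    # Assumed item shape: {"content": [{"type": "output_text", "text": ...}], ...}
    blocks = first.get("content") or []
    text = blocks[0].get("text", "") if blocks else ""
    assert text.strip(), "first output item has no text"

# Usage after AGENT.predict(request):
# assert_valid_response(result.model_dump(exclude_none=True))
```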

“Install the MLflow 3 agent packages on my Databricks cluster.”

%pip install -U mlflow==3.6.0 databricks-langchain langgraph==0.3.4 databricks-agents pydantic
dbutils.library.restartPython()

The restartPython() call is mandatory after %pip install. Without it, the new packages are installed but the running Python process still has the old versions loaded.
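You can demonstrate the caching behavior `restartPython()` works around entirely locally. This sketch uses a throwaway module name (illustrative only) to show that re-importing never picks up a changed file — only a reload (or a fresh process) does:

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # keep the demo from caching bytecode on disk

# Write a throwaway module, import it, then change it on disk.
moddir = Path(tempfile.mkdtemp())
(moddir / "stale_demo.py").write_text("VERSION = 'old'\n")
sys.path.insert(0, str(moddir))

import stale_demo
(moddir / "stale_demo.py").write_text("VERSION = 'new'\n")

import stale_demo                     # no-op: Python reuses the sys.modules cache
print(stale_demo.VERSION)             # still 'old' -- this is the stale state

importlib.reload(stale_demo)          # what a fresh process (restartPython) gives you
print(stale_demo.VERSION)             # 'new'
```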

“Check which versions of the agent packages are installed on the cluster. Use Python.”

import pkg_resources

for pkg in ['mlflow', 'langchain', 'langgraph', 'pydantic', 'databricks-langchain']:
    try:
        version = pkg_resources.get_distribution(pkg).version
        print(f"{pkg}: {version}")
    except pkg_resources.DistributionNotFound:
        print(f"{pkg}: NOT INSTALLED")

Run this before testing your agent. Version mismatches between your local pip_requirements and the cluster are the most common source of “works locally, fails remotely” bugs.
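The comparison can be automated against your pin list using the stdlib `importlib.metadata` (which also avoids the deprecated `pkg_resources`). A sketch — the exact pins below are illustrative:

```python
from importlib import metadata

def check_pins(requirements: list[str]) -> dict[str, str]:
    """Map each 'pkg==version' pin to 'ok', 'missing', or a conflict message."""
    report = {}
    for req in requirements:
        name, _, pinned = req.partition("==")
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if not pinned or installed == pinned:
            report[name] = "ok"
        else:
            report[name] = f"conflict: installed {installed}, pinned {pinned}"
    return report

print(check_pins(["mlflow==3.6.0", "databricks-langchain", "langgraph==0.3.4"]))
```

Anything other than `ok` across the board means your `pip_requirements` and the cluster disagree, and remote behavior will not match local behavior.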

“Verify that my cluster can reach the foundation model endpoint before testing the full agent. Use Python.”

from databricks_langchain import ChatDatabricks
llm = ChatDatabricks(endpoint="databricks-meta-llama-3-3-70b-instruct")
response = llm.invoke([{"role": "user", "content": "Hello!"}])
print(response.content)

If this fails with a permission or connectivity error, your agent will fail too. Isolate the problem before debugging your agent code.
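The triage itself can be scripted. A generic sketch that buckets any probe failure — the substring heuristics are assumptions for illustration, not documented Databricks error contracts:

```python
def classify_failure(call) -> str:
    """Run a zero-arg connectivity probe and bucket the failure for faster triage."""
    try:
        call()
        return "ok"
    except Exception as exc:  # broad on purpose: we only want a triage bucket
        msg = f"{type(exc).__name__}: {exc}".lower()
        if "permission" in msg or "403" in msg or "unauthorized" in msg:
            return "auth/permissions -- check endpoint ACLs and your token"
        if "timed out" in msg or "connection" in msg:
            return "network -- check workspace URL and cluster egress"
        if "not found" in msg or "404" in msg:
            return "endpoint name -- check the serving endpoint exists"
        return f"other -- {msg}"

# Example: probe the LLM before running the full agent.
# print(classify_failure(lambda: llm.invoke([{"role": "user", "content": "Hello!"}])))
```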

  • Skipping restartPython() after %pip install — the Python process caches old module versions. You will see stale behavior or import errors until you restart.
  • Reusing a stale execution context — if you see strange errors after multiple iterations, let your AI coding assistant create a fresh context rather than reusing the old one.
  • Testing only locally — local tests with mocked endpoints miss auth failures, package version conflicts, and network issues that only surface on the cluster. Always run test_agent.py on Databricks before logging the model.
  • Forgetting to re-upload after edits — workspace files do not auto-sync from your local machine. Every code change requires uploading the folder again before re-running.