Filling Gaps Across Skills
Skill: databricks-docs
What You Can Build
Other AI Dev Kit skills handle orchestration, execution, and workspace operations, but none of them carries a full copy of the Databricks documentation. When a skill gives you a method signature but not the constraint that breaks your use case, or when you need to verify that a feature works the way you think it does before deploying to production, a doc lookup closes that gap. The databricks-docs skill slots into any workflow where authoritative reference matters more than speed.
In Action
“I used the jobs skill to create a task with python_wheel_task, but it’s failing with an import error even though the wheel path looks right. Check the Databricks docs for what I’m missing on Python wheel tasks.”
1. Fetch https://docs.databricks.com/llms.txt
2. Search for "python wheel task" and "libraries"
3. Fetch the Jobs API / task types reference
4. Return: python_wheel_task requires the wheel listed separately in the task's `libraries` field; the wheel_path alone is not enough

The jobs skill generates correct task structure, but the library attachment requirement is a constraint the skill doesn’t encode. The doc lookup surfaces it immediately: python_wheel_task with no libraries entry means the cluster has no wheel to import from, which manifests as a ModuleNotFoundError at runtime rather than a configuration error at job creation.
Key decisions:
- Trigger a doc lookup when you see a cryptic runtime error, not a config error — if the API accepted your payload but the task still fails, a constraint is missing from the reference material your skill was built from
- Search by error symptom, not just feature name — “wheel task import error” returns more targeted sections than “python wheel task reference”
- Apply the fix back through the skill — once you have the missing piece from docs, return to the original skill to regenerate the corrected configuration rather than patching by hand
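The corrected shape can be sketched as a Jobs API task payload. This is an illustrative sketch, not output from the jobs skill; the package name, entry point, wheel path, and cluster settings are all hypothetical placeholders:

```python
# Hypothetical python_wheel_task configuration (Jobs API JSON shape).
# python_wheel_task only names the package and entry point; the wheel
# itself must also appear in the task's `libraries` field, or the
# cluster has nothing to import from at runtime.
task = {
    "task_key": "run_etl",
    "python_wheel_task": {
        "package_name": "my_etl",            # hypothetical package
        "entry_point": "main",
        "parameters": ["--date", "2024-01-01"],
    },
    # The missing piece from the doc lookup: without this entry the task
    # fails with ModuleNotFoundError even though the payload was accepted.
    "libraries": [
        {"whl": "/Workspace/Shared/wheels/my_etl-0.1.0-py3-none-any.whl"}
    ],
}
```

The job-creation API will happily accept the payload without `libraries`, which is why the gap only shows up as a runtime import error.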
More Patterns
Verify SDP pipeline behavior before a full-refresh deployment
“I’m about to run a full refresh on a Spark Declarative Pipeline in production. Use the docs skill to confirm exactly what gets reprocessed and whether it drops and recreates the output tables.”
1. Fetch https://docs.databricks.com/llms.txt
2. Search for "full refresh" under Spark Declarative Pipelines
3. Fetch the SDP pipeline modes reference
4. Return: full refresh truncates and rewrites all tables from scratch; streaming table checkpoints are reset; this cannot be undone

The SDP skill generates pipeline definitions but doesn’t surface the operational semantics of refresh modes. Before you trigger a full refresh on a 200GB bronze table in production, confirm what it actually does. The answer (truncate, recompute, no rollback) is the kind of detail that changes whether you run it Friday afternoon or schedule it for the weekend.
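Because the operation cannot be undone, a deployment script can gate it behind an explicit confirmation. This guard is an illustrative sketch of that practice, not part of any Databricks API:

```python
def start_pipeline_update(pipeline_id: str, full_refresh: bool = False,
                          confirm_destructive: bool = False) -> dict:
    """Illustrative wrapper around a pipeline-update call.

    Per the doc lookup: a full refresh truncates and rewrites every
    output table and resets streaming checkpoints, so this sketch
    refuses to run one without an explicit confirmation flag.
    """
    if full_refresh and not confirm_destructive:
        raise RuntimeError(
            "full refresh truncates all tables and resets checkpoints; "
            "pass confirm_destructive=True to proceed"
        )
    # Real code would call the Pipelines API here; this just echoes the
    # request so the guard logic is visible on its own.
    return {"pipeline_id": pipeline_id, "full_refresh": full_refresh}
```

The flag forces the irreversibility question to be answered at review time rather than discovered after the tables are gone.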
Resolve a permissions gap when Genie queries start failing
“Our Genie space was working fine and now users are getting permission denied on queries. The workspace MCP shows the space is configured correctly. Check the docs for what Unity Catalog privileges Genie requires on the underlying tables.”
1. Fetch https://docs.databricks.com/llms.txt
2. Search for "Genie" and "AI/BI" and "permissions"
3. Fetch the Genie space permissions reference
4. Return: space users need SELECT on every table Genie queries, plus USE SCHEMA and USE CATALOG on the containing objects; the space owner's permissions are not inherited by users

MCP tools can read workspace and Genie configuration, but they can’t tell you why a permission denied error is happening; that requires understanding the Unity Catalog privilege inheritance model. The docs lookup identifies the gap: SELECT on the table without USE SCHEMA on the schema is a silent failure mode that only surfaces at query time.
Confirm SQL warehouse sizing behavior before provisioning
“I’m setting up a SQL warehouse for our BI team. The workspace MCP can create it, but I want to understand how cluster size and autoscaling interact before I commit to a configuration.”
1. Fetch https://docs.databricks.com/llms.txt
2. Search for "SQL warehouse" and "cluster size" and "autoscaling"
3. Fetch the SQL warehouse configuration reference
4. Return: cluster size controls workers per cluster; min/max clusters controls horizontal scale-out; serverless skips both in favor of automatic capacity managed by Databricks

The workspace MCP accepts any valid warehouse configuration, but the interaction between cluster_size, min_num_clusters, and max_num_clusters is not obvious. Cluster size is vertical scale (workers per cluster); autoscaling is horizontal scale (number of clusters). For variable BI workloads, a large cluster size with autoscaling disabled is often worse than a smaller size with autoscaling enabled.
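The two axes can be made concrete with a small capacity calculation. The worker counts per t-shirt size here are illustrative placeholders, not the published Databricks sizing table; check the current reference before relying on any specific number:

```python
# Illustrative workers-per-cluster for a few warehouse sizes.
# Placeholder values -- consult the current Databricks sizing table.
WORKERS_PER_CLUSTER = {"Small": 4, "Medium": 8, "Large": 16}

def peak_workers(cluster_size: str, max_num_clusters: int) -> int:
    """Vertical scale (workers per cluster) times horizontal scale
    (number of clusters) gives total capacity at full scale-out."""
    return WORKERS_PER_CLUSTER[cluster_size] * max_num_clusters

# A Medium that scales out to 3 clusters has more peak capacity than a
# single fixed Large, and each extra cluster also absorbs concurrent
# queries instead of queueing them behind one cluster.
print(peak_workers("Medium", 3))  # 24
print(peak_workers("Large", 1))   # 16
```

This is why, for bursty BI traffic, horizontal headroom usually beats a single oversized cluster: idle extra clusters scale back down, while an oversized fixed cluster is paid for whether or not it is busy.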
Check Vector Search index type constraints before building a RAG pipeline
“I’m building a RAG pipeline using the vector-search skill. Before generating the index creation code, check the docs for which index types support managed embeddings vs. self-managed.”
1. Fetch https://docs.databricks.com/llms.txt
2. Search for "Vector Search" and "index type" under AI/ML
3. Fetch the Vector Search index types reference
4. Return: Delta Sync indexes support both managed and self-managed embeddings; Direct Vector Access indexes only support self-managed; managed embeddings require a Databricks-hosted embedding model endpoint

The vector-search skill generates index creation code, but picking the wrong index type means rewriting the implementation after the fact. Checking the constraint first (managed embeddings only work with Delta Sync indexes) takes 30 seconds and prevents a full rebuild after you’ve already loaded your document corpus.
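That constraint is easy to encode as a pre-flight check before any index creation code is generated. This validator is a sketch of the rule the docs state, not a Databricks API:

```python
# Valid combinations per the doc lookup above:
#   Delta Sync index      -> managed or self-managed embeddings
#   Direct Vector Access  -> self-managed embeddings only
VALID_EMBEDDING_MODES = {
    "delta_sync": {"managed", "self_managed"},
    "direct_access": {"self_managed"},
}

def check_index_plan(index_type: str, embedding_mode: str) -> None:
    """Fail fast on an unsupported index-type/embedding combination."""
    allowed = VALID_EMBEDDING_MODES.get(index_type)
    if allowed is None:
        raise ValueError(f"unknown index type: {index_type}")
    if embedding_mode not in allowed:
        raise ValueError(
            f"{index_type} indexes do not support "
            f"{embedding_mode} embeddings"
        )

check_index_plan("delta_sync", "managed")       # passes
# check_index_plan("direct_access", "managed")  # would raise ValueError
```

A thirty-second check like this at plan time is much cheaper than discovering the mismatch after the document corpus is already loaded.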
Watch Out For
- Don’t use docs to replace skill output — skills handle the code generation; docs handle the reference validation. Asking docs to generate a full job definition or SQL pipeline is slower and less accurate than using the right skill for the job
- Stale training data affects all skills, not just docs — if a feature was renamed or deprecated after the model’s training cutoff, the doc lookup will return the current name while other skills may still use the old one. The docs skill is your source of truth for current terminology
- Cross-referencing two sections beats reading one — the permissions required for a feature are rarely on the same page as the feature itself. Search for the feature, then search for “permissions” or “privileges” as a separate lookup to get both in scope
- Use doc lookups proactively, not just reactively — the most valuable time to check docs is before you deploy a configuration you’re uncertain about, not after a production incident surfaces the gap