Add vector_search_indexes resource (direct engine)#5123

Open
janniklasrose wants to merge 47 commits into main from janniklasrose/vs-index

@janniklasrose janniklasrose commented Apr 29, 2026

Changes

Adds vector_search_indexes as a first-class DABs resource on the direct engine, alongside the existing vector_search_endpoints. Direct engine only — vector search has no Terraform provider.

resources:
  vector_search_endpoints:
    my_endpoint:
      name: my-endpoint
      endpoint_type: STANDARD
  vector_search_indexes:
    my_index:
      name: main.default.my_index
      endpoint_name: ${resources.vector_search_endpoints.my_endpoint.name}
      primary_key: id
      index_type: DELTA_SYNC
      delta_sync_index_spec:
        source_table: main.default.source
        pipeline_type: TRIGGERED
      grants:
        - principal: data-engineers
          privileges: [SELECT]

What's included:

  • Resource model in bundle/config/resources/vector_search_index.go (with grants) and bundle/direct/dresources/vector_search_index.go (state, lifecycle, drift classification). RemapState round-trips index_subtype so a populated remote subtype isn't classified as drift on the next plan.
  • UC grants wired through the generic grants path with securable type table.
  • recreate_on_changes for immutable spec fields (name, endpoint_name, index_type, index_subtype, primary_key, delta_sync_index_spec, direct_access_index_spec); delta_sync_index_spec.columns_to_sync marked ignore_remote_changes (request-only field — see follow-up note below). The index API has no rename or update path, so any config-side change has to round-trip through delete + create.
  • Index orphaning detection: index state persists the endpoint_uuid of the endpoint it was created against. DoRead looks up the current endpoint UUID by name; if the endpoint was deleted out-of-band the lookup returns "" and OverrideChangeDesc classifies the saved-vs-remote mismatch as Recreate. Builds on the endpoint UUID persistence merged in Persist endpoint UUID for vector_search_endpoints drift detection #5127.
  • Async delete handling: new optional WaitAfterDelete adapter method (sibling to WaitAfterCreate / WaitAfterUpdate). For VS indexes it polls GetIndex until 404 (15-minute cap). apply.Recreate runs DoDelete → DeleteState → WaitAfterDelete → DoCreate → SaveState → WaitAfterCreate, so a wait-time failure leaves the bundle consistent. Replaces the prior SaveState("", nil, nil) placeholder that produced invalid state: empty id planning failures on partial recreate.
  • Destructive-action prompt for VS indexes in bundle/phases/. The message intentionally covers both Delta Sync ("re-runs the embedding pipeline") and Direct Access ("upserted vectors lost") in one paragraph. Picking a type-specific message from the bundle config would be wrong on type changes: a DELTA_SYNC ↔ DIRECT_ACCESS recreate would describe the destination type while the actual teardown is of the source type.
  • Dev-mode name prefixing for indexes prefixes only the leaf component of catalog.schema.name, since catalog and schema are external references (the previous behavior produced invalid names like dev_jan_main.default.my_index). The mutator skips names that still carry literal ${...} tokens, since the leaf split would otherwise inject the prefix inside the trailing ref expression itself.
  • Testserver enforces endpoint existence on index create. Index status returns Ready: true immediately, matching the convention used by every other slow resource the testserver fakes (endpoints → ONLINE, database instances → AVAILABLE, apps → RUNNING).
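
The recreate ordering from the async delete bullet can be sketched as follows. This is a minimal illustration, not the code in apply.Recreate: the `resource` interface, the `state` map, and the `fake` adapter are hypothetical stand-ins with simplified signatures. Only the step order is taken from the description.

```go
package main

import (
	"errors"
	"fmt"
)

// resource is a hypothetical, trimmed-down adapter interface; the real
// adapter methods in the direct engine have different signatures.
type resource interface {
	DoDelete(id string) error
	WaitAfterDelete(id string) error
	DoCreate() (string, error)
	WaitAfterCreate(id string) error
}

// state is a hypothetical stand-in for persisted state: resource key -> id.
type state map[string]string

// recreate follows the order given in the description:
// DoDelete -> DeleteState -> WaitAfterDelete -> DoCreate -> SaveState -> WaitAfterCreate.
func recreate(r resource, st state, key string) error {
	id := st[key]
	if err := r.DoDelete(id); err != nil {
		return fmt.Errorf("delete %s: %w", key, err)
	}
	delete(st, key) // DeleteState: drop the entry before waiting
	if err := r.WaitAfterDelete(id); err != nil {
		return fmt.Errorf("wait after delete %s: %w", key, err)
	}
	newID, err := r.DoCreate()
	if err != nil {
		return fmt.Errorf("create %s: %w", key, err)
	}
	st[key] = newID // SaveState: only ever written with a real id
	return r.WaitAfterCreate(newID)
}

// fake records calls and can be told to fail the post-delete wait.
type fake struct {
	calls    []string
	failWait bool
}

func (f *fake) DoDelete(string) error {
	f.calls = append(f.calls, "DoDelete")
	return nil
}

func (f *fake) WaitAfterDelete(string) error {
	f.calls = append(f.calls, "WaitAfterDelete")
	if f.failWait {
		return errors.New("timed out")
	}
	return nil
}

func (f *fake) DoCreate() (string, error) {
	f.calls = append(f.calls, "DoCreate")
	return "main.default.my_index", nil
}

func (f *fake) WaitAfterCreate(string) error {
	f.calls = append(f.calls, "WaitAfterCreate")
	return nil
}

func main() {
	st := state{"my_index": "main.default.my_index"}
	f := &fake{failWait: true}
	err := recreate(f, st, "my_index")
	_, stillInState := st["my_index"]
	fmt.Println(err != nil, stillInState, f.calls)
}
```

In this sketch a failed wait leaves no state entry at all, rather than an entry with an empty id, which is the consistency property the bullet describes.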

index_type / spec-block consistency is intentionally not validated client-side — the CreateIndex API rejects mismatched combinations at deploy time, and replicating that check in DABs would just duplicate backend logic.

Why

The direct engine recently gained vector_search_endpoints (#4887). This PR extends the support to indexes, which were the missing half. Along the way it surfaces and fixes a number of issues:

  • Without persisted endpoint UUIDs, identity drift was undetectable. An index pointing at a deleted-and-recreated endpoint would appear live by name but its backing endpoint was gone, leading to confusing "index already exists" errors on subsequent deploys. Persist endpoint UUID for vector_search_endpoints drift detection #5127 added the same UUID tracking on the endpoint side; this PR mirrors it on the index side so the orphan is caught.
  • The async deletion model isn't documented in the SDK, but recreate deploys hit it every time. Without a wait, every recreate failed on the immediate Create.
  • apply.Recreate was writing a malformed empty-ID state entry as its "delete state" step, which then poisoned the next plan with invalid state: empty id.
  • Recreating a VS index is genuinely expensive — Delta Sync re-runs the full embedding pipeline; Direct Access loses every upserted vector. The destructive-action prompt now reflects that.

Follow-ups

  • delta_sync_index_spec.columns_to_sync is request-only in the SDK today: the field is accepted on Create but the Get response doesn't echo it back, which is why we mark it ignore_remote_changes here. There's an open backend PR to expose columns_to_sync on the read path; once the SDK is regenerated against that, we can drop the ignore_remote_changes entry and let normal drift detection handle the field.
  • vector_search_endpoints.budget_policy_id drift (effective vs. requested) and the SDK doc-comment for vector_search_endpoints.usage_policy_id are intentionally not in this PR — both will be addressed by the next SDK bump and the corresponding ./task generate-schema regen.

Tests

  • ./task fmt, ./task checks, ./task lint — all clean.
  • ./task test — unit tests green across bundle/....
  • New unit test TestVectorSearchIndexNameWithUnresolvedRefsLeftAlone in apply_target_mode_test.go exercises the leaf-prefix skip on ${var.catalog}.${var.schema}.${var.index}.
  • New acceptance directories under acceptance/bundle/resources/vector_search_indexes/: basic, drift/columns_to_sync, drift/deleted_remotely, drift/orphaned_endpoint, recreate/index_type, recreate/mixed_types, grants/select.
  • The recreate request log (recreate/index_type/out.requests.recreate.direct.json) captures GET → DELETE → GET → POST with --get enabled in print_requests.py. The middle GET is the WaitAfterDelete poll; if a future change drops the wait the regenerated capture loses that line and the test fails.
  • acceptance/bundle/validate/presets_name_prefix covers the leaf-only name prefix on a 3-part index name.
  • acceptance/bundle/invariant/configs/vector_search_index.yml.tmpl exercises the resource through the invariant matrix; the testserver enforces endpoint existence on index create.
  • Live tested with --profile tmp against staging across initial deploy / drift / recreate / destroy.

This PR was written by Claude Code.

@janniklasrose janniklasrose marked this pull request as draft April 29, 2026 13:12
@janniklasrose janniklasrose force-pushed the janniklasrose/vs-index branch from 1943af9 to 87018ce Compare April 29, 2026 13:19
janniklasrose added a commit that referenced this pull request Apr 30, 2026
)

## Changes

Persist `endpoint_uuid` in state and detect identity drift on
`vector_search_endpoints`.

The endpoint name is stable but its UUID changes if the endpoint is
deleted and recreated by name (e.g. via the workspace UI). Without
persisting the UUID:

- The bundle silently rebound permissions to a different backing
endpoint without recreating the endpoint resource.
- Anything else referencing `endpoint_uuid` (most importantly the
permissions object_id, but also indexes added on top in the next PR)
raced the recreate.

`VectorSearchEndpointState` now embeds `vectorsearch.CreateEndpoint` and
adds `EndpointUuid`. `DoCreate` records the UUID from the create
response; `DoUpdate` copies it from `entry.RemoteState` so unrelated
updates (e.g. `min_qps`) don't blank it out. `OverrideChangeDesc`
classifies `endpoint_uuid` drift as `Recreate` when saved differs from
remote, `Skip` otherwise.

`drift/recreated_same_name` flips from a "badness snapshot" (which
captured the old behavior of permissions silently rebinding) to the
recreate behavior, with a permissions block on the endpoint to verify
the cascade rebinds correctly.

`drift/min_qps/out.plan.direct.json` regenerates to include the new
`endpoint_uuid` skip entry in the detailed plan.

## Why

Splitting this out of the larger `vector_search_indexes` PR
([#5123](#5123)) so it can land
independently. The index PR builds on the persisted UUID for orphan
detection, but the endpoint UUID work stands on its own and is useful
regardless.

## Tests

- `make fmtfull`, `make checks`, `make lintfull` — clean.
- `make test` — green (`libs/apps/runlocal` needed `NODE_OPTIONS=` for
the harness leak; unrelated). `bundle/internal/schema
TestRequiredAnnotationsForNewFields` panics, which is failing on `main`
for unrelated reasons.
- `go test ./acceptance -run
'TestAccept/bundle/resources/vector_search_endpoints'` — all green,
including the flipped `drift/recreated_same_name`.

_This PR was written by Claude Code._

@janniklasrose janniklasrose force-pushed the janniklasrose/vs-index branch from 2b22f02 to 44ade3f Compare April 30, 2026 14:23

Hardcode the 3-part index name so the diff against main is purely
vector_search additions. The test still demonstrates leaf-only
prefixing on a 3-part identifier; the cross-resource reference path is
covered elsewhere.

Co-authored-by: Isaac
RemapState was hardcoding IndexSubtype to the empty string, which would
classify any remote with a populated subtype as drift on the next plan
and force a needless recreate. Pass through remote.IndexSubtype like
the other read-back fields.

Co-authored-by: Isaac
The Vector Search index API has no rename or update path, so any
config-side change has to round-trip through delete + create. Add
name and index_subtype to recreate_on_changes so the planner picks
them up the same way it already does for endpoint_name, index_type,
primary_key, and the spec blocks.

Co-authored-by: Isaac
The leaf-prefix logic splits on the last dot in the 3-part UC name and
prepends the user prefix to whatever follows. If the name still has
literal ${...} tokens (e.g. ${var.catalog}.${var.schema}.${var.index}),
that split lands inside the trailing ref expression and rewrites the
variable name itself. Detect unresolved refs and bail; users who want
the dev prefix in this case can compose it into the variable.
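
The failure mode and the guard can be sketched like this. The function names and the `dev_jan_` prefix are illustrative assumptions, not the helpers in apply_presets.go.

```go
package main

import (
	"fmt"
	"strings"
)

// naivePrefixLeaf splits on the last dot and prefixes whatever follows.
// It shows the bug: with unresolved refs the last dot sits inside ${...}.
func naivePrefixLeaf(name, prefix string) string {
	i := strings.LastIndex(name, ".")
	return name[:i+1] + prefix + name[i+1:]
}

// prefixLeaf bails out when the name still carries literal ${...} tokens.
func prefixLeaf(name, prefix string) string {
	if strings.Contains(name, "${") {
		return name // unresolved reference: leave the name alone
	}
	return naivePrefixLeaf(name, prefix)
}

func main() {
	resolved := "main.default.my_index"
	unresolved := "${var.catalog}.${var.schema}.${var.index}"

	fmt.Println(prefixLeaf(resolved, "dev_jan_"))
	fmt.Println(naivePrefixLeaf(unresolved, "dev_jan_")) // rewrites the variable name
	fmt.Println(prefixLeaf(unresolved, "dev_jan_"))      // unchanged
}
```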

Co-authored-by: Isaac
CreateIndex rejects any combination where the spec block doesn't match
the index_type (e.g. DELTA_SYNC with direct_access_index_spec set, or
DIRECT_ACCESS with neither block at all). Add a fast validator that
reports those mismatches at validate time so the failure surfaces
before the deploy starts running.
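
A minimal sketch of what such a validator might check, assuming the rules are exactly the two cases named in this commit message. `indexConfig` and its boolean fields are hypothetical simplifications of the resource model, and the real CreateIndex API may enforce more than this.

```go
package main

import (
	"errors"
	"fmt"
)

// indexConfig is a hypothetical, flattened view of the index resource:
// it only records which spec blocks are present.
type indexConfig struct {
	IndexType           string
	HasDeltaSyncSpec    bool
	HasDirectAccessSpec bool
}

// validateIndexSpec reports a mismatch between index_type and the spec
// blocks. The rules here are an assumption based on the commit message.
func validateIndexSpec(c indexConfig) error {
	switch c.IndexType {
	case "DELTA_SYNC":
		if !c.HasDeltaSyncSpec || c.HasDirectAccessSpec {
			return errors.New("index_type DELTA_SYNC requires delta_sync_index_spec and no direct_access_index_spec")
		}
	case "DIRECT_ACCESS":
		if !c.HasDirectAccessSpec || c.HasDeltaSyncSpec {
			return errors.New("index_type DIRECT_ACCESS requires direct_access_index_spec and no delta_sync_index_spec")
		}
	default:
		return fmt.Errorf("unknown index_type %q", c.IndexType)
	}
	return nil
}

func main() {
	ok := indexConfig{IndexType: "DELTA_SYNC", HasDeltaSyncSpec: true}
	mismatched := indexConfig{IndexType: "DELTA_SYNC", HasDirectAccessSpec: true}
	empty := indexConfig{IndexType: "DIRECT_ACCESS"}

	fmt.Println(validateIndexSpec(ok))
	fmt.Println(validateIndexSpec(mismatched))
	fmt.Println(validateIndexSpec(empty))
}
```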

Co-authored-by: Isaac
The out.test.toml format changed in #5146 ("acc: Format out.test.toml
in diff-friendly and copypaste-friendly way"), and refschema picked
up index_subtype and endpoint_uuid from the resource model. Pure
regen from running ./task generate-refschema and ./task test-update.

Co-authored-by: Isaac
Previously lookupEndpointUuid swallowed all non-404 errors and returned "",
which would feed empty remoteUuid into OverrideChangeDesc and propose a
destructive Recreate ("endpoint replaced out-of-band") on transient or
permission errors. The Recreate is dangerous: Delta Sync re-runs the
embedding pipeline, and Direct Access loses all upserted vectors.

Now the helper returns (string, error): 404 maps to ("", nil) — the orphan
signal — and any other error is propagated through DoRead/DoCreate so the
plan fails loudly instead of misclassifying it as drift.

Document the OverrideChangeDesc divergence from vector_search_endpoint
(which requires remoteUuid != ""): for indexes, an empty remoteUuid is the
orphan signal, and the lookup contract guarantees that case is unambiguous.

Add a Badness-marked test that deploys a bundle with both a
vector_search_endpoint and a vector_search_index referencing it, then
changes the endpoint_type to trigger an endpoint Recreate. The plan
correctly recreates the endpoint but leaves the dependent index
unchanged, so on a real workspace the endpoint delete would either
fail (indexes still attached) or orphan the index.

Root cause is in the planner (bundle/direct/bundle_plan.go): there is
no logic to propagate Recreate from a dependency to its dependents.
This is a framework-level concern that affects more than just VS,
so it's deferred to a follow-up. The Badness entry documents the gap.

Add a Badness-marked validate test showing that the name_prefix preset
does not rewrite a vector_search_indexes.*.endpoint_name literal that
points at a bundle-managed (and therefore prefixed) endpoint. The output
shows vs_endpoint -> prefix_vs_endpoint while vs_index_literal still
targets the unprefixed name vs_endpoint.

The DABs idiom is to use ${resources.vector_search_endpoints.X.name}
(captured by vs_index_ref in the same fixture). That form resolves
correctly to the prefixed name at plan/deploy time, so users have a
working pattern. The literal form silently breaks though, and the
preset has enough information to rewrite it; tracked as Badness for a
follow-up fix in apply_presets.go.

Mirror the existing vector_search_endpoint bind test: pre-create both
endpoint and index, bind the index into the bundle, deploy, unbind, and
destroy. Verifies the index survives unbind+destroy as expected.

Required by bundle/direct/dresources/README.md for new resource types.

Drop the Terraform-provider justification (already implied by
"direct engine only") and the long list of internal mechanics.
Keep the entry focused on what customers see.

CreateIndex returns immediately with metadata of an index whose embedding
pipeline is still provisioning; queries against an index that isn't ready
fail. Implement WaitAfterCreate so dependent resources (and the next plan)
see a usable index. 75-minute timeout matches the terraform provider.

Co-authored-by: Isaac
The SaveState->DeleteState change in apply.Recreate and the empty-id
tolerance in bundle_plan.go were extracted to a separate PR (#5173).
Reverting them here so this branch and #5173 can land independently;
once #5173 merges, a rebase on main brings the same fix back in.

Co-authored-by: Isaac
Previously most vector_search_indexes tests created the endpoint
out-of-band via the CLI and only declared the index in the bundle.
Move the endpoint into the same databricks.yml so the index can
reference it via ${resources.vector_search_endpoints.my_endpoint.name},
matching the pattern users will write and shrinking the script's
manual cleanup. Bundle destroy now tears down both resources.

Co-authored-by: Isaac
Vector search indexes have no update API. Previously DoUpdate was a
no-op, which meant a future SDK field that wasn't declared in
recreate_on_changes/ignore_remote_changes would be classified as
Update by the planner and silently no-op at deploy time.

Drop the no-op DoUpdate so the framework's existing check at
bundle_plan.go errors loudly ("resource does not support update
action but plan produced update") if a plan ever produces Update
for this resource. Add a reflection-based unit test that catches
the same gap earlier, mirroring the pattern in app_test.go.
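
One way such a reflection-based check could look, as a sketch only: the `indexSpec` struct, the `declared` set, and `undeclaredFields` are hypothetical, and the actual pattern in app_test.go may differ. The idea is that every JSON field of the resource struct must appear in a declared set, so a field added by a future SDK bump is flagged before a plan can classify it as Update.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// indexSpec is a hypothetical, trimmed-down resource struct.
// NewSDKField simulates a field added by a later SDK bump.
type indexSpec struct {
	Name         string `json:"name"`
	EndpointName string `json:"endpoint_name"`
	IndexType    string `json:"index_type"`
	PrimaryKey   string `json:"primary_key"`
	NewSDKField  string `json:"new_sdk_field,omitempty"`
}

// declared is a hypothetical stand-in for the union of
// recreate_on_changes and ignore_remote_changes.
var declared = map[string]bool{
	"name":          true,
	"endpoint_name": true,
	"index_type":    true,
	"primary_key":   true,
}

// undeclaredFields returns the JSON names of struct fields that are
// missing from the declared set.
func undeclaredFields(v any, declared map[string]bool) []string {
	t := reflect.TypeOf(v)
	if t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	var missing []string
	for i := 0; i < t.NumField(); i++ {
		name, _, _ := strings.Cut(t.Field(i).Tag.Get("json"), ",")
		if name == "" || name == "-" {
			continue // not serialized, so nothing to classify
		}
		if !declared[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	fmt.Println(undeclaredFields(indexSpec{}, declared))
}
```

A unit test built on this would fail whenever the returned slice is non-empty.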

Co-authored-by: Isaac
This reverts commit b8483e7d82eadd2bb15f126a25d786bd402f829a.

Main reverted vector_search_endpoints UUID persistence in #5193, so
the endpoint plan no longer carries a synthetic endpoint_uuid change
to be classified as Skip via OverrideChangeDesc. Regenerate the
with_endpoint plan output to match.

Co-authored-by: Isaac
The 3-part UC name (catalog.schema.index) is the API primary key:
CreateIndex addresses by name and DoCreate returns it as the
deployment id. Prefixing it changed which remote object the bundle
addressed, not just its display label. Mirrors #5209's same change
for vector_search_endpoints.

Drop the leaf-only prefix loop and the vectorSearchIndexPrefixPos
helper in apply_presets.go, add VectorSearchIndex to the
no-rename carve-out in apply_target_mode_test.go, and remove the
now-obsolete TestVectorSearchIndexNamePrefixing.

Co-authored-by: Isaac
… remote

- WaitAfterCreate now takes id per #5258; the saved config.Name is the
  same as id, so the body is unchanged.
- SDK v0.132.0 (#5237) returns delta_sync_index_spec.columns_to_sync (and
  the new columns_to_index field) on read. Drop the ignore_remote_changes
  rule and propagate both from remote in RemapState. Removes the
  drift/columns_to_sync acceptance test which was asserting the now-stale
  request-only behavior.

Co-authored-by: Isaac
Per denik's PR comment: explain that ForceSendFields is an SDK
marshaling concern (which zero-valued fields to wire-serialize)
that has no meaning on the read path, so copying it from the
response struct would not be useful.

Co-authored-by: Isaac
The test was a Badness fixture capturing the gap where a literal
endpoint_name on a VS index would not follow the endpoint's name
prefix. Now that neither VS endpoints (#5209) nor VS indexes are
prefixed, the literal form correctly points at the (unprefixed)
endpoint, and all three branches of the fixture produce identical
output.

Co-authored-by: Isaac
generate-schema picked up the missing placeholder for index_subtype
after the SDK bump; previously this field wasn't in the resource and
the schema_test caught the gap on rebase.

Co-authored-by: Isaac