[Performance] Cache SCIM and permission-assignment list calls to reduce API requests during plan #5535

Open
tagirb wants to merge 1 commit into databricks:main from tagirb:add-scim-caching

@tagirb tagirb commented Mar 27, 2026

[Performance] Cache SCIM and permission-assignment list calls to reduce API requests during plan

For large IAM deployments, terraform plan was issuing one individual API call per resource Read —
e.g. 620 GET /Users/{id} calls for 620 databricks_user resources, 9,001 per-group member
lookups for 9,001 databricks_group_member resources, and so on. Under high concurrency this also
caused 429 rate-limit errors.

Root cause: Each resource's Read handler called the API independently with no sharing between
parallel goroutines.

Fix: Each affected resource type now has a shared in-memory cache populated by a single list
call. All goroutines for resources of the same type share the one result. Mutations
(Create/Update/Delete) invalidate the cache so the next Read re-fetches fresh data.

Concurrency safety: Caches use sync.RWMutex (concurrent reads on a warm cache) combined with
golang.org/x/sync/singleflight (deduplicating in-flight cold-cache fetches so exactly one
goroutine calls the API while all others wait and share the result).
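
As an illustration of the pattern described above, here is a minimal, standard-library-only sketch. It is not the code in this PR: `listCache`, `get`, `invalidate`, and `coldReads` are hypothetical names, the payload is a plain `map[string]string`, and a second mutex with a double-check stands in for golang.org/x/sync/singleflight so the example has no external dependency.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// listCache is a hypothetical shared cache: one list call fills it and every
// Read for the same resource type is then answered from memory.
type listCache struct {
	mu     sync.RWMutex      // guards items and loaded
	items  map[string]string // resource ID -> name (illustrative payload)
	loaded bool
	loadMu sync.Mutex // serialises cold-cache fetches (stand-in for singleflight)
	fetch  func() (map[string]string, error)
}

// get returns the cached entry for id, populating the cache on first use.
func (c *listCache) get(id string) (string, bool, error) {
	// Fast path: on a warm cache many readers hold the read lock at once.
	c.mu.RLock()
	if c.loaded {
		v, ok := c.items[id]
		c.mu.RUnlock()
		return v, ok, nil
	}
	c.mu.RUnlock()

	// Slow path: one goroutine fetches; the others block on loadMu and then
	// find the cache already warm when they re-check.
	c.loadMu.Lock()
	defer c.loadMu.Unlock()
	c.mu.RLock()
	loaded := c.loaded
	c.mu.RUnlock()
	if !loaded {
		items, err := c.fetch()
		if err != nil {
			return "", false, err
		}
		c.mu.Lock()
		c.items, c.loaded = items, true
		c.mu.Unlock()
	}
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.items[id]
	return v, ok, nil
}

// invalidate drops the cached list so the next get re-fetches it.
// A Create/Update/Delete handler would call this after a successful mutation.
func (c *listCache) invalidate() {
	c.mu.Lock()
	c.items, c.loaded = nil, false
	c.mu.Unlock()
}

// coldReads issues n concurrent reads against a cold cache and reports how
// many times the fake list endpoint was called.
func coldReads(n int) int64 {
	var calls int64
	c := &listCache{fetch: func() (map[string]string, error) {
		atomic.AddInt64(&calls, 1)
		return map[string]string{"1": "alice", "2": "bob"}, nil
	}}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.get("1")
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&calls)
}

func main() {
	fmt.Println(coldReads(100)) // prints 1: one list call for 100 readers
}
```

singleflight additionally hands the same result (or error) to every waiting caller of a given key, which the mutex stand-in only approximates by letting waiters re-read the now-warm cache.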

Per-resource changes:

| Resource | Before | After |
| --- | --- | --- |
| databricks_mws_permission_assignment (~4,400) | 1 ListByWorkspaceId per resource | 1 per workspace_id |
| databricks_permission_assignment | 1 List per resource | 1 per host |
| databricks_group (~1,466) | 1 GET /Groups/{id} per resource | 1 GET /Groups?attributes=id,... total |
| databricks_user (~620) | 1 GET /Users/{id} per resource | 1 GET /Users?attributes=id,... total |
| databricks_group_member (~9,001) | up to 1,466 concurrent GET /Groups/{id}?attributes=members → 429s | 1 GET /Groups?attributes=id,members total |
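
To make the databricks_group_member row concrete: once a single list response carrying `id` and `members` is in hand, each membership Read becomes a map lookup. The sketch below assumes a simplified `group` type and hypothetical helpers `indexMembers` and `isMember`; the provider's actual SCIM types and field names may differ.

```go
package main

import "fmt"

// group mirrors only the fields requested via attributes=id,members.
// The type and field names are illustrative, not the provider's.
type group struct {
	ID      string
	Members []string
}

// indexMembers turns one list response into groupID -> set of member IDs.
func indexMembers(groups []group) map[string]map[string]struct{} {
	idx := make(map[string]map[string]struct{}, len(groups))
	for _, g := range groups {
		set := make(map[string]struct{}, len(g.Members))
		for _, m := range g.Members {
			set[m] = struct{}{}
		}
		idx[g.ID] = set
	}
	return idx
}

// isMember answers a single group-member Read from the index.
// Looking up an unknown group yields a nil set, which reads as "not a member".
func isMember(idx map[string]map[string]struct{}, groupID, memberID string) bool {
	_, ok := idx[groupID][memberID]
	return ok
}

func main() {
	idx := indexMembers([]group{
		{ID: "g1", Members: []string{"u1", "u2"}},
		{ID: "g2"},
	})
	fmt.Println(isMember(idx, "g1", "u2"), isMember(idx, "g2", "u1")) // prints: true false
}
```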

All caches fall back to individual reads for resources absent from the list response (e.g. created
concurrently after the cache was populated).
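
The fallback could look roughly like the following sketch. `readName`, `fakeBackend`, and `errNotFound` are hypothetical names for illustration; the cache is reduced to a plain map and the individual GET to a function value.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("resource not found")

// readName serves a Read from the cached list when possible and falls back
// to an individual lookup when the ID is absent from the list response.
func readName(cached map[string]string, getByID func(id string) (string, error), id string) (string, error) {
	if name, ok := cached[id]; ok {
		return name, nil
	}
	return getByID(id)
}

// fakeBackend simulates the per-resource endpoint and counts how often it is hit.
func fakeBackend(data map[string]string) (func(string) (string, error), *int) {
	calls := 0
	get := func(id string) (string, error) {
		calls++
		if name, ok := data[id]; ok {
			return name, nil
		}
		return "", errNotFound
	}
	return get, &calls
}

func main() {
	// The cache was populated before user "2" existed.
	cached := map[string]string{"1": "alice"}
	get, calls := fakeBackend(map[string]string{"1": "alice", "2": "bob"})

	for _, id := range []string{"1", "2", "3"} {
		name, err := readName(cached, get, id)
		fmt.Printf("id=%s name=%q err=%v\n", id, name, err)
	}
	fmt.Println("individual reads:", *calls) // prints "individual reads: 2"
}
```

Only the two IDs missing from the cached list reach the per-resource endpoint, so the common case stays at one list call while a genuinely missing resource still surfaces the backend's not-found error.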

Changes

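Adds shared in-memory list caches for five IAM resource types, reducing SCIM/IAM API calls during
a plan cycle from O(N) to O(1) per resource type (one list call per workspace_id for
databricks_mws_permission_assignment and one per host for databricks_permission_assignment).
No schema, provider interface, or user-facing behaviour changes.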

Tests

  • make test run locally
  • relevant change in docs/ folder — no user-facing schema or behaviour changes; existing resource documentation remains accurate
  • covered with integration tests in internal/acceptance — existing acceptance tests in scim/group_test.go, scim/user_test.go, scim/group_member_test.go cover full CRUD paths for all changed resources
  • using Go SDK — not applicable; all changed resources use the SDKv2 client.Scim() helper, not the Go SDK IAM client
  • using TF Plugin Framework — not applicable; all changed resources (databricks_group, databricks_user, databricks_group_member, databricks_permission_assignment, databricks_mws_permission_assignment) are SDKv2 resources
  • has entry in NEXT_CHANGELOG.md file

@tagirb tagirb requested review from a team as code owners March 27, 2026 14:45
@tagirb tagirb requested review from renaudhartert-db and removed request for a team March 27, 2026 14:45
For large IAM deployments (thousands of databricks_group_member,
databricks_group, databricks_user, and mws_permission_assignment
resources), terraform plan was issuing one API call per resource Read —
resulting in thousands of redundant requests and 429 rate-limit errors.

Each resource type now uses a shared in-memory cache backed by
sync.RWMutex (concurrent warm reads) + singleflight (deduplicated
cold-cache fetches). A single list call populates the cache for all
resources of that type; subsequent reads are served from memory.
Mutations (Create/Update/Delete) invalidate the relevant cache entry.

Changes:
- access/resource_permission_assignment.go: permAssignmentCache
  (1 List call per host instead of 1 per resource)
- mws/resource_mws_permission_assignment.go: workspaceAssignmentsCache
  (1 ListByWorkspaceId per workspace_id instead of 1 per resource)
- scim/resource_group.go + scim/groups.go: groupsListCache
  (1 GET /Groups?attributes=id,... instead of N GET /Groups/{id})
- scim/resource_user.go + scim/users.go: usersListCache
  (1 GET /Users?attributes=id,... instead of N GET /Users/{id})
- scim/resource_group_member.go: bulk groupCache
  (1 GET /Groups?attributes=id,members instead of N per-group reads,
  eliminating concurrent request storms that caused 429 errors)
@tagirb tagirb force-pushed the add-scim-caching branch from 94a1d73 to 6f0c9e5 Compare March 27, 2026 15:00
@github-actions

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/terraform

Inputs:

  • PR number: 5535
  • Commit SHA: 6f0c9e51a6ea82d65df7ac44a577933deab12e19

Checks will be approved automatically on success.
