[Performance] Cache SCIM and permission-assignment list calls to reduce API requests during plan #5535
Open
tagirb wants to merge 1 commit into databricks:main
For large IAM deployments (thousands of databricks_group_member,
databricks_group, databricks_user, and mws_permission_assignment
resources), terraform plan was issuing one API call per resource Read —
resulting in thousands of redundant requests and 429 rate-limit errors.
Each resource type now uses a shared in-memory cache backed by
sync.RWMutex (concurrent warm reads) + singleflight (deduplicated
cold-cache fetches). A single list call populates the cache for all
resources of that type; subsequent reads are served from memory.
Mutations (Create/Update/Delete) invalidate the relevant cache entry.
Changes:
- access/resource_permission_assignment.go: permAssignmentCache
(1 List call per host instead of 1 per resource)
- mws/resource_mws_permission_assignment.go: workspaceAssignmentsCache
(1 ListByWorkspaceId per workspace_id instead of 1 per resource)
- scim/resource_group.go + scim/groups.go: groupsListCache
(1 GET /Groups?attributes=id,... instead of N GET /Groups/{id})
- scim/resource_user.go + scim/users.go: usersListCache
(1 GET /Users?attributes=id,... instead of N GET /Users/{id})
- scim/resource_group_member.go: bulk groupCache
(1 GET /Groups?attributes=id,members instead of N per-group reads,
eliminating concurrent request storms that caused 429 errors)
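
As a rough illustration of the bulk group-member lookup described in the last item, the sketch below indexes one list response by group ID so that each membership check becomes a map lookup. The `group` type, `buildMemberIndex`, and `isMember` are hypothetical names chosen for this example; they are not the provider's SCIM structs or the cache type added in this PR.

```go
package main

// group is a hypothetical, trimmed-down view of one entry returned by a
// list call such as GET /Groups?attributes=id,members.
type group struct {
	ID        string
	MemberIDs []string
}

// memberIndex maps a group ID to the set of its member IDs.
type memberIndex map[string]map[string]struct{}

// buildMemberIndex converts a single list response into an index that can
// answer every group-member read without further API calls.
func buildMemberIndex(groups []group) memberIndex {
	idx := make(memberIndex, len(groups))
	for _, g := range groups {
		set := make(map[string]struct{}, len(g.MemberIDs))
		for _, m := range g.MemberIDs {
			set[m] = struct{}{}
		}
		idx[g.ID] = set
	}
	return idx
}

// isMember reports whether memberID belongs to groupID. Unknown groups
// yield false because indexing a nil inner map is safe in Go.
func (idx memberIndex) isMember(groupID, memberID string) bool {
	_, ok := idx[groupID][memberID]
	return ok
}
```

With an index like this, N group-member reads cost one list request plus N in-memory lookups, rather than N separate requests.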
94a1d73 to
6f0c9e5
[Performance] Cache SCIM and permission-assignment list calls to reduce API requests during plan
For large IAM deployments, terraform plan was issuing one individual API call per resource Read,
e.g. 620 GET /Users/{id} calls for 620 databricks_user resources, 9,001 per-group member
lookups for 9,001 databricks_group_member resources, and so on. Under high concurrency this also
caused 429 rate-limit errors.
Root cause: Each resource's Read handler called the API independently, with no sharing between
parallel goroutines.
Fix: Each affected resource type now has a shared in-memory cache populated by a single list
call. All goroutines for resources of the same type share the one result. Mutations
(Create/Update/Delete) invalidate the cache so the next Read re-fetches fresh data.
Concurrency safety: Caches use sync.RWMutex (concurrent reads on a warm cache) combined with
golang.org/x/sync/singleflight (deduplicating in-flight cold-cache fetches so exactly one
goroutine calls the API while all others wait and share the result).
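
The pattern described above can be sketched roughly as follows. This is not the code from the PR: `listCache`, `flight`, `get`, `fill`, `invalidate`, and `countListCalls` are hypothetical names, and to keep the example standard-library only, a small hand-rolled in-flight call tracker stands in for golang.org/x/sync/singleflight.

```go
package main

import "sync"

// listCache is a hypothetical per-resource-type cache (not the PR's type).
type listCache struct {
	mu     sync.RWMutex
	loaded bool
	items  map[string]string // resource ID -> display name (illustrative)

	flightMu sync.Mutex
	inflight *flight // non-nil while a cold-cache fetch is running
}

// flight is one in-progress list call whose result waiting goroutines share.
// It stands in for what singleflight provides.
type flight struct {
	done chan struct{}
	err  error
}

// lookup reads the snapshot under the read lock, so warm reads run in parallel.
func (c *listCache) lookup(id string) (name string, found, warm bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	name, found = c.items[id]
	return name, found, c.loaded
}

// get serves id from memory and runs list at most once per cold cache.
func (c *listCache) get(id string, list func() (map[string]string, error)) (string, bool, error) {
	if name, found, warm := c.lookup(id); warm {
		return name, found, nil
	}
	if err := c.fill(list); err != nil {
		return "", false, err
	}
	name, found, _ := c.lookup(id)
	return name, found, nil
}

// fill lets exactly one goroutine call list; the others wait for its result.
func (c *listCache) fill(list func() (map[string]string, error)) error {
	c.flightMu.Lock()
	if f := c.inflight; f != nil {
		c.flightMu.Unlock()
		<-f.done
		return f.err
	}
	if _, _, warm := c.lookup(""); warm { // filled while we waited for flightMu
		c.flightMu.Unlock()
		return nil
	}
	f := &flight{done: make(chan struct{})}
	c.inflight = f
	c.flightMu.Unlock()

	items, err := list()
	if err == nil {
		c.mu.Lock()
		c.items, c.loaded = items, true
		c.mu.Unlock()
	}
	c.flightMu.Lock()
	c.inflight = nil
	c.flightMu.Unlock()
	f.err = err
	close(f.done)
	return err
}

// invalidate drops the snapshot so the next get re-fetches; a mutation
// (Create/Update/Delete) would call this.
func (c *listCache) invalidate() {
	c.mu.Lock()
	c.items, c.loaded = nil, false
	c.mu.Unlock()
}

// countListCalls starts n concurrent cold-cache reads and reports how many
// times the list function actually ran.
func countListCalls(n int) int {
	var (
		c       listCache
		wg      sync.WaitGroup
		countMu sync.Mutex
		calls   int
	)
	list := func() (map[string]string, error) {
		countMu.Lock()
		calls++
		countMu.Unlock()
		return map[string]string{"1": "admins"}, nil
	}
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.get("1", list)
		}()
	}
	wg.Wait()
	return calls
}
```

In this sketch a failed list call leaves the cache cold, so a later read retries. Whether the PR handles errors the same way is not stated in the description.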
Per-resource changes:

| Resource (approx. count) | Before | After |
|---|---|---|
| databricks_mws_permission_assignment (~4,400) | ListByWorkspaceId per resource | 1 ListByWorkspaceId per workspace_id |
| databricks_permission_assignment | List per resource | 1 List per host |
| databricks_group (~1,466) | GET /Groups/{id} per resource | 1 GET /Groups?attributes=id,... total |
| databricks_user (~620) | GET /Users/{id} per resource | 1 GET /Users?attributes=id,... total |
| databricks_group_member (~9,001) | GET /Groups/{id}?attributes=members → 429s | 1 GET /Groups?attributes=id,members total |

All caches fall back to individual reads for resources absent from the list response (e.g. created
concurrently after the cache was populated).
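
The fallback behaviour can be pictured with a minimal sketch like the one below. The `user` type, `readUser`, `fakeReadOne`, and `errNotFound` are assumptions made for illustration; the actual provider functions and error handling may differ.

```go
package main

import "errors"

// user is a hypothetical, trimmed-down record used only for this example.
type user struct {
	ID       string
	UserName string
}

// errNotFound is an illustrative sentinel, not a provider error value.
var errNotFound = errors.New("user not found")

// readUser prefers the cached list snapshot and only falls back to an
// individual read when the ID is absent from it.
func readUser(id string, snapshot map[string]user, readOne func(string) (user, error)) (user, error) {
	if u, ok := snapshot[id]; ok {
		return u, nil // served from memory, no API call
	}
	return readOne(id) // e.g. created after the snapshot was taken
}

// fakeReadOne stands in for a per-resource GET /Users/{id}. It knows one
// user ("3") that a snapshot taken earlier would not contain, and counts
// how often it is called.
func fakeReadOne(calls *int) func(string) (user, error) {
	return func(id string) (user, error) {
		*calls++
		if id == "3" {
			return user{ID: "3", UserName: "new@example.com"}, nil
		}
		return user{}, errNotFound
	}
}
```

Only IDs missing from the snapshot cost an extra request, so the common case stays at one list call per resource type.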
Changes
Shared in-memory list caches for five IAM resource types, reducing total SCIM/IAM API calls during
a plan cycle from O(N) to O(1) per resource type. No schema, provider interface, or user-facing
behaviour changes.
Tests
- make test run locally
- docs/ folder: no user-facing schema or behaviour changes; existing resource documentation remains accurate
- internal/acceptance: existing acceptance tests in scim/group_test.go, scim/user_test.go, scim/group_member_test.go cover full CRUD paths for all changed resources
- Uses the client.Scim() helper, not the Go SDK IAM client
- The five changed resources (databricks_group, databricks_user, databricks_group_member, databricks_permission_assignment, databricks_mws_permission_assignment) are SDKv2 resources
- NEXT_CHANGELOG.md file