[BB-1755, BB-1771] Add retry logic for not found errors in connector methods and fix missed grants with oauth #32
FeliLucero1 wants to merge 3 commits into main from
Conversation
Walkthrough
Replaces string-based group-member parsing with explicit Group fetches (now passing attribute Vars), then iterates over and parses the returned members.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Connector as Connector
    participant Databricks as DatabricksClient
    participant GrantLogic as GrantProcessor
    rect rgb(240, 248, 255)
        Connector->>Databricks: GetGroup(ctx, workspace, groupId, NewGroupAttrVars())
        note right of Databricks: returns Group with Members[]
        Databricks-->>Connector: Group{Members:[{Ref:"Users/123"}, {Ref:"Groups/456"}, ...]}
    end
    rect rgb(255, 245, 230)
        Connector->>GrantLogic: For each member in Group.Members
        GrantLogic->>GrantLogic: parse member.Ref -> (type, id) or error
        alt member is User or ServicePrincipal
            GrantLogic->>GrantLogic: build resource ID, append annotations
        else member is Group
            GrantLogic->>Connector: expand nested group via groupGrantExpansion(...)
            Connector-->>GrantLogic: nested members + annotations
        else invalid Ref
            GrantLogic--xConnector: return error("invalid member Ref")
        end
    end
    rect rgb(246, 255, 240)
        note over Databricks,Connector: Error handling flow
        Databricks--xConnector: status.Error(codes.NotFound,...)
        Connector->>Connector: makeRetryableIfNotFound(err) -> grpc.Unavailable (retryable)
    end
```
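The member-iteration flow shown in the diagram can be sketched as follows. This is a simplified, stdlib-only illustration: the `Member` struct, the helper names, and the dispatch results are hypothetical stand-ins, not the connector's actual API.

```go
package main

import (
	"fmt"
	"strings"
)

// Member mirrors the shape the diagram shows for Group.Members entries;
// the Ref field matches the diagram, everything else is illustrative.
type Member struct {
	Ref string
}

// parseMemberRef splits a "Type/ID" reference, returning an error for
// anything that does not match that shape.
func parseMemberRef(ref string) (memberType, id string, err error) {
	parts := strings.SplitN(ref, "/", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("invalid member Ref %q", ref)
	}
	return parts[0], parts[1], nil
}

// processMembers dispatches per member type as in the diagram: users and
// service principals become direct grants, nested groups would be expanded
// (stubbed here), and anything else is an error.
func processMembers(members []Member) ([]string, error) {
	var grants []string
	for _, m := range members {
		typ, id, err := parseMemberRef(m.Ref)
		if err != nil {
			return nil, err
		}
		switch typ {
		case "Users", "ServicePrincipals":
			grants = append(grants, typ+":"+id)
		case "Groups":
			// The real code expands nested membership (groupGrantExpansion);
			// here we only record the reference.
			grants = append(grants, "expand:"+id)
		default:
			return nil, fmt.Errorf("unexpected member type %q", typ)
		}
	}
	return grants, nil
}

func main() {
	grants, err := processMembers([]Member{{Ref: "Users/123"}, {Ref: "Groups/456"}})
	fmt.Println(grants, err)
}
```

Note how an invalid `Ref` aborts the whole pass with an error, matching the "invalid Ref" branch in the diagram.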
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/connector/groups.go (1)
199-204: Consider consistent error wrapping for GetGroup calls.

This `GetGroup` call in the `Grants` method returns errors directly, while the `Grant` (line 309) and `Revoke` (line 453) methods wrap errors with `makeRetryableIfNotFound`. If the intent is to make not-found errors retryable only for mutation operations, and not for read operations during grant listing, this is fine. Otherwise, consider applying consistent error wrapping across all `GetGroup` calls.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- pkg/connector/groups.go (3 hunks)
- pkg/connector/roles.go (4 hunks)
- pkg/databricks/client.go (9 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/connector/roles.go
🧰 Additional context used
🧬 Code graph analysis (2)
pkg/connector/groups.go (1)
pkg/databricks/vars.go (1)
NewGroupAttrVars(86-94)
pkg/databricks/client.go (1)
pkg/databricks/vars.go (1)
Vars(8-10)
🔇 Additional comments (4)
pkg/databricks/client.go (2)
259-259: LGTM - Backward compatible extension.

The `GetGroup` signature has been extended to accept variadic `Vars` parameters, allowing callers to specify which attributes to fetch (e.g., `databricks.NewGroupAttrVars()` for members). This is a backward-compatible change since existing calls without the parameter will continue to work.

Also applies to: 272-272
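The backward-compatible variadic extension described above can be sketched like this. `Vars` is a simplified stand-in for the type in pkg/databricks/vars.go, and the URL construction is illustrative only, not the client's real request path.

```go
package main

import (
	"fmt"
	"net/url"
)

// Vars is a simplified stand-in for the pkg/databricks Vars type:
// a set of query parameters to attach to the request.
type Vars map[string]string

// NewGroupAttrVars mimics requesting the members attribute so the
// Group payload includes its Members list.
func NewGroupAttrVars() Vars {
	return Vars{"attributes": "members"}
}

// GetGroup demonstrates the variadic-parameter pattern: call sites written
// before the change (two arguments) still compile unchanged, while new
// callers can opt in to attribute vars. The path format is hypothetical.
func GetGroup(workspaceID, groupID string, vars ...Vars) string {
	q := url.Values{}
	for _, v := range vars {
		for k, val := range v {
			q.Set(k, val)
		}
	}
	u := fmt.Sprintf("/api/2.0/workspaces/%s/Groups/%s", workspaceID, groupID)
	if enc := q.Encode(); enc != "" {
		u += "?" + enc
	}
	return u
}

func main() {
	fmt.Println(GetGroup("ws1", "g1"))                     // old call site: no vars
	fmt.Println(GetGroup("ws1", "g1", NewGroupAttrVars())) // new call site with attributes
}
```

This is why the review calls the change backward compatible: a variadic parameter adds capability without breaking any existing caller.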
200-200: Verification confirms all callers properly handle the new NotFound errors.

All six `Find*` methods are called exclusively in pkg/connector/helpers.go, and every call site properly checks for errors and wraps them with `makeRetryableIfNotFound()`. There are no other callers in the codebase that would be affected by the breaking change.

pkg/connector/groups.go (2)
206-234: LGTM - Clear member parsing logic.

The member parsing logic correctly:
- Validates the `member.Ref` format ("Type/ID")
- Handles all expected member types (Users, Groups, ServicePrincipals)
- Applies group expansion for nested group memberships
- Returns descriptive errors for invalid formats

The refactoring from profile string parsing to explicit API fetching with member iteration is a reliability improvement.
307-310: The implementation is correct and all concerns are addressed.

The `makeRetryableIfNotFound` function at lines 216-230 in pkg/connector/helpers.go:
- ✓ Correctly identifies both gRPC `NotFound` (line 226) and HTTP `NotFound` (line 222) errors
- ✓ Converts both to `codes.Unavailable` status, which is retryable
- ✓ Preserves the original error message for debugging via `fmt.Sprintf(..., err)`
- ✓ Returns the original error unchanged if it's not a NotFound error (line 230)

The implementation properly handles the error transformation required by the PR's retry logic.
```go
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
```
🧩 Analysis chain
Verify the gRPC library version.
The imports are necessary for the new error handling, but the library context indicates google.golang.org/grpc version v1.73.0-dev is being used. Development versions can be unstable and are generally not recommended for production use.
Run the following script to check the gRPC version in use:
🏁 Script executed:

```shell
#!/bin/bash
# Check the version of google.golang.org/grpc in go.mod
cat go.mod | grep "google.golang.org/grpc"
```

Length of output: 144
**No actionable concerns with the imports; gRPC v1 is stable, but verify dependency management.**
When a project reaches major version v1 it is considered stable. The imports for `google.golang.org/grpc/codes` and `google.golang.org/grpc/status` are appropriate for error handling. While the codebase currently pulls in v1.73.0-dev (a pre-release version), the stable v1.73.0 should be used when available. Since this is marked as an indirect dependency, it is being pulled in by another package; consider verifying your dependency management to use stable releases rather than development versions in production builds.
---
🌐 Web query:

```
google.golang.org/grpc v1.73.0-dev stability production use
```

💡 Result:
Short answer: don't use a -dev module version in production. The "-dev" suffix denotes a development (unpublished/unstable) build; use an official tagged release (for example v1.73.0 or a later non-dev release) and pin that version after testing. [1][2][3]
Details and sources:
- grpc-go has an official v1.73.0 release (use the tagged release, not a dev build). [1]
- A v1.74.0 tagging issue was reported (the project advised using v1.73.0 until a fixed 1.74.x), illustrating that unstable tags/dev builds can be transient or incorrect. [2]
- The grpc-go repo documentation and releases are the authoritative source for which versions are stable for production; prefer published semver tags, run your tests, and pin the module in go.mod. [3]
If you want, I can look up the latest non-dev grpc-go release right now and suggest the exact module line to put in go.mod.
🌐 Web query:

```
grpc-go v1.73.0 release date stable version
```

💡 Result:
grpc-go v1.73.0 (stable) was released on June 5, 2025. [1][2]
Sources:
[1] GitHub Releases — grpc/grpc-go (v1.73.0).
[2] SourceForge mirror — grpc-go v1.73.0.
Address the development version dependency before production use.
The gRPC library version v1.73.0-dev is a development (unstable) build and must not be used in production. The stable release v1.73.0 has been available since June 5, 2025. Update go.mod to pin the stable release instead. Since this is an indirect dependency, investigate which direct dependency is pulling in the -dev version and either update that dependency or explicitly pin google.golang.org/grpc to v1.73.0 in your go.mod.
🤖 Prompt for AI Agents
pkg/databricks/client.go around lines 12-13: your module is pulling an indirect
gRPC dev release (v1.73.0-dev); locate which direct dependency requires it (use
go mod graph or go list -m -json all) and either update that direct dependency
to a version that depends on google.golang.org/grpc v1.73.0 or explicitly pin
google.golang.org/grpc to v1.73.0 in go.mod (add a require directive and run go
mod tidy) so the stable v1.73.0 is used instead of the -dev build.
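Following the prompt above, the explicit pin in go.mod might look like this (a sketch; the surrounding require block depends on the module's existing dependencies):

```
require google.golang.org/grpc v1.73.0 // pin stable release instead of v1.73.0-dev
```

After editing, run `go mod tidy` and confirm the resolved version with `go list -m google.golang.org/grpc`.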
luisina-santos
left a comment
we should confirm with core teams this is a safe change based on the number of times the connector will retry the action when the user does not actually exist, causing potential infinite retries.
I want to make sure the retry operation is something that we should handle at connector level, since we only want to retry if app access is configured through delegated provision (and we don't have this context here)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- pkg/connector/groups.go (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/connector/groups.go (2)
pkg/databricks/vars.go (1)
NewGroupAttrVars(86-94)
pkg/connector/account.go (1)
GroupsType(33-33)
🔇 Additional comments (4)
pkg/connector/groups.go (4)
211-239: Member processing logic is well-structured.

The refactored approach to parse `member.Ref` is clear and handles the expected member types (Users, Groups, ServicePrincipals) correctly. Format validation and nested group expansion are properly implemented.
206-209: Rate limiting annotations properly implemented.

The rate limiting data from both `GetGroup` and `ListRuleSets` is now correctly captured and accumulated in annotations, which are returned to the caller. This addresses the previous review feedback about reading `rateLimitData` and returning proper annotations. Based on learnings

Also applies to: 247-249, 271-271
316-319: Retry logic correctly applied.

The `GetGroup` call now includes `NewGroupAttrVars()` to ensure member data is fetched, and the error is properly wrapped with `makeRetryableIfNotFound` to enable retry behavior for not-found scenarios per the PR objectives.
460-463: Retry logic correctly applied.

Consistent with the `Grant` method, the error handling properly wraps not-found errors to enable retry behavior.
```go
// Always fetch the group with members attribute to ensure we get the members
// regardless of authentication type (OAuth vs personal access token)
group, rateLimitData, err := g.client.GetGroup(ctx, workspaceId, groupId.Resource, databricks.NewGroupAttrVars())
if err != nil {
	return nil, "", nil, fmt.Errorf("databricks-connector: failed to get group %s: %w", groupId.Resource, err)
}
```
Apply retry wrapper consistently with Grant and Revoke methods.
The `GetGroup` error is not wrapped with `makeRetryableIfNotFound` here, but the same calls in `Grant` (line 318) and `Revoke` (line 462) do wrap it. Per the PR objectives (adding retry logic for not found errors), this method should also benefit from the retry behavior.
Apply this diff:

```diff
 group, rateLimitData, err := g.client.GetGroup(ctx, workspaceId, groupId.Resource, databricks.NewGroupAttrVars())
 if err != nil {
+	err = makeRetryableIfNotFound(err)
 	return nil, "", nil, fmt.Errorf("databricks-connector: failed to get group %s: %w", groupId.Resource, err)
 }
```

📝 Committable suggestion
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
// Always fetch the group with members attribute to ensure we get the members
// regardless of authentication type (OAuth vs personal access token)
group, rateLimitData, err := g.client.GetGroup(ctx, workspaceId, groupId.Resource, databricks.NewGroupAttrVars())
if err != nil {
	err = makeRetryableIfNotFound(err)
	return nil, "", nil, fmt.Errorf("databricks-connector: failed to get group %s: %w", groupId.Resource, err)
}
```
🤖 Prompt for AI Agents
In pkg/connector/groups.go around lines 199 to 204, the call to
g.client.GetGroup returns an error that is not wrapped with
makeRetryableIfNotFound like the same calls in Grant and Revoke; update the
error handling to pass the returned error through makeRetryableIfNotFound before
returning so transient "not found" errors become retryable (i.e., replace the
current fmt.Errorf return with returning makeRetryableIfNotFound(err) wrapped
appropriately to preserve the original error context).
```go
	return nil, "", nil, fmt.Errorf("databricks-connector: failed to parse group resource id: %w", err)
}

groupTrait, err := rs.GetGroupTrait(resource)
```
@FeliLucero1 was the group trait missing members? or just didn't include them at all?
Used to lose some
It seems that the users it wasn't losing were those who had permissions over the group but weren't actually members of it; it was therefore losing all real members.
@luisina-santos I don't think this should have been a BB ticket in the first place; this is c1 platform behavior and connectors shouldn't have special logic for this.