-
Notifications
You must be signed in to change notification settings - Fork 0
AuthZService: improve authz caching #9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: cache-optimization-baseline
Are you sure you want to change the base?
AuthZService: improve authz caching #9
Conversation
* remove the use of client side cache for in-proc authz client Co-authored-by: Gabriel MABILLE <[email protected]> * add a permission denial cache, fetch perms if not in either of the caches Co-authored-by: Gabriel MABILLE <[email protected]> * Clean up tests Co-authored-by: Ieva <[email protected]> * Cache tests Co-authored-by: Ieva <[email protected]> * Add test to list + cache Co-authored-by: Ieva <[email protected]> * Add outdated cache test Co-authored-by: Ieva <[email protected]> * Re-organize metrics Co-authored-by: Ieva <[email protected]> --------- Co-authored-by: Gabriel MABILLE <[email protected]>
|
@coderabbitai review |
|
@coderabbitai review |
1 similar comment
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
WalkthroughThe pull request introduces permission denial caching to the RBAC authorization service. It replaces in-process RBAC client construction with a remote client using NoopCache, adds a utility function for generating permission denial cache keys, and integrates caching logic into authorization decision paths with early-deny mechanisms and cache-first lookups in Check and List methods. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant Check as Check Method
participant PermCache as permDenialCache
participant Permissions as Full Permission<br/>Evaluation
Client->>Check: Check(ctx, identity, action...)
rect rgb(200, 220, 255)
Note over Check,PermCache: Early Deny Path
Check->>PermCache: Query denial cache
opt Cache Hit (Denied)
PermCache-->>Check: Return true (denied)
Check-->>Client: Deny (fast path)
end
end
rect rgb(220, 255, 220)
Note over Check,Permissions: Cached Lookup Path
Check->>PermCache: getCachedIdentityPermissions
opt Cache Hit (Permissions Exist)
PermCache-->>Check: Return cached permissions map
Check-->>Client: Authorize/Deny (from cache)
end
end
rect rgb(255, 240, 200)
Note over Check,Permissions: Cache Miss - Full Evaluation
Check->>Permissions: Full permission evaluation
Permissions-->>Check: Return permissions
Check->>PermCache: Store in cache
PermCache-->>Check: ACK
Check-->>Client: Authorize/Deny
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/services/authz/rbac/service.go (1)
93-122: Scope denial caching to user-like identities to avoid cross-identity bleedThe early
permDenialCachecheck inCheckis a good optimization, but as implemented it usesuserPermDenialCacheKeyfor all identity types and only keys on namespace,UserUID, action, name, and parent folder. BecausevalidateSubjectdrops the identity type when parsing the subject (e.g. bothanonymous:0anduser:0yielduserUID == "0"), a denial cached for one identity type can be incorrectly reused for another that shares the same UID string.That can lead to unexpected hard denies for different principals until the short TTL expires (e.g., a user with UID
"0"inheriting a denial cached for the anonymous subject"anonymous:0").A low‑risk fix is to only use the denial cache for user/service‑account identities:
func (s *Service) Check(ctx context.Context, req *authzv1.CheckRequest) (*authzv1.CheckResponse, error) { @@ - span.SetAttributes( + span.SetAttributes( @@ - permDenialKey := userPermDenialCacheKey(checkReq.Namespace.Value, checkReq.UserUID, checkReq.Action, checkReq.Name, checkReq.ParentFolder) - if _, ok := s.permDenialCache.Get(ctx, permDenialKey); ok { - s.metrics.permissionCacheUsage.WithLabelValues("true", checkReq.Action).Inc() - s.metrics.requestCount.WithLabelValues("false", "true", req.GetVerb(), req.GetGroup(), req.GetResource()).Inc() - return &authzv1.CheckResponse{Allowed: false}, nil - } + usePermDenialCache := types.IsIdentityType(checkReq.IdentityType, types.TypeUser, types.TypeServiceAccount) + if usePermDenialCache { + permDenialKey := userPermDenialCacheKey(checkReq.Namespace.Value, checkReq.UserUID, checkReq.Action, checkReq.Name, checkReq.ParentFolder) + if _, ok := s.permDenialCache.Get(ctx, permDenialKey); ok { + s.metrics.permissionCacheUsage.WithLabelValues("true", checkReq.Action).Inc() + s.metrics.requestCount.WithLabelValues("false", "true", req.GetVerb(), req.GetGroup(), req.GetResource()).Inc() + return &authzv1.CheckResponse{Allowed: false}, nil + } + } @@ - if !allowed { - s.permDenialCache.Set(ctx, permDenialKey, true) - } + if !allowed && usePermDenialCache { + permDenialKey := userPermDenialCacheKey(checkReq.Namespace.Value, checkReq.UserUID, checkReq.Action, checkReq.Name, checkReq.ParentFolder) + s.permDenialCache.Set(ctx, permDenialKey, true) + }Alternatively, you could include the identity type in the denial cache key, but gating by type keeps the function name (
userPermDenialCacheKey) and existing key structure intact.Also applies to: 153-156, 56-61, 83-88
🧹 Nitpick comments (3)
pkg/services/authz/rbac/cache.go (1)
30-32: Denial cache key helper is consistent; consider a more explicit prefixKey composition matches the existing
userPermCacheKey, but because this is semantically different (explicit denials), using a distinct prefix (e.g..perm_denial_…) would make debugging/log inspection easier and future‑proofs you if these keys ever end up in a shared cache.pkg/services/authz/rbac/service_test.go (1)
1291-1323: StrengthenTestService_CacheListby asserting no store callsThis test correctly verifies that
Listcan be served entirely frompermCache, but it doesn’t assert that the backing store wasn’t hit. You can increase confidence in the cache‑only behavior by asserting the fake store’s call count stays at zero:t.Run("List based on cached permissions", func(t *testing.T) { s := setupService() ctx := types.WithAuthInfo(context.Background(), callingService) userID := &store.UserIdentifiers{UID: "test-uid", ID: 1} @@ - s.identityStore = &fakeIdentityStore{} + s.identityStore = &fakeIdentityStore{} + + store, ok := s.store.(*fakeStore) + require.True(t, ok) @@ require.NoError(t, err) require.ElementsMatch(t, resp.Items, []string{"dash1", "dash2"}) require.ElementsMatch(t, resp.Folders, []string{"fold1"}) + + require.Zero(t, store.calls) })This keeps the test focused on verifying that a warm cache prevents any backend reads.
pkg/services/authz/rbac/service.go (1)
161-197: Differentiate cache miss from backend errors ingetCachedIdentityPermissionscallers
List(and similarlyCheck) currently treats any error fromgetCachedIdentityPermissionsas a simple cache miss:permissions, err := s.getCachedIdentityPermissions(...) if err == nil { // use cache } else { // treat as miss, hit DB permissions, err = s.getIdentityPermissions(...) }But
getCachedIdentityPermissionscan fail both withcache.ErrNotFound(true miss) and with real backend errors (e.g., fromGetUserIdentifiers). In the latter case you’ll end up executing the same expensive chain (GetUserIdentifiers,getUserBasicRole, etc.) twice: once in the cache path, then again viagetIdentityPermissions.It would be cleaner and cheaper under failure to distinguish these:
permissions, err := s.getCachedIdentityPermissions(ctx, listReq.Namespace, listReq.IdentityType, listReq.UserUID, listReq.Action) -if err == nil { - s.metrics.permissionCacheUsage.WithLabelValues("true", listReq.Action).Inc() -} else { - s.metrics.permissionCacheUsage.WithLabelValues("false", listReq.Action).Inc() - - permissions, err = s.getIdentityPermissions(ctx, listReq.Namespace, listReq.IdentityType, listReq.UserUID, listReq.Action) - if err != nil { - // ... - return nil, err - } -} +if err == nil { + s.metrics.permissionCacheUsage.WithLabelValues("true", listReq.Action).Inc() +} else { + if !errors.Is(err, cache.ErrNotFound) { + // propagate real errors instead of retrying the whole path + ctxLogger.Error("could not get cached user permissions", "subject", req.GetSubject(), "error", err) + s.metrics.requestCount.WithLabelValues("true", "true", req.GetVerb(), req.GetGroup(), req.GetResource()).Inc() + return nil, err + } + s.metrics.permissionCacheUsage.WithLabelValues("false", listReq.Action).Inc() + + permissions, err = s.getIdentityPermissions(ctx, listReq.Namespace, listReq.IdentityType, listReq.UserUID, listReq.Action) + if err != nil { + // ... + return nil, err + } +}(Analogous handling could be applied in
Check.)This keeps the “cache miss => fall back to DB” behavior while avoiding duplicate backend work and clearer logging when the cache lookup itself fails for non‑miss reasons.
Also applies to: 342-368
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
pkg/services/authz/rbac.go(3 hunks)pkg/services/authz/rbac/cache.go(1 hunks)pkg/services/authz/rbac/service.go(6 hunks)pkg/services/authz/rbac/service_test.go(3 hunks)
🔇 Additional comments (2)
pkg/services/authz/rbac/service_test.go (1)
1325-1345: Setup wiring forpermDenialCachelooks good
setupServicenow initializespermDenialCachewithshortCacheTTL, matching the runtimeServiceconfiguration and keeping denial entries short‑lived to reduce the risk of stale negative decisions. No issues here.pkg/services/authz/rbac.go (1)
101-105: Client-side cache configuration andNoopCacheimplementation are reasonableUsing a
NoopCachefor the in‑proc client avoids redundant caching on top of the server‑side caches, while the remote client now benefits from a small local cache (Expiry: 30s) to reduce RPC volume. TheNoopCacheimplementation correctly satisfies thecache.Cacheinterface by always returningcache.ErrNotFoundfromGetandnilfromSet/Delete, so existing client code should see it as a normal cache miss.If you find yourself needing to tune cache behavior later, consider threading a
cache.Cache(or config) intonewRemoteRBACClientinstead of hard‑coding theLocalCacheparameters, but the current setup is fine for now.Please double‑check against the current
authzlib.WithCacheClientOptionandcache.Cacheinterfaces that there are no additional behavioral expectations beyond returningErrNotFoundfor a miss, especially aroundDelete.Also applies to: 160-170, 239-251
| func TestService_CacheCheck(t *testing.T) { | ||
| callingService := authn.NewAccessTokenAuthInfo(authn.Claims[authn.AccessTokenClaims]{ | ||
| Claims: jwt.Claims{ | ||
| Subject: types.NewTypeID(types.TypeAccessPolicy, "some-service"), | ||
| Audience: []string{"authzservice"}, | ||
| }, | ||
| Rest: authn.AccessTokenClaims{Namespace: "org-12"}, | ||
| }) | ||
|
|
||
| ctx := types.WithAuthInfo(context.Background(), callingService) | ||
| userID := &store.UserIdentifiers{UID: "test-uid", ID: 1} | ||
|
|
||
| t.Run("Allow based on cached permissions", func(t *testing.T) { | ||
| s := setupService() | ||
|
|
||
| s.idCache.Set(ctx, userIdentifierCacheKey("org-12", "test-uid"), *userID) | ||
| s.permCache.Set(ctx, userPermCacheKey("org-12", "test-uid", "dashboards:read"), map[string]bool{"dashboards:uid:dash1": true}) | ||
|
|
||
| resp, err := s.Check(ctx, &authzv1.CheckRequest{ | ||
| Namespace: "org-12", | ||
| Subject: "user:test-uid", | ||
| Group: "dashboard.grafana.app", | ||
| Resource: "dashboards", | ||
| Verb: "get", | ||
| Name: "dash1", | ||
| }) | ||
| require.NoError(t, err) | ||
| assert.True(t, resp.Allowed) | ||
| }) | ||
| t.Run("Fallback to the database on cache miss", func(t *testing.T) { | ||
| s := setupService() | ||
|
|
||
| // Populate database permission but not the cache | ||
| store := &fakeStore{ | ||
| userID: userID, | ||
| userPermissions: []accesscontrol.Permission{{Action: "dashboards:read", Scope: "dashboards:uid:dash2"}}, | ||
| } | ||
|
|
||
| s.store = store | ||
| s.permissionStore = store | ||
|
|
||
| s.idCache.Set(ctx, userIdentifierCacheKey("org-12", "test-uid"), *userID) | ||
|
|
||
| resp, err := s.Check(ctx, &authzv1.CheckRequest{ | ||
| Namespace: "org-12", | ||
| Subject: "user:test-uid", | ||
| Group: "dashboard.grafana.app", | ||
| Resource: "dashboards", | ||
| Verb: "get", | ||
| Name: "dash2", | ||
| }) | ||
| require.NoError(t, err) | ||
| assert.True(t, resp.Allowed) | ||
| }) | ||
| t.Run("Fallback to the database on outdated cache", func(t *testing.T) { | ||
| s := setupService() | ||
|
|
||
| store := &fakeStore{ | ||
| userID: userID, | ||
| userPermissions: []accesscontrol.Permission{{Action: "dashboards:read", Scope: "dashboards:uid:dash2"}}, | ||
| } | ||
|
|
||
| s.store = store | ||
| s.permissionStore = store | ||
|
|
||
| s.idCache.Set(ctx, userIdentifierCacheKey("org-12", "test-uid"), *userID) | ||
| // The cache does not have the permission for dash2 (outdated) | ||
| s.permCache.Set(ctx, userPermCacheKey("org-12", "test-uid", "dashboards:read"), map[string]bool{"dashboards:uid:dash1": true}) | ||
|
|
||
| resp, err := s.Check(ctx, &authzv1.CheckRequest{ | ||
| Namespace: "org-12", | ||
| Subject: "user:test-uid", | ||
| Group: "dashboard.grafana.app", | ||
| Resource: "dashboards", | ||
| Verb: "get", | ||
| Name: "dash2", | ||
| }) | ||
| require.NoError(t, err) | ||
| assert.True(t, resp.Allowed) | ||
| }) | ||
| t.Run("Should deny on explicit cache deny entry", func(t *testing.T) { | ||
| s := setupService() | ||
|
|
||
| s.idCache.Set(ctx, userIdentifierCacheKey("org-12", "test-uid"), *userID) | ||
|
|
||
| // Explicitly deny access to the dashboard | ||
| s.permDenialCache.Set(ctx, userPermDenialCacheKey("org-12", "test-uid", "dashboards:read", "dash1", "fold1"), true) | ||
|
|
||
| // Allow access to the dashboard to prove this is not checked | ||
| s.permCache.Set(ctx, userPermCacheKey("org-12", "test-uid", "dashboards:read"), map[string]bool{"dashboards:uid:dash1": false}) | ||
|
|
||
| resp, err := s.Check(ctx, &authzv1.CheckRequest{ | ||
| Namespace: "org-12", | ||
| Subject: "user:test-uid", | ||
| Group: "dashboard.grafana.app", | ||
| Resource: "dashboards", | ||
| Verb: "get", | ||
| Name: "dash1", | ||
| Folder: "fold1", | ||
| }) | ||
| require.NoError(t, err) | ||
| assert.False(t, resp.Allowed) | ||
| }) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestService_CacheCheck denial test comment doesn’t match data
In the “Should deny on explicit cache deny entry” subtest, the comment says the perm cache “allows access”, but the map stores false for the scope, so it actually represents a denial, and the value is irrelevant because the denial cache short‑circuits first. If you want this test to prove that permDenialCache overrides an allowing permission cache, flip the value to true:
- // Allow access to the dashboard to prove this is not checked
- s.permCache.Set(ctx, userPermCacheKey("org-12", "test-uid", "dashboards:read"), map[string]bool{"dashboards:uid:dash1": false})
+ // Allow access to the dashboard to prove this is not checked
+ s.permCache.Set(ctx, userPermCacheKey("org-12", "test-uid", "dashboards:read"), map[string]bool{"dashboards:uid:dash1": true})That way the test clearly verifies that an explicit deny entry wins even when the positive permission cache would otherwise allow.
🤖 Prompt for AI Agents
In pkg/services/authz/rbac/service_test.go around lines 893 to 996, the "Should
deny on explicit cache deny entry" subtest's comment says the perm cache "allows
access" but the test data sets the scope value to false (a deny), which
contradicts the intent; change the cached permission map value for
"dashboards:uid:dash1" from false to true so the permCache represents an
allowing permission and the test then verifies that permDenialCache still
overrides an allow.
Test 2
Summary by CodeRabbit
Refactor
Tests
✏️ Tip: You can customize this high-level summary in your review settings.
Replicated from ai-code-review-evaluation/grafana-coderabbit#2