Conversation
…onditionals in the retry loops.
Walkthrough

The changes introduce a new …

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller
    participant Retryer
    participant Provisioner
    Caller->>Retryer: ShouldWaitAndRetry(ctx, err)
    alt Error is retryable and attempts remain
        Retryer->>Caller: Wait and return true
        loop While ShouldWaitAndRetry returns true
            Caller->>Provisioner: Grant/Revoke/Sync operation
            Provisioner-->>Caller: Response or Error
            Caller->>Retryer: ShouldWaitAndRetry(ctx, err)
        end
    else Error is not retryable or attempts exhausted
        Retryer->>Caller: Return false
    end
```
Actionable comments posted: 3
🧹 Nitpick comments (4)
pkg/retry/retry.go (2)
67-74: Linear back-off without jitter can create thundering-herd issues and overflows.

- A fleet of workers will all retry at the same deterministic interval, amplifying load.
- `time.Duration(int64(r.attempts)) * r.initialDelay` can overflow before you reach `math.MaxInt64` nanoseconds on 32-bit platforms.

Consider exponential back-off with ±15% jitter and a saturating multiply to avoid overflow.
Also applies to: 99-101
105-112: Unnecessary `for { select { … } }` loop – single wait is enough.

The loop executes at most once because both branches return. Removing the loop simplifies reasoning and avoids future bugs if new cases are added.

```diff
-	for {
-		select {
-		case <-time.After(wait):
-			return true
-		case <-ctx.Done():
-			return false
-		}
-	}
+	select {
+	case <-time.After(wait):
+		return true
+	case <-ctx.Done():
+		return false
+	}
```

pkg/retry/retry_test.go (1)
38-44: Tests rely on real wall-clock sleeps → slow & flaky.

`ShouldWaitAndRetry` blocks for ~100 ms and the assertion assumes <300 ms. On contended CI agents this can overshoot and fail spuriously.

Use a tiny `InitialDelay` (e.g. 1 ms) or inject a clock interface so the test can fast-forward time deterministically.

pkg/sync/syncer.go (1)
365-370: `Retryer` is hard-coded; make it configurable for flexibility and testing.

Creating a retryer with unlimited attempts and fixed delays inside `Sync` (`retryer := retry.NewRetryer(ctx, retry.RetryConfig{ … })`) ties production and unit tests to these constants. Consider:

- Accepting a `RetryConfig` or `retry.Retryer` via a `WithRetryConfig` / `WithRetryer` option.
- Falling back to the current defaults when the option is not provided.

This allows tighter retry budgets in tests and different back-off strategies for callers.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- pkg/connectorbuilder/connectorbuilder.go (3 hunks)
- pkg/retry/retry.go (1 hunks)
- pkg/retry/retry_test.go (1 hunks)
- pkg/sync/syncer.go (7 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
pkg/sync/syncer.go (2)
pkg/retry/retry.go (2)
- NewRetryer (31-45)
- RetryConfig (25-29)

pkg/sync/state.go (3)

- SyncResourcesOp (110-110)
- SyncExternalResourcesOp (114-114)
- SyncAssetsOp (115-115)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: go-test (1.23.x, windows-latest)
🔇 Additional comments (5)
pkg/connectorbuilder/connectorbuilder.go (5)
25-25: Clean import addition.

The code correctly adds the new retry package import needed for the refactored retry functionality.
918-943: Well-structured refactoring of the Grant method.

This refactoring effectively abstracts provisioner behavior for both v1 and v2 implementations using a common function type while centralizing retry logic. The creation of `grantFunc` to handle the differences between provisioner versions is especially clean.

The retry configuration parameters (3 max attempts, 30s initial delay, 120s max delay) appear reasonable for this use case.
948-960: Clean retry loop implementation.

The retry loop has been simplified to use the centralized retry mechanism. The code now:
- Attempts the operation
- Returns on success
- Uses the retryer's ShouldWaitAndRetry method to determine whether to retry
- Properly handles and reports errors
This is more maintainable than the previous custom retry implementation.
972-990: Good reuse of retry pattern in Revoke method.

This refactoring applies the same pattern used in the Grant method. The code sets up a consistent retryer configuration and abstracts over different provisioner versions, promoting code consistency between the two methods.
996-1008: Consistent error handling in Revoke method.

The retry implementation follows the same pattern as in the Grant method, ensuring consistency across the codebase. The error handling, logging, and metric recording are all properly maintained during the refactoring.
```go
// If error contains rate limit data, use that instead
if st, ok := status.FromError(err); ok {
	details := st.Details()
	for _, detail := range details {
		if rlData, ok := detail.(*v2.RateLimitDescription); ok {
			waitResetAt := time.Until(rlData.ResetAt.AsTime())
			if waitResetAt <= 0 {
				continue
			}
			duration := time.Duration(rlData.Limit)
			if duration <= 0 {
				continue
			}
			waitResetAt /= duration
			// Round up to the nearest second to make sure we don't hit the rate limit again
			waitResetAt = time.Duration(math.Ceil(waitResetAt.Seconds())) * time.Second
			if waitResetAt > 0 {
				wait = waitResetAt
				break
			}
		}
	}
}
```
Rate-limit wait calculation appears incorrect.
`waitResetAt /= duration` divides a duration by the numeric limit, yielding an arbitrary nanosecond value. Usually you want `time.Until(reset) + smallBuffer` or `time.Until(reset)/remaining`, not "duration ÷ limit".
Please revisit the algorithm; otherwise the retryer may sleep far too short or far too long.
```go
if status.Code(err) != codes.Unavailable && status.Code(err) != codes.DeadlineExceeded {
	return false
}
```
💡 Verification agent
🧩 Analysis chain
Consider handling additional recoverable gRPC codes (e.g. ResourceExhausted, Internal).
Only Unavailable and DeadlineExceeded are currently retried. Connectors often surface transient throttling under ResourceExhausted and some load-balancers use Internal for “connection reset” conditions. You may unintentionally fail otherwise transient operations.
```diff
-	if status.Code(err) != codes.Unavailable && status.Code(err) != codes.DeadlineExceeded {
+	switch status.Code(err) {
+	case codes.Unavailable, codes.DeadlineExceeded, codes.ResourceExhausted, codes.Internal:
+		// retryable
+	default:
 		return false
-	}
+	}
```

🏁 Script executed:
```bash
#!/bin/bash
# Locate retry.go and display the relevant retry logic context
file=$(fd --exact-name retry.go)
echo "Found file: $file"
sed -n '40,80p' "$file"
```

Length of output: 345
🏁 Script executed:
```bash
#!/bin/bash
# Locate retry.go using find
file=$(find . -type f -name "retry.go" | head -n1)
if [ -z "$file" ]; then
  echo "retry.go not found"
  exit 1
fi
echo "Found file: $file"
# Show lines around the retry condition (lines 40–80)
sed -n '40,80p' "$file"
```

Length of output: 1269
Enhance retry logic to include additional transient gRPC codes
Currently only Unavailable and DeadlineExceeded are retried. In practice you’ll also see transient throttling surfaced as ResourceExhausted and connection‐reset errors as Internal. Without including these, short‐lived failures will fall through and not be retried.
Suggested change in pkg/retry/retry.go around line 55:
```diff
-	if status.Code(err) != codes.Unavailable && status.Code(err) != codes.DeadlineExceeded {
-		return false
-	}
+	switch status.Code(err) {
+	case codes.Unavailable,
+		codes.DeadlineExceeded,
+		codes.ResourceExhausted,
+		codes.Internal:
+		// retryable
+	default:
+		return false
+	}
```

This ensures you'll back off and retry on throttling (ResourceExhausted) and connection resets (Internal) as well.
Actionable comments posted: 0
♻️ Duplicate comments (3)
pkg/retry/retry.go (3)
59-65: Off-by-one semantics make MaxAttempts act as "max retries − 1".

`r.attempts++` happens before the check, so a `MaxAttempts` of 3 allows only 2 retries. If the intent is "retry at most N times", move the increment after the guard (or change the comparison to `>`).
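The off-by-one is easy to demonstrate with a bare counter standing in for the struct field (a sketch, not the package's code):

```go
package main

import "fmt"

// retriesAllowed counts how many retries a loop permits before giving up.
// incrementFirst mirrors the current behavior (attempts++ before the guard);
// passing false checks the guard first, matching "retry at most N times".
func retriesAllowed(incrementFirst bool, maxAttempts int) int {
	attempts, retries := 0, 0
	for {
		if incrementFirst {
			attempts++
			if attempts >= maxAttempts {
				return retries
			}
			retries++
		} else {
			if attempts >= maxAttempts {
				return retries
			}
			attempts++
			retries++
		}
	}
}

func main() {
	fmt.Println(retriesAllowed(true, 3))  // current order: prints 2
	fmt.Println(retriesAllowed(false, 3)) // guard first: prints 3
}
```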
55-57: Enhance retry logic to include additional transient gRPC codes

Currently only `Unavailable` and `DeadlineExceeded` are retried. In practice you'll also see transient throttling surfaced as `ResourceExhausted` and connection-reset errors as `Internal`. Without including these, short-lived failures will fall through and not be retried.
75-96: Rate-limit wait calculation appears incorrect.

`waitResetAt /= duration` divides a duration by the numeric limit, yielding an arbitrary nanosecond value. Usually you want `time.Until(reset) + smallBuffer` or `time.Until(reset)/remaining`, not "duration ÷ limit".

Please revisit the algorithm; otherwise the retryer may sleep far too short or far too long.
🧹 Nitpick comments (3)
pkg/retry/retry.go (3)
25-29: Documentation inconsistency with implementation for MaxDelay.

The comment states "0 means no limit" for MaxDelay, but the implementation at line 42 sets a default of 60 seconds if it's 0. Either update the comment or change the implementation to match the documented behavior.

```diff
-	MaxDelay time.Duration // Default is 60 seconds. 0 means no limit.
+	MaxDelay time.Duration // Default is 60 seconds. 0 will be set to default.
```
31-45: Context parameter is unused in NewRetryer.

The function accepts a context parameter but doesn't use it anywhere within the function. Consider removing it to avoid confusion.

```diff
-func NewRetryer(ctx context.Context, config RetryConfig) *Retryer {
+func NewRetryer(config RetryConfig) *Retryer {
```
67-73: Handling overflow check is unnecessary.

The check for `r.attempts > math.MaxInt64` is incredibly unlikely to be true in practice, as it would require billions of years of constant retrying at reasonable intervals. Consider simplifying this.

```diff
-	if r.attempts > math.MaxInt64 {
-		wait = r.maxDelay
-	} else {
-		wait = time.Duration(int64(r.attempts)) * r.initialDelay
-	}
+	wait = time.Duration(int64(r.attempts)) * r.initialDelay
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- pkg/retry/retry.go (1 hunks)
- pkg/retry/retry_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/retry/retry_test.go
🔇 Additional comments (1)
pkg/retry/retry.go (1)
103-112: LGTM! Good implementation of cancellable delay.

The wait loop pattern correctly handles both waiting for the specified duration and respecting context cancellation, which is important for clean shutdown behavior.
```go
annos, err := provisioner.Grant(ctx, request.Principal, request.Entitlement)
if err != nil {
	l.Error("error: grant failed", zap.Error(err))
	if !b.shouldWaitAndRetry(ctx, err, baseDelay) || attempt >= 2 {
```
this change I think increased the attempt count fwiw
Yes, the old code tried twice before erroring. I changed it to three because that seems like a good number of attempts before giving up.
```go
// use linear backoff by default
var wait time.Duration
if r.attempts > math.MaxInt64 {
```
It was to get the linter to shut up.
```go
provisionerV2, v2ok := b.resourceProvisionersV2[rt]
if !v1ok && !v2ok {
retryer := retry.NewRetryer(ctx, retry.RetryConfig{
	MaxAttempts: 3,
```
Three retries would wait 30, 60, then 90 seconds, which is 3 minutes total.
Three minutes seems to be the ideal timing. I haven’t looked into the code to see how C1 tickets change status, but based on my testing, three attempts don’t work — the C1 ticket seems to be stuck.


Syncing, granting, and revoking all have retry logic, and they're all pretty similar. This unifies them into a single retryer package.