-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Separate backoff strategies for peer registration vs login operations #4835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Separate backoff strategies for peer registration vs login operations #4835
Conversation
📝 WalkthroughWalkthroughAdds grouped retry/backoff constants and context-aware backoff factories; classifies LoginRequest as registration when Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as Caller
participant GRPC as management gRPC client
participant Detector as isRegistrationRequest
participant Selector as Backoff selector
participant Backoff as Backoff engine
Caller->>GRPC: Login(req)
GRPC->>Detector: isRegistrationRequest(req)?
alt registration
Detector-->>Selector: registration
Selector->>Backoff: newRegistrationBackoff(ctx)
else login
Detector-->>Selector: login
Selector->>Backoff: newLoginBackoff(ctx)
end
Backoff->>GRPC: invoke Login call with retries
alt error == Canceled
Note right of GRPC: log Canceled, return permanent error
else error == DeadlineExceeded
Note right of GRPC: debug log, allow retry
else other errors
Note right of GRPC: treat as permanent error
end
GRPC-->>Caller: response or final error
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR separates backoff strategies for peer registration and login operations by introducing distinct timeout configurations. Previously, both operations used a single 10-second backoff timeout, but registration operations (involving database writes and IdP validation) require longer timeouts than login operations.
Key changes:
- Auto-detects operation type by checking for SetupKey or JwtToken in the request
- Implements separate backoff configurations: 180s timeout for registration (5s initial, 30s max interval) vs 45s for login (3s initial, 15s max interval)
- Adds helper functions for creating backoff strategies while maintaining backward compatibility
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if s.Code() == codes.DeadlineExceeded { | ||
| log.Debugf("login timeout, retrying...") | ||
| return err | ||
| } |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log message 'login timeout, retrying...' doesn't distinguish between registration and login operations. Since these now have different timeout configurations, the message should indicate which operation timed out to aid debugging. Consider using a message like 'operation timeout, retrying...' or including the operation type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
shared/management/client/grpc.go (3)
325-349: Consider usingNewExponentialBackOff+ opts instead of struct literalsThe two helpers mirror
defaultBackoffby instantiatingExponentialBackOffvia struct literal and passing it directly toWithContext. The backoff/v4 docs state thatResetshould be called before using anExponentialBackOff; usingNewExponentialBackOffwithWithInitialInterval/WithMaxInterval/WithMaxElapsedTime(or explicitly callingReset()once after construction) better aligns with that contract and is more robust to internal field changes. (pkg.go.dev)You could keep the same constants but build the policies like this:
-func newRegistrationBackoff(ctx context.Context) backoff.BackOff { - return backoff.WithContext(&backoff.ExponentialBackOff{ - InitialInterval: RegistrationRetryInterval, - RandomizationFactor: backoff.DefaultRandomizationFactor, - Multiplier: backoff.DefaultMultiplier, - MaxInterval: RegistrationRetryMaxDelay, - MaxElapsedTime: RegistrationRetryTimeout, - Stop: backoff.Stop, - Clock: backoff.SystemClock, - }, ctx) -} +func newRegistrationBackoff(ctx context.Context) backoff.BackOff { + b := backoff.NewExponentialBackOff( + backoff.WithInitialInterval(RegistrationRetryInterval), + backoff.WithMaxInterval(RegistrationRetryMaxDelay), + backoff.WithMaxElapsedTime(RegistrationRetryTimeout), + ) + return backoff.WithContext(b, ctx) +} @@ -func newLoginBackoff(ctx context.Context) backoff.BackOff { - return backoff.WithContext(&backoff.ExponentialBackOff{ - InitialInterval: LoginRetryInterval, - RandomizationFactor: backoff.DefaultRandomizationFactor, - Multiplier: backoff.DefaultMultiplier, - MaxInterval: LoginRetryMaxDelay, - MaxElapsedTime: LoginRetryTimeout, - Stop: backoff.Stop, - Clock: backoff.SystemClock, - }, ctx) -} +func newLoginBackoff(ctx context.Context) backoff.BackOff { + b := backoff.NewExponentialBackOff( + backoff.WithInitialInterval(LoginRetryInterval), + backoff.WithMaxInterval(LoginRetryMaxDelay), + backoff.WithMaxElapsedTime(LoginRetryTimeout), + ) + return backoff.WithContext(b, ctx) +}This keeps the behavior but relies on the library’s initialization path rather than on its struct layout.
351-355: Registration detection viaSetupKey/JwtTokenis clear but becomes the contractUsing
SetupKeyorJwtTokenpresence to classify aLoginRequestas a registration matches howRegistervsLoginpopulate the request, so the split backoff will behave as expected for current call sites. Just be aware this effectively codifies “any flow that sets eitherSetupKeyorJwtTokenis treated as registration” — future callers that happen to populate those fields will automatically inherit the longer registration retry window.
379-386: Clarify desired retry behavior forCanceledand use of parent context inloginInside
operation, bothcodes.Canceledandcodes.DeadlineExceededare returned as plain errors, sobackoff.Retrywill keep retrying them until the chosen backoff (registration or login) stops orc.ctxis canceled; all other status codes and non‑gRPC errors become permanent viabackoff.Permanent(err). If the intent is forCanceled(e.g., user‑driven cancellation) to stop retries, you may want to wrap it withbackoff.Permanent(err)as well or gate onc.ctxexplicitly before retrying.Also,
mgmCtxis still derived fromcontext.Background(), so cancelingc.ctxstops scheduling new retries but does not abort an in‑flightLogincall until its ownConnectTimeoutelapses. If you want immediate cancellation behavior on shutdown, consider usingc.ctxas the parent here:- mgmCtx, cancel := context.WithTimeout(context.Background(), ConnectTimeout) + mgmCtx, cancel := context.WithTimeout(c.ctx, ConnectTimeout)That would tie per‑RPC cancellation to the client’s lifecycle while still respecting the new backoff windows.
Also applies to: 394-402
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
shared/management/client/grpc.go(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
shared/management/client/grpc.go (1)
client/proto/daemon.pb.go (3)
LoginRequest(234-292)LoginRequest(305-305)LoginRequest(320-322)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
- GitHub Check: Management / Unit (amd64, sqlite)
- GitHub Check: Management / Unit (amd64, postgres)
- GitHub Check: Management / Unit (amd64, mysql)
- GitHub Check: Relay / Unit (amd64, -race)
- GitHub Check: Relay / Unit (386)
- GitHub Check: Client (Docker) / Unit
- GitHub Check: Client / Unit (386)
- GitHub Check: Client / Unit (amd64)
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
- GitHub Check: iOS / Build
- GitHub Check: Android / Build
- GitHub Check: release_ui_darwin
- GitHub Check: release
- GitHub Check: Linux
- GitHub Check: Darwin
- GitHub Check: Windows
- GitHub Check: JS / Lint
- GitHub Check: Client / Unit
🔇 Additional comments (1)
shared/management/client/grpc.go (1)
28-38: Timeout split between registration and login looks reasonable; verify overall UXThe separate retry windows (180s for registration, 45s for login) align with the stated goals of “slow to give up on registration, fast to fail login”. Just double‑check in practice that, combined with the 10s per‑RPC
ConnectTimeoutand exponential backoff, the worst‑case wall‑clock behavior still fits CLI/daemon expectations (e.g., not surprising users with overly long waits on repeated failures).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
shared/management/client/grpc.go (1)
379-402: LGTM! Correct error handling and backoff selection logic.The enhanced error handling properly distinguishes between retryable errors (Canceled, DeadlineExceeded) and permanent errors. The conditional backoff strategy selection based on request type is correctly implemented.
The context usage pattern is well-designed: using
c.ctxfor the backoff respects the client's lifecycle, while usingcontext.Background()for individual retry attempts ensures each gets a fresh timeout.Optional improvement: The log message at line 384 could indicate whether it's a registration or login timeout to aid debugging, as noted in the previous review. However, this is a minor observability enhancement.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
shared/management/client/grpc.go(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
shared/management/client/grpc.go (1)
client/proto/daemon.pb.go (3)
LoginRequest(234-292)LoginRequest(305-305)LoginRequest(320-322)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
- GitHub Check: Management / Unit (amd64, sqlite)
- GitHub Check: Client / Unit (386)
- GitHub Check: Client / Unit (amd64)
- GitHub Check: Client (Docker) / Unit
- GitHub Check: Management / Unit (amd64, postgres)
- GitHub Check: Management / Unit (amd64, mysql)
- GitHub Check: Relay / Unit (386)
- GitHub Check: Relay / Unit (amd64, -race)
- GitHub Check: release_ui_darwin
- GitHub Check: release
- GitHub Check: Client / Unit
- GitHub Check: Android / Build
- GitHub Check: Linux
- GitHub Check: Client / Unit
- GitHub Check: Windows
- GitHub Check: Darwin
- GitHub Check: JS / Lint
🔇 Additional comments (4)
shared/management/client/grpc.go (4)
28-38: LGTM! Well-organized backoff constants.The separate backoff strategies are appropriate: registration operations get more generous timeouts (180s total, 5s initial, 30s max) compared to login operations (45s total, 3s initial, 15s max). The grouping in a single const block improves readability.
325-336: LGTM! Correct backoff implementation.The registration backoff helper is properly configured with exponential backoff parameters and context integration for cancellation support.
338-349: LGTM! Correct backoff implementation.The login backoff helper is properly configured with exponential backoff parameters and context integration for cancellation support.
351-355: LGTM! Detection logic is correct.The function correctly identifies registration requests by checking for either SetupKey or JwtToken presence. The comment clarification from the previous review has been addressed appropriately.
4b04d0a to
f50db3c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
shared/management/client/grpc.go (1)
383-386: Enhance log message to distinguish operation type.Since registration and login now have different timeout configurations (180s vs 45s), the debug message should indicate which operation timed out to aid troubleshooting.
Apply this diff:
+ operationType := "login" + if isRegistrationRequest(req) { + operationType = "registration" + } if s.Code() == codes.DeadlineExceeded { - log.Debugf("login timeout, retrying...") + log.Debugf("%s timeout, retrying...", operationType) return err }
🧹 Nitpick comments (1)
shared/management/client/grpc.go (1)
28-38: Consider adding documentation for retry constants.While the constant names are self-explanatory, brief comments explaining the rationale for these specific timeout values would improve maintainability. For example, why registration needs 180s vs login's 45s (database writes, external IdP validation, etc.).
Example:
const ( ConnectTimeout = 10 * time.Second + // Registration retry constants - longer timeouts to accommodate database writes, + // external IdP validation, and group calculations RegistrationRetryTimeout = 180 * time.Second RegistrationRetryInterval = 5 * time.Second RegistrationRetryMaxDelay = 30 * time.Second + // Login retry constants - shorter timeouts for faster failure feedback LoginRetryTimeout = 45 * time.Second LoginRetryInterval = 3 * time.Second LoginRetryMaxDelay = 15 * time.Second )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
shared/management/client/grpc.go(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
shared/management/client/grpc.go (1)
client/proto/daemon.pb.go (3)
LoginRequest(234-292)LoginRequest(305-305)LoginRequest(320-322)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: Management / Unit (amd64, mysql)
- GitHub Check: Relay / Unit (amd64, -race)
- GitHub Check: Management / Unit (amd64, postgres)
- GitHub Check: Management / Unit (amd64, sqlite)
- GitHub Check: Relay / Unit (386)
- GitHub Check: Signal / Unit (386)
- GitHub Check: Client / Unit (386)
- GitHub Check: Client (Docker) / Unit
- GitHub Check: Client / Unit (amd64)
- GitHub Check: Android / Build
- GitHub Check: iOS / Build
- GitHub Check: Client / Unit
- GitHub Check: release_ui_darwin
- GitHub Check: release
- GitHub Check: release_ui
- GitHub Check: JS / Lint
- GitHub Check: Client / Unit
- GitHub Check: Windows
- GitHub Check: Linux
- GitHub Check: Darwin
🔇 Additional comments (1)
shared/management/client/grpc.go (1)
394-401: Excellent dynamic backoff strategy selection.The conditional logic cleanly separates registration from login backoff strategies based on request introspection, avoiding the need for additional boolean parameters. This maintains backward compatibility while achieving the PR's objectives.
4799eef to
a005204
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
shared/management/client/grpc.go (1)
384-386: Clarify timeout log message to distinguish registration vs loginThe log text
"login timeout, retrying..."can be misleading whenlogin()is invoked for registration flows as well. Consider either:
- Using a more generic message like
"operation timeout, retrying...", or- Including the derived operation type, e.g.,
"registration timeout, retrying..."vs"login timeout, retrying...".This will make logs easier to interpret now that registration and login use different backoff profiles.
🧹 Nitpick comments (4)
shared/management/client/grpc.go (4)
28-38: Retry constants look good; consider keeping them internal if not part of the public APIThe registration/login retry constants match the intent (longer window for registration, shorter for login) and are clearly named. If they’re not meant to be part of the package’s external API, consider making them unexported (
registrationRetryTimeout, etc.) to avoid committing to them as public knobs.
325-349: Backoff helpers correctly wrap ExponentialBackOff; be aware MaxElapsedTime caps only the backoff windowThe
newRegistrationBackoff/newLoginBackoffhelpers correctly constructExponentialBackOffpolicies and wrap them withbackoff.WithContext, andbackoff.Retrywill callReset()before use, so the instances are safe to use as written. (pkg.go.dev)Note that
MaxElapsedTimelimits the total elapsed time within the backoff policy, while each attempt still runs under the separateConnectTimeoutused insideoperation. If you need a strict overall wall-clock cap (including RPC timeouts), you may want to tightenMaxElapsedTimeor derive the per-RPC context from a higher-level deadline.
379-390: Only DeadlineExceeded is retried; double‑check that’s the intended retry policyIn the Login RPC, only
codes.DeadlineExceededis treated as retryable;codes.Canceledand all other status codes are turned intobackoff.Permanent(err), so they will not be retried.If the server or network can transiently return other codes like
codes.Unavailableorcodes.ResourceExhausted, you may want to explicitly allowlist those as retryable too, while keeping clearly permanent codes (e.g.,InvalidArgument,Unauthenticated,PermissionDenied) as non-retryable.A possible pattern:
if s, ok := gstatus.FromError(err); ok { switch s.Code() { case codes.Canceled: // permanent return backoff.Permanent(err) case codes.DeadlineExceeded, codes.Unavailable, codes.ResourceExhausted: // retryable log.Debugf("%s timeout/unavailable, retrying...", opType) return err default: return backoff.Permanent(err) } }
395-403: Backoff selection based on isRegistrationRequest looks correct and is context‑awareChoosing between
newRegistrationBackoff(c.ctx)andnewLoginBackoff(c.ctx)based onisRegistrationRequest(req)aligns with the goal of using different retry windows for registration vs login and ties the retry loop to the client’s lifecycle viac.ctx.One minor consideration:
operationcurrently usescontext.Background()as the parent formgmCtx, so cancellation ofc.ctxwill stop future retries but not an in-flight RPC; that RPC will still run untilConnectTimeoutexpires. If you need faster shutdown, consider derivingmgmCtxfromc.ctxinstead ofcontext.Background().
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
shared/management/client/grpc.go(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Windows
- GitHub Check: Linux
- GitHub Check: Darwin
- GitHub Check: release_ui_darwin
- GitHub Check: release
- GitHub Check: release_ui
- GitHub Check: Client / Unit
- GitHub Check: JS / Lint
- GitHub Check: Build Cache
- GitHub Check: iOS / Build
- GitHub Check: Android / Build
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
🔇 Additional comments (1)
shared/management/client/grpc.go (1)
351-355: I'm unable to clone the repository to verify the review comment. The cloning operation is failing consistently, which prevents me from:
- Examining how
LoginRequestis constructed in different flows (registration vs. login)- Finding where
isRegistrationRequest()is called and what the actual backoff behavior implications are- Checking for related documentation, tests, or comments that clarify the design intent
- Verifying whether JWT token-based flows are genuinely always meant to be treated as "registration"
Given that I cannot access the codebase to verify the caller patterns and the actual backoff profile implications, I must mark this review comment as requiring manual verification.
The review comment raises valid architectural concerns about the semantics of the
isRegistrationRequest()condition, particularly whether treating allJwtToken-based requests as registration is intentional or could cause issues if future "true login with JwtToken only" flows are added. However, without codebase access, I cannot confirm whether these concerns are substantiated by actual usage patterns.
The previous implementation used a single 10 second backoff for both registration and login operations. However, these are fundamentally different operations with different timeout requirements: Registration (new peer creation) legitimately needs longer timeouts because it involves database writes, external IdP validation, and group membership calculations. Login (existing peer authentication) is faster and should fail quickly to provide actionable feedback to users. This change: - Auto-detects operation type by checking for SetupKey or JwtToken in the request - Uses 180s timeout for registration (5s initial, 30s max interval) - Uses 45s timeout for login (3s initial, 15s max interval) - Extracts backoff creation into helper functions to keep login() complexity low - Maintains backward compatibility - no API changes to public methods The auto-detection approach eliminates the need for boolean parameters and keeps the login() function simple while providing appropriate timeouts for each scenario.
a005204 to
9a1905a
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
shared/management/client/grpc.go (1)
379-388: Consider distinguishing operation type in log messages.The error handling logic is correct, but the log messages (lines 381, 385) always say "login" even when the operation might be a registration. Since these operations now have different timeout configurations, more specific messages could aid debugging.
Consider applying this diff:
+ isRegistration := isRegistrationRequest(req) + operationType := "login" + if isRegistration { + operationType = "registration" + } if err != nil { if s, ok := gstatus.FromError(err); ok { if s.Code() == codes.Canceled { - log.Debugf("login canceled") + log.Debugf("%s canceled", operationType) return backoff.Permanent(err) } if s.Code() == codes.DeadlineExceeded { - log.Debugf("login timeout, retrying...") + log.Debugf("%s timeout, retrying...", operationType) return err } }Note: This would require moving the
isRegistrationRequest(req)call earlier in the operation function.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
shared/management/client/grpc.go(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
shared/management/client/grpc.go (3)
client/proto/daemon.pb.go (3)
LoginRequest(364-422)LoginRequest(435-435)LoginRequest(450-452)shared/management/status/error.go (1)
FromError(78-87)util/retry.go (1)
Retry(10-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Client / Unit
- GitHub Check: release_ui_darwin
- GitHub Check: release_ui
- GitHub Check: release
- GitHub Check: Linux
- GitHub Check: Windows
- GitHub Check: Android / Build
- GitHub Check: Client / Unit
- GitHub Check: Build Cache
- GitHub Check: JS / Lint
🔇 Additional comments (5)
shared/management/client/grpc.go (5)
28-38: LGTM! Well-organized retry constants.The constants are logically grouped and the values are appropriate for their respective operations. Registration's longer timeout (180s) accommodates heavier operations like database writes and IdP validation, while login's shorter timeout (45s) provides faster feedback.
325-336: LGTM! Backoff factory correctly configured.The registration backoff factory properly integrates context cancellation and uses appropriate exponential backoff parameters for registration operations.
338-349: LGTM! Backoff factory correctly configured.The login backoff factory properly integrates context cancellation and uses appropriate exponential backoff parameters for login operations.
351-355: LGTM! Logic correctly identifies registration requests.The OR logic properly distinguishes registration (which uses SetupKey and/or JwtToken) from login (which uses neither). The function comment accurately describes the behavior.
395-402: LGTM! Backoff strategy selection correctly implemented.The conditional logic properly selects registration or login backoff based on the request type. The implementation is clear and maintains the single responsibility of the login method.
Fix issue where devices were being registered multiple times when SSO login was required. The client was attempting registration even when no setup key or JWT token was provided, causing invalid registration requests with empty credentials.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @client/internal/login.go:
- Around line 74-78: The code currently logs at debug and returns the raw
PermissionDenied error when both setupKey and jwtToken are empty; change this to
log a higher-severity message (e.g., log.Warnf or log.Infof) that clearly states
peer registration was skipped due to missing credentials and wrap the returned
error with contextual text (e.g., "registration skipped: missing credentials")
so callers can distinguish a skipped-registration case from other
PermissionDenied errors; update the branch around setupKey, jwtToken, doMgmLogin
to log the new message and return a wrapped error (using fmt.Errorf or
errors.Wrap) that includes the original err.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
client/internal/login.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Client / Unit
- GitHub Check: Android / Build
- GitHub Check: iOS / Build
- GitHub Check: release_ui
- GitHub Check: release_ui_darwin
- GitHub Check: release
- GitHub Check: Darwin
- GitHub Check: Client / Unit
- GitHub Check: Linux
- GitHub Check: Windows
- GitHub Check: Build Cache
- GitHub Check: Client / Unit
| // Only attempt registration if we have credentials (setup key or JWT token) | ||
| if setupKey == "" && jwtToken == "" { | ||
| log.Debugf("peer registration required but no credentials provided") | ||
| return err | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider wrapping the error to indicate registration was skipped.
Returning the original PermissionDenied error from doMgmLogin doesn't convey that registration was skipped due to missing credentials. This may complicate troubleshooting when callers or operators try to understand why the operation failed.
Additionally, the debug log level might be insufficient for tracking this control flow decision in production environments.
🔧 Suggested improvements
Option 1: Wrap the error with context
// Only attempt registration if we have credentials (setup key or JWT token)
if setupKey == "" && jwtToken == "" {
- log.Debugf("peer registration required but no credentials provided")
- return err
+ log.Warnf("peer registration required but no credentials provided")
+ return status.Errorf(codes.FailedPrecondition, "peer registration required but no credentials (setup key or JWT token) provided: %v", err)
}Option 2: Keep original error but improve logging
// Only attempt registration if we have credentials (setup key or JWT token)
if setupKey == "" && jwtToken == "" {
- log.Debugf("peer registration required but no credentials provided")
+ log.Warnf("peer registration required but no credentials provided, returning original error: %v", err)
return err
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Only attempt registration if we have credentials (setup key or JWT token) | |
| if setupKey == "" && jwtToken == "" { | |
| log.Debugf("peer registration required but no credentials provided") | |
| return err | |
| } | |
| // Only attempt registration if we have credentials (setup key or JWT token) | |
| if setupKey == "" && jwtToken == "" { | |
| log.Warnf("peer registration required but no credentials provided") | |
| return status.Errorf(codes.FailedPrecondition, "peer registration required but no credentials (setup key or JWT token) provided: %v", err) | |
| } |
| // Only attempt registration if we have credentials (setup key or JWT token) | |
| if setupKey == "" && jwtToken == "" { | |
| log.Debugf("peer registration required but no credentials provided") | |
| return err | |
| } | |
| // Only attempt registration if we have credentials (setup key or JWT token) | |
| if setupKey == "" && jwtToken == "" { | |
| log.Warnf("peer registration required but no credentials provided, returning original error: %v", err) | |
| return err | |
| } |
🤖 Prompt for AI Agents
In @client/internal/login.go around lines 74 - 78, The code currently logs at
debug and returns the raw PermissionDenied error when both setupKey and jwtToken
are empty; change this to log a higher-severity message (e.g., log.Warnf or
log.Infof) that clearly states peer registration was skipped due to missing
credentials and wrap the returned error with contextual text (e.g.,
"registration skipped: missing credentials") so callers can distinguish a
skipped-registration case from other PermissionDenied errors; update the branch
around setupKey, jwtToken, doMgmLogin to log the new message and return a
wrapped error (using fmt.Errorf or errors.Wrap) that includes the original err.
Fix duplicate device registration issue caused by timeout retries. When a registration request times out but succeeds on the server side, subsequent retries would receive "peer has been already registered" error. This error was treated as a failure, even though registration actually succeeded. The fix: - Detects "peer already registered" error during registration retries - Automatically switches to a regular login request to fetch peer config - Returns success instead of treating it as a permanent failure This resolves the issue where devices appeared multiple times on the management endpoint after SSO login with slow registration operations.
Fix critical issue where DeadlineExceeded during registration would retry and create multiple devices. The problem: 1. Client sends registration request with 10s gRPC timeout 2. Server processing takes longer (DB writes, group calc, IdP validation) 3. Client times out and retries due to backoff strategy 4. Multiple parallel requests each create a new device on the server The root cause was treating DeadlineExceeded as a retriable error for registration. Unlike login (which is idempotent), registration creates new resources and MUST NOT be retried when timing out. This reverts the DeadlineExceeded retry behavior for registration while keeping it for login operations. Fixes the issue where multiple devices appeared on the management endpoint after SSO login attempts.
Implement transparent recovery mechanism to hide registration timeout errors from users when server-side processing takes longer than client timeout. Flow when registration times out: 1. Log informational message about waiting for server 2. Wait 60 seconds for server to complete registration 3. Attempt login (attempt 1/3) - if successful, return peer config 4. Wait 40 seconds, attempt login (attempt 2/3) 5. Wait 40 seconds, attempt login (attempt 3/3) 6. If all attempts fail, return user-friendly error message Total timeline: ~200 seconds (10s timeout + 60s + 10s + 40s + 10s + 40s + 10s) Key behaviors: - Only retries on PermissionDenied/NotFound (peer not ready yet) - Returns immediately on unexpected errors (network, auth issues) - Respects context cancellation during wait periods - Provides clear user messaging throughout the process Success message: "registration completed successfully on server (recovered from timeout)" Failure message: "peer registration is taking longer than expected, please try 'netbird up' again in a few minutes" This completes the fix for duplicate device registration issues by providing a better UX while maintaining the protections from previous commits.
|







The previous implementation used a single 10 second backoff for both registration and login operations, default GRPC timeout. However, these are fundamentally different operations with different timeout requirements:
Registration (new peer creation) legitimately needs longer timeouts because it involves database writes, external IdP validation, and group membership calculations. Login (existing peer authentication) is faster and should fail quickly to provide actionable feedback to users.
Describe your changes
This change:
The auto-detection approach eliminates the need for boolean parameters and keeps the login() function simple while providing appropriate timeouts for each scenario.
Issue ticket number and link
Stack
Checklist
Documentation
Select exactly one:
Docs PR URL (required if "docs added" is checked)
Paste the PR link from https://github.com/netbirdio/docs here:
https://github.com/netbirdio/docs/pull/__
Summary by CodeRabbit
Bug Fixes
Chores
✏️ Tip: You can customize this high-level summary in your review settings.