[client] Refactor WG endpoint setup with role-based proxy activation#5277

Open
pappz wants to merge 5 commits into main from refactor/role-based-peer-setup

Conversation

@pappz pappz commented Feb 9, 2026

Update websocket to v1.8.14

Move wgProxy.Work() and wgConfigWorkaround() into EndpointUpdater to centralize the timing of proxy activation relative to WireGuard peer updates. The initiator calls Work() before updating the peer (so the proxy is ready for the immediate handshake), while the responder calls Work() after (to avoid premature activation).
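
The ordering described above can be illustrated with a minimal sketch. It is not the code from this PR: `recorder`, `configure`, and `order` are hypothetical names, and the recorder only logs the sequence of steps instead of touching a real proxy or WireGuard interface.

```go
package main

import "fmt"

// recorder is a hypothetical stand-in for both the WireGuard proxy and the
// WireGuard interface; it only records the order in which steps run.
type recorder struct {
	steps []string
}

func (r *recorder) Work()       { r.steps = append(r.steps, "proxy.Work") }
func (r *recorder) UpdatePeer() { r.steps = append(r.steps, "wg.UpdatePeer") }

// configure applies the role-based ordering from the PR description:
// the initiator activates the proxy first, the responder activates it last.
func configure(initiator bool, r *recorder) {
	if initiator {
		r.Work()       // proxy must be ready for the immediate handshake
		r.UpdatePeer() // then point WireGuard at the proxy
		return
	}
	r.UpdatePeer() // responder configures the peer first
	r.Work()       // and only then activates the proxy
}

// order returns the recorded step sequence for the given role.
func order(initiator bool) []string {
	r := &recorder{}
	configure(initiator, r)
	return r.steps
}

func main() {
	fmt.Println("initiator:", order(true))
	fmt.Println("responder:", order(false))
}
```

Keeping both orderings in one function is the point of the refactor as described: callers no longer decide when to call Work() relative to the peer update.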

Add SwitchWGEndpoint for relay fallback on ICE disconnect, which skips the initiator/responder distinction since WireGuard roaming will overwrite the endpoint regardless.

Describe your changes

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • New Features

    • Direct endpoint switching and explicit endpoint removal for faster manual control.
    • Option to apply endpoint updates immediately or defer them for smoother transitions.
  • Bug Fixes

    • Improved relay vs direct upgrade handling with better sequencing and conditional proxy activation.
    • Removed timing workaround in favor of controller-aware update flow.
  • Chores

    • Updated websocket dependency to latest patch release.

coderabbitai bot commented Feb 9, 2026

Important

Review skipped

This PR was authored by the user configured for CodeRabbit reviews. By default, CodeRabbit skips reviewing PRs authored by this user. It's recommended to use a dedicated user account to post CodeRabbit review feedback.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Endpoint configuration API and control flow were changed to support relay-backup awareness and conditional WireGuard proxy activation; workaround and several direct proxy starts were removed or deferred; new endpoint control methods (SwitchWGEndpoint, RemoveEndpointAddress) were added; websocket dependency bumped.

Changes

Cohort / File(s): Summary

Connection & Relay Flow (client/internal/peer/conn.go): Passes relay-backup info to endpoint configuration; removed wgConfigWorkaround usages; made wgProxy.Work() conditional based on setEndpointNow/controller status; adjusted ICE↔relay upgrade sequencing and removed direct relay Work starts.

Endpoint Updater API & Logic (client/internal/peer/endpoint.go): ConfigureWGEndpoint signature changed to include setEndpointNow (controls immediate vs deferred update); added SwitchWGEndpoint and RemoveEndpointAddress; split initiator/responder flows into configureAsInitiator/configureAsResponder; reworked delayed-update handling; the file now includes an internal wgConfigWorkaround helper.

Dependency Update (go.mod): Bumped github.com/coder/websocket from v1.8.13 to v1.8.14.

Sequence Diagram(s)

sequenceDiagram
    participant ICE as ICE
    participant Conn as Conn (conn.go)
    participant Endpoint as EndpointUpdater (endpoint.go)
    participant WGProxy as wgProxy
    participant WG as WireGuard

    ICE->>Conn: onICEConnectionIsReady()
    Conn->>Endpoint: ConfigureWGEndpoint(addr, presharedKey, setEndpointNow)
    alt setEndpointNow == true
        Endpoint->>WG: update endpoint immediately
        Conn->>WGProxy: allow/start Work()
        WGProxy->>WG: proxy/redirect to new endpoint
    else setEndpointNow == false
        Endpoint->>WG: schedule delayed update (nil/placeholder)
        Conn->>WGProxy: defer Work() until endpoint applied
        Endpoint-->>Conn: later apply real endpoint update
        Conn->>WGProxy: start Work() after endpoint applied
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • lixmal

Poem

🐰
I hopped through endpoints with a key so bright,
Paused my proxy steps to get the timing right,
Initiator sprang, responder kept the pace,
Relay watched patiently, then joined the race,
WireGuard hummed — the mesh found its light.

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
Merge Conflict Detection | ⚠️ Warning | ⚠️ Unable to check for merge conflicts: Failed to fetch base branch: From https://github.com/netbirdio/netbird
! [rejected] main -> main (non-fast-forward)
+ 1024d45...0d1ffba main -> origin/main (forced update)

✅ Passed checks (2 passed)

Check name | Status | Explanation
Title check | ✅ Passed | The title clearly describes the main refactoring effort: centralizing WireGuard proxy activation logic with role-based ordering in the endpoint setup process.
Description check | ✅ Passed | The description explains the core changes and rationale, though it omits the required 'Describe your changes' section content and leaves the Issue ticket field blank.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@pappz pappz force-pushed the refactor/role-based-peer-setup branch from d83d029 to a406171 Compare February 9, 2026 21:40
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
client/internal/peer/conn.go (1)

436-445: ⚠️ Potential issue | 🟠 Major

Direct WgInterface.RemoveEndpointAddress call bypasses EndpointUpdater — pending delayed update may race.

Line 443 calls RemoveEndpointAddress directly on WgInterface, bypassing the EndpointUpdater. If a responder delayed update was previously scheduled (from the relay path), it won't be cancelled here and could fire after this removal, re-establishing a stale endpoint.

The if branch (line 436) correctly goes through endpointUpdater.ConfigureWGEndpoint which cancels delayed updates, but the else branch doesn't. Consider using endpointUpdater.RemoveEndpointAddress() here instead (after fixing it to cancel delayed updates, as noted in the endpoint.go review).

The same concern applies to handleRelayDisconnectedLocked at line 542.

🤖 Fix all issues with AI agents
In `@client/internal/peer/endpoint.go`:
- Around line 68-70: RemoveEndpointAddress currently calls
e.wgConfig.WgInterface.RemoveEndpointAddress without taking e.mu or cancelling
any pending delayed updates, which allows a concurrent scheduleDelayedUpdate
goroutine to call updateWireGuardPeer and re-add the endpoint; fix by acquiring
e.mu at the start (same lock discipline as RemoveWgPeer/SwitchWGEndpoint), call
e.waitForCloseTheDelayedUpdate() before invoking
e.wgConfig.WgInterface.RemoveEndpointAddress(e.wgConfig.RemoteKey), then perform
the removal and release the lock so any pending delayed update is
cancelled/serialized correctly.
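
The fix requested above (take the lock, cancel any pending delayed update, then remove the endpoint) could look roughly like the following sketch. It is an illustration under stated assumptions, not the project's code: `fakeWG`, `updater`, `stopDelayedUpdate`, and `endpointAfterRace` are hypothetical names, and the real EndpointUpdater may track its delayed update differently.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// fakeWG is a hypothetical stand-in for the WireGuard interface.
type fakeWG struct {
	mu       sync.Mutex
	endpoint string
}

func (w *fakeWG) set(addr string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.endpoint = addr
}

func (w *fakeWG) get() string {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.endpoint
}

// updater is a simplified, hypothetical model of an endpoint updater.
type updater struct {
	mu     sync.Mutex
	wg     *fakeWG
	cancel context.CancelFunc
	done   sync.WaitGroup
}

// scheduleDelayedUpdate applies addr after delay unless cancelled first.
func (u *updater) scheduleDelayedUpdate(addr string, delay time.Duration) {
	u.mu.Lock()
	defer u.mu.Unlock()
	u.stopDelayedUpdate()
	ctx, cancel := context.WithCancel(context.Background())
	u.cancel = cancel
	u.done.Add(1)
	go func() {
		defer u.done.Done()
		select {
		case <-ctx.Done(): // cancelled: leave the endpoint untouched
		case <-time.After(delay):
			u.wg.set(addr)
		}
	}()
}

// stopDelayedUpdate cancels a pending update and waits for its goroutine.
// The caller must hold u.mu; the goroutine never takes u.mu, so waiting
// here cannot deadlock.
func (u *updater) stopDelayedUpdate() {
	if u.cancel == nil {
		return
	}
	u.cancel()
	u.done.Wait()
	u.cancel = nil
}

// RemoveEndpointAddress serializes with pending updates before removing.
func (u *updater) RemoveEndpointAddress() {
	u.mu.Lock()
	defer u.mu.Unlock()
	u.stopDelayedUpdate()
	u.wg.set("")
}

// endpointAfterRace schedules an update, removes the endpoint right away,
// and reports what the interface holds once the delay has passed.
func endpointAfterRace() string {
	u := &updater{wg: &fakeWG{endpoint: "relay:1"}}
	u.scheduleDelayedUpdate("ice:2", 20*time.Millisecond)
	u.RemoveEndpointAddress()
	time.Sleep(50 * time.Millisecond)
	return u.wg.get()
}

func main() {
	fmt.Printf("endpoint after removal: %q\n", endpointAfterRace())
}
```

Because the removal waits for the delayed goroutine to exit while holding the same mutex used for scheduling, a late timer cannot re-add the endpoint after the removal returns.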
🧹 Nitpick comments (4)
client/internal/peer/endpoint.go (3)

72-77: configureAsInitiator is a trivial pass-through.

This method adds no logic beyond what updateWireGuardPeer already does. Consider inlining it into the ConfigureWGEndpoint caller to reduce indirection, or at least documenting why the separate method exists (e.g., symmetry with configureAsResponder for future extension).


138-142: wgConfigWorkaround is defined here but only called from conn.go — consider co-locating or documenting the coupling.

This package-level function with a bare time.Sleep has no documentation of who should call it or when. The AI summary indicates the old call sites in conn.go were removed, yet conn.go line 402 still calls it. If this is intended to be the single canonical location for the workaround, a brief doc comment explaining the expected caller would help.


37-48: Naming: setEndpointNow is slightly misleading for the initiator path.

When e.initiator is true, setEndpointNow is ignored entirely (the initiator always updates immediately). This could confuse callers into thinking the flag controls behavior in all cases. A brief doc comment on ConfigureWGEndpoint clarifying that setEndpointNow only affects the responder path would help.

client/internal/peer/conn.go (1)

395-402: Semantics of hasRelayBackup as setEndpointNow — works correctly but is non-obvious.

hasRelayBackup (whether a relay proxy exists) is passed as setEndpointNow to ConfigureWGEndpoint. This means: if the responder already has a relay backup, skip the delayed-update dance and set the ICE endpoint immediately. The logic is sound (the fallback is already in place), but the semantic mapping is implicit. A one-line comment here would help future readers understand why "has relay backup" ⟹ "set endpoint now".
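
To make the implicit mapping easier to follow, here is a small sketch of the decision as the nitpicks above describe it. The function `appliesImmediately` is a hypothetical name, and the rule it encodes (the initiator ignores the flag, the responder honors it) is taken from the review text rather than from the actual source.

```go
package main

import "fmt"

// appliesImmediately models the behavior described in the review comments:
// the initiator always updates the endpoint right away, while the responder
// does so only when setEndpointNow is true. Hypothetical helper.
func appliesImmediately(initiator, setEndpointNow bool) bool {
	return initiator || setEndpointNow
}

func main() {
	for _, initiator := range []bool{true, false} {
		for _, hasRelayBackup := range []bool{true, false} {
			// In the ICE path under review, hasRelayBackup is passed
			// as the setEndpointNow argument.
			setEndpointNow := hasRelayBackup
			fmt.Printf("initiator=%-5v hasRelayBackup=%-5v immediate=%v\n",
				initiator, hasRelayBackup,
				appliesImmediately(initiator, setEndpointNow))
		}
	}
}
```

Only the responder without a relay backup ends up on the delayed-update path under this reading.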

For relay connections, the controller (initiator) now activates the
wgProxy before configuring the WG endpoint, while the non-controller
(responder) configures the endpoint first with a delayed update, then
activates the proxy after. This prevents the responder from sending
traffic through the proxy before WireGuard is ready to receive it,
avoiding handshake congestion when both sides try to initiate
simultaneously.

For ICE connections, pass hasRelayBackup as the setEndpointNow flag
so the responder sets the endpoint immediately when a relay fallback
exists (avoiding the delayed update path since relay is already
available as backup).

On ICE disconnect with relay fallback, remove the duplicate
wgProxyRelay.Work() calls — the relay proxy is already active from
initial setup, so re-activating it is unnecessary.

In EndpointUpdater, split ConfigureWGEndpoint into explicit
configureAsInitiator and configureAsResponder paths, and add the
setEndpointNow parameter to let the caller control whether the
responder applies the endpoint immediately or defers it. Add unused
SwitchWGEndpoint and RemoveEndpointAddress methods. Remove the
wgConfigWorkaround sleep from the relay setup path.
@pappz pappz force-pushed the refactor/role-based-peer-setup branch from a406171 to 88086fd Compare February 9, 2026 21:55
@pappz pappz requested a review from lixmal February 13, 2026 16:15
lixmal previously approved these changes Feb 13, 2026
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@client/internal/peer/conn.go`:
- Around line 395-401: The code is passing hasRelayBackup as the setEndpointNow
flag to conn.endpointUpdater.ConfigureWGEndpoint which is backwards; when a
relay backup exists we should NOT apply the ICE endpoint immediately. Change the
third argument to the inverse of hasRelayBackup (e.g., setEndpointNow :=
!hasRelayBackup and pass that) when calling
conn.endpointUpdater.ConfigureWGEndpoint in the ICE path (the block using
conn.presharedKey(iceConnInfo.RosenpassPubKey) and
conn.handleConfigurationFailure), so the presence of wgProxyRelay prevents
immediate endpoint application.
