Performance improvements: tunes sysctl configuration and adds many tunnels per interface #19

Open
luiscape wants to merge 15 commits into main from luis/apply-sysctl-updates

@luiscape luiscape commented Mar 13, 2026

vprox tunnels are notably slow. In an experiment using iperf3, they produced the following results:

32 streams

===================================================================================================================
  Tunnel Throughput (vprox - before changes)
===================================================================================================================
-------------------------------------------------------------------------------------------------------------------
  [tunnel] TCP upload x32                         921.1 Mbps      1098.4 MB  403968 retrans  cpu tx/rx 2%/35%
  [tunnel] TCP download x32                        1.70 Gbps      2023.0 MB       0 retrans  cpu tx/rx 20%/15%
-------------------------------------------------------------------------------------------------------------------

=========================================================================================================
  Baseline Throughput (direct)
=========================================================================================================
---------------------------------------------------------------------------------------------------------
  [baseline] TCP upload x32                4.91 Gbps      5849.0 MB     204 retrans  cpu tx/rx 8%/98%
  [baseline] TCP download x32             11.95 Gbps     14244.9 MB       0 retrans  cpu tx/rx 111%/54%
---------------------------------------------------------------------------------------------------------

vprox is ~5x slower for uploads (921.1 Mbps vs. 4.91 Gbps) and ~7x slower for downloads (1.70 Gbps vs. 11.95 Gbps).

Some overhead from encryption is expected when using WireGuard, particularly if the CPU is saturated with encryption or decryption operations. However, a difference this large is unacceptable.

With the changes in this PR, a vprox interface achieves the following throughput in the same experiment:

=========================================================================================================
  Tunnel Throughput (vprox - after changes)
=========================================================================================================
---------------------------------------------------------------------------------------------------------
  [tunnel] TCP upload x32                  3.94 Gbps      4694.6 MB  138560 retrans  cpu tx/rx 7%/92%
  [tunnel] TCP download x32                4.30 Gbps      5130.2 MB       0 retrans  cpu tx/rx 103%/25%
---------------------------------------------------------------------------------------------------------

Uploads reach 3.94 Gbps (80% of baseline) and downloads reach 4.30 Gbps (36% of baseline). This is roughly a 4.3x improvement in upload throughput and a 2.5x improvement in download throughput over the previous version. Both are currently bottlenecked by server CPU utilization (92% rx on upload, 103% tx on download), so adding more CPU resources should increase throughput further.
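
The ratios above can be rechecked from the iperf3 tables. The script below is only a worked calculation on the reported throughput figures, converted to Gbps; the variable and function names are illustrative and not part of vprox.

```python
# Throughput figures copied from the iperf3 tables above, in Gbps.
before = {"upload": 0.9211, "download": 1.70}
after = {"upload": 3.94, "download": 4.30}
baseline = {"upload": 4.91, "download": 11.95}


def summarize(direction):
    """Return (slowdown before the PR, speedup from the PR, fraction of baseline after)."""
    slowdown = baseline[direction] / before[direction]
    speedup = after[direction] / before[direction]
    fraction = after[direction] / baseline[direction]
    return slowdown, speedup, fraction


for direction in ("upload", "download"):
    slowdown, speedup, fraction = summarize(direction)
    print(
        f"{direction}: {slowdown:.1f}x slower than baseline before, "
        f"{speedup:.1f}x faster after, {fraction:.0%} of baseline"
    )
```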

This is achieved through two improvements:

  • sysctl performance tuning that configures the Linux kernel to use RX/TX queues
  • the use of multiple WireGuard tunnels nested under a dummy interface, which increases throughput linearly up to the limits of the NIC and CPU on both the server and client
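
To compare kernel settings before and after applying the tuning, the current values can be read from /proc/sys. This is a minimal sketch: the keys listed are common network sysctls chosen purely for illustration, since this description does not list the keys the PR actually changes, and the helper names are hypothetical.

```python
from pathlib import Path


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file under /proc/sys."""
    return Path(root).joinpath(*key.split("."))


def read_sysctl(key, root="/proc/sys"):
    """Return the current value as a string, or None if the key is unavailable."""
    try:
        return sysctl_path(key, root).read_text().strip()
    except OSError:
        return None


# Illustrative keys only (assumption): not taken from this PR.
EXAMPLE_KEYS = [
    "net.core.rmem_max",
    "net.core.wmem_max",
    "net.core.netdev_max_backlog",
]

if __name__ == "__main__":
    for key in EXAMPLE_KEYS:
        print(f"{key} = {read_sysctl(key)}")
```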

The client now accepts a --tunnels N parameter, where N is the number of tunnels to start. The client creates a dummy interface that routes traffic across all tunnels in round-robin fashion.

Client                                              Server
┌──────────┐  policy routing    
│  vprox0  │  (dummy, user-     
│          │   facing)          
│          │                    
│          │  ip rule: to wg    
│          │  subnet → table    
│          │  with multipath    ┌───────────┐
│ vprox0t0 │◄──────────────────►│  vprox0   │◄─── UDP :50227 ───► vprox0t0 (wg)
│ vprox0t1 │◄──────────────────►│  vprox0t1 │◄─── UDP :50228 ───► vprox0t1 (wg)
│ vprox0t2 │◄──────────────────►│  vprox0t2 │◄─── UDP :50229 ───► vprox0t2 (wg) ──► Internet
│ vprox0t3 │◄──────────────────►│  vprox0t3 │◄─── UDP :50230 ───► vprox0t3 (wg)     (SNAT)
└──────────┘                    └───────────┘
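
As a rough illustration of the layout in the diagram, the sketch below derives the per-tunnel interface names and UDP ports and builds the kind of iproute2 commands that a multipath policy route would need. It is not the vprox implementation: the subnet and routing table number are placeholders, the function names are hypothetical, and the consecutive port numbering is inferred from the diagram only. How the kernel spreads flows across the nexthops is not modeled here.

```python
def tunnel_names(base, count):
    """Interface names as shown in the diagram: vprox0t0, vprox0t1, ..."""
    return [f"{base}t{i}" for i in range(count)]


def tunnel_ports(first_port, count):
    """Consecutive UDP ports, one per tunnel (assumption based on the diagram)."""
    return [first_port + i for i in range(count)]


def multipath_commands(base, count, subnet, table):
    """Build an `ip route` with one nexthop per tunnel and an `ip rule` selecting it."""
    nexthops = " ".join(
        f"nexthop dev {name} weight 1" for name in tunnel_names(base, count)
    )
    return [
        f"ip route add {subnet} table {table} {nexthops}",
        f"ip rule add to {subnet} lookup {table}",
    ]


if __name__ == "__main__":
    # Placeholder subnet and table number, chosen only for the example.
    for name, port in zip(tunnel_names("vprox0", 4), tunnel_ports(50227, 4)):
        print(f"{name} -> UDP :{port}")
    for command in multipath_commands("vprox0", 4, "10.0.0.0/24", 100):
        print(command)
```

The commands are only printed, not executed, so the sketch can be run without root privileges.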

Backwards compatibility is preserved when the parameter is omitted or when passing --tunnels 1.
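
The defaulting behaviour can be pictured with a small argument parser. This is a hypothetical sketch, not the actual vprox client code: the program name and helper functions are invented for illustration, and rejecting values below 1 is an assumption.

```python
import argparse


def positive_int(text):
    """Parse N and reject values below 1 (assumed validation)."""
    value = int(text)
    if value < 1:
        raise argparse.ArgumentTypeError("--tunnels must be at least 1")
    return value


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="client-sketch")
    # Omitting the flag behaves the same as passing --tunnels 1.
    parser.add_argument(
        "--tunnels",
        type=positive_int,
        default=1,
        metavar="N",
        help="number of tunnels to start",
    )
    return parser.parse_args(argv)


def uses_multi_tunnel(args):
    """True only when more than one tunnel is requested."""
    return args.tunnels > 1


if __name__ == "__main__":
    print(parse_args([]).tunnels)
    print(parse_args(["--tunnels", "4"]).tunnels)
```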

@luiscape luiscape self-assigned this Mar 13, 2026
@luiscape luiscape requested a review from abhagwat March 13, 2026 03:28
@luiscape luiscape changed the title from "Performance improvements: tunes sysctl configuration and add many tunnels per interface" to "Performance improvements: tunes sysctl configuration and adds many tunnels per interface" Mar 13, 2026