
Support multiple UDP source ports (multiport)#768

Open
wadey wants to merge 22 commits into master from multiport

Conversation

Member

@wadey wadey commented Oct 17, 2022

The goal of this work is to send packets between two hosts using more than one 5-tuple. When running on networks like AWS, where the underlying network driver and overlay fabric make routing, load balancing, and failover decisions based on the flow hash, this enables more than one flow between pairs of hosts.

Multiport spreads outgoing UDP packets across multiple UDP source ports, which allows Nebula to work around issues on the underlay network. Some example issues this could work around:

  • UDP rate limits applied on a per-flow basis.
  • Partial underlay network failure in which some flows work and some don't.

Agreement is done during the handshake to decide whether multiport mode will be used for a given tunnel (one side must have tx_enabled set, the other side must have rx_enabled set).
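The agreement above can be sketched as a simple predicate. This is an illustrative Python sketch, not Nebula's actual (Go) internals; the function and parameter names are assumptions:

```python
def use_multiport_tx(local_tx_enabled: bool, remote_rx_enabled: bool) -> bool:
    """Hypothetical sketch of the per-tunnel agreement: multiport transmit
    is only used when the local side has tx_enabled set AND the remote side
    advertised rx_enabled during the handshake."""
    return local_tx_enabled and remote_rx_enabled
```

Each direction is agreed independently, so a tunnel can be multiport in one direction and single-port in the other.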

NOTE: You cannot use multiport on a host if you are relying on UDP hole punching to get through a NAT or firewall.

NOTE: Linux only (uses raw sockets to send). Also currently only works with IPv4 underlay network remotes.

This is implemented by opening a raw socket and sending packets with a source port that is based on a hash of the overlay source/destination ports. For ICMP and Nebula metadata packets, we use a random source port.
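The port-selection scheme can be sketched as follows. This is an illustrative Python sketch under assumptions: the hash function (CRC32 here) and the constants are placeholders, not what the PR actually uses; only the shape of the mapping (same overlay flow, same underlay source port, in the configured range) reflects the description above:

```python
import zlib

LISTEN_PORT = 4242   # illustrative listen.port
TX_PORTS = 100       # illustrative tx_ports

def multiport_source_port(overlay_src_port: int, overlay_dst_port: int) -> int:
    """Map an overlay flow's ports onto one of TX_PORTS UDP source ports.

    Packets of the same overlay flow always hash to the same underlay
    source port in [LISTEN_PORT, LISTEN_PORT + TX_PORTS).
    """
    h = zlib.crc32(overlay_src_port.to_bytes(2, "big") +
                   overlay_dst_port.to_bytes(2, "big"))
    return LISTEN_PORT + (h % TX_PORTS)
```

Because the mapping is deterministic, each overlay flow stays pinned to one underlay 5-tuple, which keeps packets of a flow in order on the underlay.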

Example configuration:

multiport:
  # This host supports sending via multiple UDP ports.
  tx_enabled: false

  # This host supports receiving packets sent from multiple UDP ports.
  rx_enabled: false

  # How many UDP ports to use when sending. Source ports range from
  # listen.port up to (but not including) listen.port + tx_ports.
  tx_ports: 100

  # NOTE: All of your hosts must be running a version of Nebula that supports
  # multiport if you want to enable this feature. Older versions of Nebula
  # will be confused by these multiport handshakes.
  #
  # If handshakes are not getting a response, attempt to transmit handshakes
  # using random UDP source ports (to get around partial underlay network
  # failures).
  tx_handshake: false

  # How many unanswered handshakes we should send before we attempt to
  # send multiport handshakes.
  tx_handshake_delay: 2
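The tx_handshake / tx_handshake_delay fallback described in the config comments can be sketched like this. An illustrative Python sketch only; the function name is an assumption, and whether retries draw from the multiport range or any ephemeral port is also an assumption:

```python
import random

TX_HANDSHAKE = True       # config: multiport.tx_handshake
TX_HANDSHAKE_DELAY = 2    # config: multiport.tx_handshake_delay
LISTEN_PORT = 4242        # illustrative listen.port
TX_PORTS = 100            # illustrative tx_ports

def handshake_source_port(attempts_without_response: int) -> int:
    """The first tx_handshake_delay attempts use the normal listen port;
    once that many handshakes have gone unanswered, later retries pick a
    random multiport source port to route around partial underlay failures."""
    if TX_HANDSHAKE and attempts_without_response >= TX_HANDSHAKE_DELAY:
        return LISTEN_PORT + random.randrange(TX_PORTS)
    return LISTEN_PORT
```

With tx_handshake_delay: 2, attempts 0 and 1 go out on listen.port as usual, and only subsequent retries spread across random source ports.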

@wadey wadey added this to the v1.7.0 milestone Oct 17, 2022
@wadey wadey mentioned this pull request Oct 17, 2022
@brad-defined
Collaborator

Branch now has conflicts, needs updating

@wadey wadey added needs-defined-net-review Review needed from a Defined Networking team member needs-slack-review Needs review from a Slack team member labels Mar 13, 2023
@nbrownus nbrownus modified the milestones: v1.7.0, v1.8.0 Apr 3, 2023
@nbrownus nbrownus modified the milestones: v1.8.0, v1.9.0 Oct 30, 2023
@nbrownus nbrownus modified the milestones: v1.9.0, v1.10.0 Apr 22, 2024
@dioss-Machiel
Contributor

I thought of the following use cases where hashing does not seem to be ideal, but I might be missing something and hashing may just be better in general?

Suppose I have an HTTP stream from Node 1 to Node 2. It appears it will always get hashed to the same UDP port, so if that UDP flow is rate limited, the HTTP stream will also be limited. I would assume round-robin over the UDP source ports could perform better here?

In another case, when some UDP ports end up being (temporarily) blocked, round-robin also seems to perform better, since retransmissions will likely be sent via a different UDP port (assuming you are running TCP on top).
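The round-robin alternative proposed above can be sketched for contrast with the hash-based scheme. A hypothetical Python sketch only, not anything in the PR; the names and constants are illustrative:

```python
import itertools

LISTEN_PORT = 4242   # illustrative listen.port
TX_PORTS = 4         # illustrative tx_ports

# Successive packets rotate over the whole port range, so a single overlay
# flow is spread across every underlay flow instead of being pinned to one
# by the hash -- at the cost of possible packet reordering on the underlay.
_next_port = itertools.cycle(range(LISTEN_PORT, LISTEN_PORT + TX_PORTS))

def round_robin_source_port() -> int:
    return next(_next_port)
```

The trade-off is that hashing preserves per-flow packet ordering on the underlay, while round-robin spreads a single flow's packets (and any per-flow rate limit or blockage) across all ports.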

@rawdigits
Collaborator

rawdigits commented Aug 16, 2025

Just wanted to +1 this. I suspected that Verizon was throttling individual UDP streams, and this proved that to be true. I went from 500mbit to 1gbit by using 4x multiport. IMO we should consider mainlining this if it isn't a heavy burden.

edit:
Just adding some 24-hour data here to back up the discussion (disregard the weirdness/drops, which were unrelated to the actual change; the steady states before and after are the important bit).

For context, this is a long-running data backup between two Synology hosts that are 1000 miles apart, over the public internet.

[image: 24-hour throughput graph]

@wadey
Member Author

wadey commented Sep 15, 2025

Main issues to consider before merging I think:

  • It breaks down badly if you have a NAT or firewall. This has bitten us a few times at Slack because it’s hard to diagnose (handshakes work, but then only a percentage of packets flow afterwards). We could either just warn folks to deal with this, or maybe hole punch with all flows (but then also punchy with all flows?).

  • I think it opens up more chances for race condition bugs, since there are now multiple routines happening per hostinfo. Not bad, we just need to track down all the edge cases.

  • Maybe a simpler implementation when the number of flows is small. The raw socket method is overkill unless you are doing hundreds or thousands of flows; when there are single digits or tens of flows, we should maybe provide a simpler implementation that just opens a socket for each. The way we hack in the overhead for the raw socket in this PR is the main reason I didn’t want to merge it yet. It’s very gross, and I want to find a better way, like an interface for how out buffers are managed that can deal with the differences between normal and raw sockets more easily.

@nbrownus nbrownus modified the milestones: v1.10.0, backlog Oct 8, 2025