Skip to content

feat(l1): dual-stack IPv6 UDP discovery sockets#6377

Draft
azteca1998 wants to merge 17 commits intofeat/p2p-dual-stack-ipv6from
feat/p2p-dual-stack-discovery
Draft

feat(l1): dual-stack IPv6 UDP discovery sockets#6377
azteca1998 wants to merge 17 commits intofeat/p2p-dual-stack-ipv6from
feat/p2p-dual-stack-discovery

Conversation

@azteca1998
Copy link
Copy Markdown
Contributor

Summary

  • One UDP socket per address family: start_network() now iterates over NetworkConfig.discovery_bind_addrs and binds a dedicated UDP socket per entry. A dual-stack node (e.g. --discovery.addr 0.0.0.0,::) gets two independent sockets, one for IPv4 and one for IPv6.
  • Independent discv4/discv5 per socket: each socket spawns its own Discv4Server and Discv5Server instances (backed by the same shared PeerTable) and its own DiscoveryMultiplexer. Responses always go back on the same-family socket they arrived on.
  • Correct Ping from endpoint: each server's local_node carries the external address for that socket's family, so discv4 Ping/Pong packets advertise the right address to remote peers.
  • --discovery.addr becomes a Vec: accepts comma-separated addresses (e.g. --discovery.addr 0.0.0.0,::) mirroring --p2p.addr. When omitted, defaults to the same set as --p2p.addr, so dual-stack RLPx automatically implies dual-stack discovery with no extra flags.
  • NetworkConfig refactor: discovery_bind_addr/discovery_external_addr (scalar) replaced by discovery_bind_addrs/discovery_external_addrs (Vec<IpAddr>); bind_udp_addr() replaced by bind_udp_addrs().

Depends on

Usage

Single-stack IPv4 (unchanged default):

ethrex  # discovery.addr defaults to p2p.addr, which auto-detects local IPv4

Dual-stack:

ethrex --p2p.addr 0.0.0.0,:: --nat.extip <ipv4> 
# discovery.addr defaults to [0.0.0.0, ::], two UDP sockets are bound

IPv6-only:

ethrex --p2p.addr :: --nat.extip <ipv6>

Override discovery to IPv4-only while RLPx is dual-stack:

ethrex --p2p.addr 0.0.0.0,:: --discovery.addr 0.0.0.0

Test plan

  • cargo check passes (verified locally)
  • Default single-stack: one UDP socket bound, behaviour identical to before
  • --discovery.addr 0.0.0.0,::: two UDP sockets visible via lsof -iUDP:30303
  • IPv6 discv4 ping received on IPv6 socket and responded on the same socket
  • Discovered IPv6 peers land in the shared peer table

@github-actions
Copy link
Copy Markdown

🤖 Kimi Code Review

Review Summary

This PR adds dual-stack support for discovery by allowing multiple UDP bind addresses. The changes are well-structured and follow Rust best practices. However, there are a few issues to address:

Critical Issues

  1. Port exhaustion risk (crates/networking/p2p/network.rs:122-191): The loop creates separate UDP sockets for each bind address but uses the same port number for all. This will fail on most systems when binding to both IPv4 and IPv6 on the same port (e.g., 0.0.0.0:30303 and [::]:30303). The OS typically reserves the port for the first socket, causing the second bind to fail with AddrInUse.

  2. Missing validation for discovery address count (cmd/ethrex/initializers.rs:415-425): There's no validation that the number of discovery addresses doesn't exceed 2 (IPv4 + IPv6). Users could supply 3+ addresses leading to undefined behavior.

Security Concerns

  1. Unvalidated IP parsing (cmd/ethrex/initializers.rs:421-424): Using .expect() on user input parsing can panic. Should use proper error handling and provide user-friendly error messages.

Performance & Correctness

  1. Inefficient bootnode cloning (crates/networking/p2p/network.rs:141, 161): The bootnodes.clone() inside the loop is inefficient when there are multiple discovery addresses. Clone once outside the loop.

  2. Missing length validation for address vectors (crates/networking/p2p/types.rs): The discovery_bind_addrs and discovery_external_addrs vectors should always have the same length, but this invariant isn't enforced or documented.

Minor Issues

  1. Incomplete help text (cmd/ethrex/cli.rs:291-297): The help mentions "one or two comma-separated addresses" but doesn't specify the format (IPv4/IPv6) or validation rules.

  2. Default value inconsistency (cmd/ethrex/cli.rs:286): The discovery_addr field uses Vec<String> but the help text implies optional usage. Consider using Option<Vec<String>> or clarifying the empty vec behavior.

Suggested Fixes

  1. For the port exhaustion issue, either:

    • Use different ports for IPv4/IPv6 discovery (e.g., 30303 and 30304)
    • Implement SO_REUSEPORT where supported
    • Document this limitation clearly
  2. Add validation in initializers.rs:

if discovery_bind_addrs.len() > 2 {
    panic!("Maximum 2 discovery addresses supported (IPv4 + IPv6)");
}
  1. Replace .expect() with proper error handling:
.map(|a| a.parse()
    .map_err(|_| format!("Invalid discovery address: {}", a)))?
  1. Move bootnodes.clone() outside the loop.

Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions
Copy link
Copy Markdown

🤖 Claude Code Review

PR #6377feat(p2p): dual-stack IPv6 UDP discovery sockets Review

Overall: The design is sound — one UDP socket per address family is the correct approach for dual-stack. The code is well-commented and follows existing patterns. However there are a few correctness and robustness issues worth addressing before merge.


Bugs / Correctness

1. Silent zip truncation in network.rs (~line 133)

let udp_pairs: Vec<(SocketAddr, IpAddr)> = context
    .network_config
    .bind_udp_addrs()
    .into_iter()
    .zip(
        context
            .network_config
            .discovery_external_addrs
            .iter()
            .copied(),
    )
    .collect();

Iterator::zip silently truncates to the shorter iterator. If discovery_bind_addrs and discovery_external_addrs diverge in length for any reason (bug in initializer, future NetworkConfig construction path), some sockets are silently dropped with no error or warning. Since both vecs are constructed in lock-step in the initializer, this is unlikely in practice, but the invariant is not enforced at the type level.

Suggestion: add a debug_assert_eq! (or a runtime check with an early return/error) before the zip:

debug_assert_eq!(
    context.network_config.discovery_bind_addrs.len(),
    context.network_config.discovery_external_addrs.len(),
    "discovery_bind_addrs and discovery_external_addrs must have equal length"
);

2. Panic on user input in initializers.rs

opts.discovery_addr
    .iter()
    .map(|a| a.parse().expect("Failed to parse --discovery.addr address"))
    .collect()

.expect() on user-supplied input causes a panic rather than a clean error message. The previous code had the same problem, but now that discovery_addr is a Vec it's worth fixing. Should propagate an error (anyhow::bail! / ?) consistent with the rest of the initializer function.


3. Half-started state on bind error in network.rs

The loop spawns discv4/discv5 servers and starts the multiplexer for each socket in sequence:

for (bind_addr, external_addr) in udp_pairs {
    let udp_socket = Arc::new(
        UdpSocket::bind(bind_addr)
            .await
            .map_err(NetworkError::UdpSocketError)?,  // early return here
    );
    // ... spawns servers ...
    DiscoveryMultiplexer::new(...).start();
}

If the first socket binds and its servers are spawned successfully, but the second socket's bind() fails (e.g. EADDRINUSE when binding [::]:30303 on Linux where IPV6_V6ONLY is off by default), the function returns an error but the first socket's tasks are already running. This leaves the node in a half-started state.

The fix depends on the broader task lifecycle — at minimum, a comment documenting this behaviour and the OS-level caveat about IPV6_V6ONLY would help operators diagnose a port-conflict failure.


Minor Issues

4. num_args = 0.. doesn't enforce the "one or two" documented limit (cli.rs)

The long_help says "One or two comma-separated addresses" but num_args = 0.. accepts any count. Either tighten to num_args = 0..=2 or remove the "one or two" wording and let users supply however many addresses they want.

5. bootnodes.clone() inside the loop (network.rs)

bootnodes is cloned once per socket (for Discv4Server::spawn) and potentially again for Discv5Server::spawn. For a dual-stack node this means 4 clones instead of 2. Since bootnode lists are small this is not a performance concern, but it's worth noting for awareness. No action required.

6. context.local_node.node_id() is shared across multiplexers

Both DiscoveryMultiplexer instances use the same node_id. This is correct — the node ID is derived from the private key and is family-agnostic — but a brief comment noting this intention would aid reviewers unfamiliar with the design.


What looks good

  • The per-family external address derivation logic in initializers.rs correctly handles the NAT + wildcard combinations for both IPv4 and IPv6.
  • Keeping a single shared PeerTable across all sockets avoids duplicating peer state.
  • from_node() in types.rs is correctly updated to the new Vec API.
  • The comment block above start_network clearly explains the purpose of udp_pairs.

Automated review by Claude (Anthropic) · custom prompt

@github-actions
Copy link
Copy Markdown

🤖 Codex Code Review

Findings:

  1. High: partial startup leaves background discovery tasks running on error
    In network.rs, each UDP family is started immediately inside the loop, and DiscoveryMultiplexer::start() is fire-and-forget.
    If a later bind/spawn fails (e.g., second family), start_network returns Err, but earlier discovery tasks stay alive. That creates inconsistent node state (startup reported failed, yet discovery still running and ports occupied).
    Suggested fix: stage startup in two phases:

    • phase 1: validate config and bind/spawn all per-family components, collecting handles;
    • phase 2: start multiplexers only after all families succeeded, or explicitly abort already-started tasks on failure.
  2. Medium: silent truncation when pairing bind/external discovery addresses
    zip() pairing silently drops extra entries if discovery_bind_addrs.len() != discovery_external_addrs.len().
    That can make configured sockets or announced addresses disappear without error, which is risky for discovery correctness.
    Suggested fix: explicitly check equal lengths and return a config error if mismatched.

  3. Low/Medium: CLI contract says “one or two”, but parser accepts zero or many
    --discovery.addr uses num_args = 0.., and initializer accepts all parsed entries without cardinality/family validation (initializers.rs).
    This allows unexpected configurations (empty via bare flag, >2 addresses, duplicate family), which can produce ambiguous behavior and extra discovery workers.
    Suggested fix: enforce 1..=2 for explicit flag usage and validate “max one IPv4 + max one IPv6”.

No EVM opcode/gas/consensus/state-transition logic is touched in this diff, so no direct execution-layer correctness concerns from these changes.


Automated review by OpenAI Codex · custom prompt

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 17, 2026

Lines of code report

Total lines added: 130
Total lines removed: 0
Total lines changed: 130

Detailed view
+-----------------------------------------------------------+-------+------+
| File                                                      | Lines | Diff |
+-----------------------------------------------------------+-------+------+
| ethrex/cmd/ethrex/cli.rs                                  | 992   | +5   |
+-----------------------------------------------------------+-------+------+
| ethrex/cmd/ethrex/initializers.rs                         | 676   | +11  |
+-----------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/discv4/server.rs             | 675   | +8   |
+-----------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/discv5/server.rs             | 852   | +3   |
+-----------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/network.rs                   | 700   | +36  |
+-----------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/peer_table.rs                | 1379  | +53  |
+-----------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/rlpx/connection/handshake.rs | 493   | +3   |
+-----------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/types.rs                     | 580   | +11  |
+-----------------------------------------------------------+-------+------+

@azteca1998 azteca1998 force-pushed the feat/p2p-dual-stack-discovery branch 2 times, most recently from fcd9b49 to 726af77 Compare March 19, 2026 13:30
@azteca1998 azteca1998 force-pushed the feat/p2p-dual-stack-ipv6 branch from 430df2e to 858f935 Compare March 19, 2026 14:14
@azteca1998 azteca1998 force-pushed the feat/p2p-dual-stack-discovery branch from 726af77 to 57b6cb3 Compare March 19, 2026 14:18
@azteca1998 azteca1998 changed the title feat(p2p): dual-stack IPv6 UDP discovery sockets feat(l1): dual-stack IPv6 UDP discovery sockets Mar 19, 2026
@github-actions github-actions bot added the L1 Ethereum client label Mar 19, 2026
@azteca1998 azteca1998 force-pushed the feat/p2p-dual-stack-ipv6 branch from 858f935 to 7d3ad8b Compare March 19, 2026 14:30
@azteca1998 azteca1998 force-pushed the feat/p2p-dual-stack-discovery branch 3 times, most recently from 2f01fa6 to ba70b26 Compare March 23, 2026 13:09
@azteca1998 azteca1998 force-pushed the feat/p2p-dual-stack-ipv6 branch from 6c0615f to aade806 Compare March 23, 2026 13:09
Extends the discovery layer to bind one UDP socket per configured address
family, giving IPv6 peers the same discv4/discv5 reachability as IPv4 peers.

- `NetworkConfig`: `discovery_bind_addr`/`discovery_external_addr` (scalar)
  replaced by `discovery_bind_addrs`/`discovery_external_addrs` (`Vec<IpAddr>`).
  `bind_udp_addr()` replaced by `bind_udp_addrs() -> Vec<SocketAddr>`.

- `start_network()`: iterates over `bind_udp_addrs()` and for each entry
  binds a dedicated UDP socket, spawns independent discv4 and discv5 server
  instances (sharing the same peer table), and starts a `DiscoveryMultiplexer`.
  Each server's `local_node` uses the matching external address so Ping/Pong
  `from` endpoints are always correct for the socket's address family.

- `--discovery.addr` CLI flag changed from `Option<String>` to `Vec<String>`
  with comma delimiter, accepting e.g. `--discovery.addr 0.0.0.0,::`.
  When omitted, defaults to the same address set as `--p2p.addr`, so
  dual-stack RLPx automatically implies dual-stack discovery.

- `get_local_p2p_node()` builds `discovery_bind_addrs` / `discovery_external_addrs`
  applying the same per-address NAT / auto-detect logic already used for RLPx.
discv4 and discv5 servers can discover peers of both address families from
bootnode neighbor responses, then attempt to send UDP packets to those peers
using the current socket — which fails with EAFNOSUPPORT (errno 97) when
the destination is IPv6 but the socket is IPv4-bound (or vice versa).

Guard all outbound UDP send paths in both servers so packets destined for
an address family that doesn't match the socket are silently skipped:

- discv4: `send()`, `send_else_dispose()`, and the FindNode direct send in
  `lookup()` all check `addr.is_ipv6() != self.local_node.ip.is_ipv6()`.
- discv5: `send_packet()` applies the same check.

This fix is needed even in single-stack IPv4 deployments, where IPv6 peers
appear in neighbor lists from bootnodes that serve both families.
…ery sockets

UdpSocket::bind() sets no socket options, causing two problems when binding
both 0.0.0.0:port and [::]:port for dual-stack discovery:

- On Linux systems where net.ipv6.bindv6only=0 (the common default), the
  IPv6 socket claims both the IPv6 and IPv4 wildcard, so the subsequent
  IPv4 bind fails with EADDRINUSE.
- Without SO_REUSEADDR, a fast restart after a crash also fails with
  EADDRINUSE while the OS-level TIME_WAIT drains.

Replace the raw UdpSocket::bind call with a udp_socket() helper (mirroring
the existing listener() helper for TCP) that uses socket2 to set options
before binding: SO_REUSEADDR, SO_REUSEPORT (Unix), and IPV6_V6ONLY=true for
IPv6 sockets so each wildcard address claims only its own address family.
Makes the IPv6 TCP listener explicitly IPv6-only, consistent with the
UDP socket. Without this, on Linux systems where net.ipv6.bindv6only
defaults to 0, ss would show the socket as '*:30303' (dual-stack)
rather than '[::]:30303' (IPv6-only).
Store ip6/tcp6/udp6 from peer ENRs in Contact. The IPv6 discv4/discv5
servers filter get_contact_for_lookup to contacts with an IPv6 ENR
address and receive the contact with node.ip rewritten to that IPv6
address, bootstrapping IPv6 discovery from dual-stack peers found via
IPv4. Adds a debug log when an ENR with an IPv6 address is received.

Also fix IPV6_V6ONLY on the IPv6 TCP listener using socket2 (Tokio's
TcpSocket has no set_only_v6).
connection_addr() now skips 0.0.0.0 (and ::) so a peer that advertises
a broken/NAT IPv4 correctly falls through to their IPv6 address.
record_enr_response_received also ignores unspecified :: when storing
the IPv6 relay address for the IPv6 discovery server.
Most dual-stack peers advertise ip6 but omit tcp6/udp6, implying the
same port serves both families. connection_addr() now uses tcp6_port
falling back to tcp_port when selecting an IPv6 TCP address, allowing
RLPx to reach peers that only set ip6 without tcp6 in their ENR.
Testing confirmed IPv6 attempts reach the peer but get Connection refused
— most mainnet nodes advertise ip6 in their ENR but don't actually listen
on IPv6 TCP. Revert to IPv4-preferred: only rewrite node.ip to IPv6 in
get_contact_to_initiate when the stored IPv4 is 0.0.0.0 (IPv6-only peer).
Adds a debug log when an outbound RLPx connection is attempted over IPv6
so we can confirm the initiator is actually dialing. Bumps TCP connection
errors from debug to warn so they appear without enabling debug logging.
When record_enr_response_received changes a contact's IP (e.g. 0.0.0.0
to an IPv6 address), the node was still in already_tried_peers and the
RLPx initiator would skip it indefinitely. Now we remove it from
already_tried_peers whenever the IP changes so the initiator retries
with the correct address.
enode_url() now wraps IPv6 addresses in brackets (e.g. [::1]:30303)
so the URL is valid for both display and re-parsing as a bootnode.

from_node() was adding ENR key-value pairs in non-alphabetical order
(secp256k1 before ip), violating the ENR spec and producing a
different signature than expected. Add sort_by to match from_network_config.

Also add tcp6_port/udp6_port to the NodeRecordPairs literal in the
discv5 messages test to fix the compile error from adding those fields.
@azteca1998 azteca1998 force-pushed the feat/p2p-dual-stack-discovery branch from ba70b26 to d6be22a Compare March 30, 2026 22:09
@azteca1998 azteca1998 force-pushed the feat/p2p-dual-stack-ipv6 branch from aade806 to 0b2a359 Compare March 30, 2026 22:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

L1 Ethereum client

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant