
XDP sockets fallback #5995


Draft
wants to merge 8 commits into
base: main
2 changes: 0 additions & 2 deletions book/api/cli.md
@@ -70,7 +70,6 @@ following stages to each configure command:
device.
- `ethtool-gro` Disables generic receive offload (GRO) on the network
device.
- `ethtool-loopback` Disables UDP segmentation on the loopback device.

| Arguments | Description |
|-------------------|-------------|
@@ -102,7 +101,6 @@ and configure the number of combined channels on the network device.
| `root` | increase `/proc/sys/vm/nr_hugepages` and mount hugetlbfs filesystems. Only applies for the `hugetlbfs` stage |
| `root` | increase network device channels with `ethtool --set-channels`. Only applies for the `ethtool-channels` stage |
| `root` | disable network device generic-receive-offload (gro) with `ethtool --offload IFACE generic-receive-offload off`. Only applies for the `ethtool-gro` stage |
| `root` | disable network device tx-udp-segmentation with `ethtool --offload lo tx-udp-segmentation off`. Only applies for the `ethtool-loopback` stage |
| `CAP_SYS_ADMIN` | set kernel parameters in `/proc/sys`. Only applies for the `sysctl` stage |

:::
12 changes: 4 additions & 8 deletions book/api/metrics-generated.md
@@ -49,7 +49,8 @@

| Metric | Type | Description |
|--------|------|-------------|
| <span class="metrics-name">net_&#8203;rx_&#8203;pkt_&#8203;cnt</span> | counter | Packet receive count. |
| <span class="metrics-name">net_&#8203;rx_&#8203;pkt_&#8203;cnt</span><br/>{pkt_&#8203;kind="<span class="metrics-enum">ip4_&#8203;udp</span>"} | counter | Packet receive count (ignoring tunnels) (IPv4 UDP packet (no options)) |
| <span class="metrics-name">net_&#8203;rx_&#8203;pkt_&#8203;cnt</span><br/>{pkt_&#8203;kind="<span class="metrics-enum">ip4_&#8203;opt_&#8203;udp</span>"} | counter | Packet receive count (ignoring tunnels) (IPv4 UDP packet (with options)) |
| <span class="metrics-name">net_&#8203;rx_&#8203;bytes_&#8203;total</span> | counter | Total number of bytes received (including Ethernet header). |
| <span class="metrics-name">net_&#8203;rx_&#8203;undersz_&#8203;cnt</span> | counter | Number of incoming packets dropped due to being too small. |
| <span class="metrics-name">net_&#8203;rx_&#8203;fill_&#8203;blocked_&#8203;cnt</span> | counter | Number of incoming packets dropped due to fill ring being full. |
@@ -59,8 +60,8 @@
| <span class="metrics-name">net_&#8203;tx_&#8203;submit_&#8203;cnt</span> | counter | Number of packet transmit jobs submitted. |
| <span class="metrics-name">net_&#8203;tx_&#8203;complete_&#8203;cnt</span> | counter | Number of packet transmit jobs marked as completed by the kernel. |
| <span class="metrics-name">net_&#8203;tx_&#8203;bytes_&#8203;total</span> | counter | Total number of bytes transmitted (including Ethernet header). |
| <span class="metrics-name">net_&#8203;tx_&#8203;route_&#8203;fail_&#8203;cnt</span> | counter | Number of packet transmit jobs dropped due to route failure. |
| <span class="metrics-name">net_&#8203;tx_&#8203;neighbor_&#8203;fail_&#8203;cnt</span> | counter | Number of packet transmit jobs dropped due to unresolved neighbor. |
| <span class="metrics-name">net_&#8203;tx_&#8203;corrupt_&#8203;cnt</span> | counter | Number of packet transmit jobs dropped due to malformed content. |
| <span class="metrics-name">net_&#8203;tx_&#8203;fallback_&#8203;cnt</span> | counter | Number of packet transmit jobs handled via sockets fallback instead of XDP. |
| <span class="metrics-name">net_&#8203;tx_&#8203;full_&#8203;fail_&#8203;cnt</span> | counter | Number of packet transmit jobs dropped due to XDP TX ring full or missing completions. |
| <span class="metrics-name">net_&#8203;tx_&#8203;busy_&#8203;cnt</span> | gauge | Number of transmit buffers currently busy. |
| <span class="metrics-name">net_&#8203;tx_&#8203;idle_&#8203;cnt</span> | gauge | Number of transmit buffers currently idle. |
@@ -76,7 +77,6 @@
| <span class="metrics-name">net_&#8203;rx_&#8203;gre_&#8203;invalid_&#8203;cnt</span> | counter | Number of invalid GRE packets received |
| <span class="metrics-name">net_&#8203;rx_&#8203;gre_&#8203;ignored_&#8203;cnt</span> | counter | Number of received but ignored GRE packets |
| <span class="metrics-name">net_&#8203;tx_&#8203;gre_&#8203;cnt</span> | counter | Number of GRE packet transmit jobs submitted |
| <span class="metrics-name">net_&#8203;tx_&#8203;gre_&#8203;route_&#8203;fail_&#8203;cnt</span> | counter | Number of GRE packets transmit jobs dropped due to route failure |

</div>
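
Note on the new `pkt_kind` label: the two values split received IPv4 UDP packets by whether the IPv4 header carries options. The net tile's actual classifier is not shown in this diff, so the following is only a minimal sketch of one way to make that distinction, using the IPv4 header length (IHL) field. The names `pkt_kind`, `PKT_KIND_*`, and `classify_ip4` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels mirroring the pkt_kind values in the metrics table. */
enum pkt_kind { PKT_KIND_OTHER = 0, PKT_KIND_IP4_UDP, PKT_KIND_IP4_OPT_UDP };

/* Classify a buffer that starts at the IPv4 header (the Ethernet header
   is assumed to have been stripped by the caller). */
static enum pkt_kind
classify_ip4( uint8_t const * ip4, size_t sz ) {
  assert( ip4 );
  if( sz < 20 ) return PKT_KIND_OTHER;             /* shorter than a minimal IPv4 header */
  unsigned version = ip4[0] >> 4;
  unsigned ihl     = ip4[0] & 0x0fu;               /* header length in 32-bit words */
  if( version != 4 || ihl < 5 ) return PKT_KIND_OTHER;
  if( sz < (size_t)ihl * 4u )   return PKT_KIND_OTHER; /* truncated header */
  if( ip4[9] != 17 )            return PKT_KIND_OTHER; /* IP protocol 17 = UDP */
  /* IHL==5 means a 20-byte header without options; larger means options follow. */
  return ihl == 5 ? PKT_KIND_IP4_UDP : PKT_KIND_IP4_OPT_UDP;
}
```

A caller could index a per-kind counter array with the returned value to produce one counter series per label.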

@@ -699,10 +699,6 @@
| <span class="metrics-name">netlnk_&#8203;interface_&#8203;count</span> | gauge | Number of network interfaces |
| <span class="metrics-name">netlnk_&#8203;route_&#8203;count</span><br/>{route_&#8203;table="<span class="metrics-enum">local</span>"} | gauge | Number of IPv4 routes (Local) |
| <span class="metrics-name">netlnk_&#8203;route_&#8203;count</span><br/>{route_&#8203;table="<span class="metrics-enum">main</span>"} | gauge | Number of IPv4 routes (Main) |
| <span class="metrics-name">netlnk_&#8203;neigh_&#8203;probe_&#8203;sent</span> | counter | Number of neighbor solicit requests sent to kernel |
| <span class="metrics-name">netlnk_&#8203;neigh_&#8203;probe_&#8203;fails</span> | counter | Number of neighbor solicit requests that failed to send (kernel too slow) |
| <span class="metrics-name">netlnk_&#8203;neigh_&#8203;probe_&#8203;rate_&#8203;limit_&#8203;host</span> | counter | Number of neighbor solicit that exceeded the per-host rate limit |
| <span class="metrics-name">netlnk_&#8203;neigh_&#8203;probe_&#8203;rate_&#8203;limit_&#8203;global</span> | counter | Number of neighbor solicit that exceeded the global rate limit |

</div>

20 changes: 1 addition & 19 deletions book/guide/initializing.md
@@ -11,8 +11,6 @@ so Firedancer can run correctly. It does the following:
device.
* **ethtool-gro** Disable generic-receive-offload (GRO) on the network
device.
* **ethtool-loopback** Disable tx-udp-segmentation on the loopback
device.

The `hugetlbfs` configuration must be performed every time the system
is rebooted, to remount the `hugetlbfs` filesystems, as do `sysctl`,
@@ -30,7 +28,7 @@ where `mode` is one of:
- `fini` Unconfigure (reverse) the stage if it is reversible.

`stage` can be one or more of `hugetlbfs`, `sysctl`, `hyperthreads`,
`ethtool-channels`, `ethtool-gro`, `ethtool-loopback`, and `snapshots`
`ethtool-channels`, `ethtool-gro`, and `snapshots`
and these stages are described below. You can also use the stage `all`
which will configure everything.

@@ -193,22 +191,6 @@ Firedancer. It has no dependencies on any other stage.
Changing device settings with `ethtool-gro` requires root privileges, and
cannot be performed with capabilities.

## ethtool-loopback
XDP is incompatible with localhost UDP traffic using a feature called
`tx-udp-segmentation`. This feature must be disabled when connecting Agave
clients to Firedancer over loopback, or when using Frankendancer.

The command run by the stage is `ethtool --offload lo tx-udp-segmentation
off`. We can check that it worked:

<<< @/snippets/ethtool-loopback.ansi

The stage only needs to be run once after boot but before running
Firedancer. It has no dependencies on any other stage.

Changing device settings with `ethtool-loopback` requires root privileges,
and cannot be performed with capabilities.

## snapshots
When starting up, validators must load a snapshot to catch up to the
current state of the blockchain. Snapshots are downloaded from other
22 changes: 0 additions & 22 deletions book/guide/internals/net_tile.md
@@ -426,28 +426,6 @@ completion ring.

The net tile moves completed frames back to the free ring.

## Loopback

The first net tile (`net:0`) sets up XDP on the loopback device, for
two main reasons:

* For testing and development.
* The Agave code sends local traffic to itself as part of routine
operation (e.g., when it's the leader it sends votes to its own TPU
socket).

The Linux kernel routes outgoing packets addressed to IP addresses
owned by the system via loopback. (See `ip route show table local`)
The net tile partially matches this behavior. For better performance
and simplicity, a second XDP socket is used.

Alternatively, the net tile could have sent such traffic out to the
public gateway, in hopes that the traffic gets mirrored back.

But for now, Firedancer also binds XDP to loopback. This is a small performance hit for other traffic, but otherwise won't interfere.

The loopback device only supports XDP in SKB mode.
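
Note on the replacement path: this PR is titled "XDP sockets fallback" and adds a `net_tx_fallback_cnt` metric for transmit jobs "handled via sockets fallback instead of XDP", which suggests that traffic previously sent through the loopback XDP socket can instead go through a regular kernel socket. The actual fallback code is not part of this excerpt. The sketch below only illustrates the general idea with plain POSIX sockets; `fallback_send_udp` and its parameters are hypothetical.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send one UDP payload through an ordinary kernel
   socket. The kernel then handles routing, including delivery over
   loopback for addresses owned by the local system.
   Returns the number of bytes sent, or -1 on error. */
static ssize_t
fallback_send_udp( int          sock_fd,
                   char const * dst_ip,
                   uint16_t     dst_port,
                   void const * payload,
                   size_t       payload_sz ) {
  assert( payload || !payload_sz );
  struct sockaddr_in dst;
  memset( &dst, 0, sizeof(dst) );
  dst.sin_family = AF_INET;
  dst.sin_port   = htons( dst_port );
  if( inet_pton( AF_INET, dst_ip, &dst.sin_addr ) != 1 ) return -1; /* not a valid IPv4 address */
  return sendto( sock_fd, payload, payload_sz, 0,
                 (struct sockaddr const *)&dst, sizeof(dst) );
}
```

In this sketch the caller would increment a fallback counter after each successful call. Letting the kernel route the packet avoids having to bind XDP to `lo` at all.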

## Development

### Network Namespace
21 changes: 0 additions & 21 deletions book/guide/internals/netlink.md
@@ -71,33 +71,12 @@ inputs.
Neighbor table updates are forwarded to the netlink tile. This path
has limited throughput (few ~100K updates per second).

- `[untrusted traffic] --> [net tile] --> [app tile]` <br/>
`--> [net tile] --> [netlink tile] --> [neighbor discovery]` <br/>
App tiles will blindly respond to the source IP found in untrusted
packets. This source IP can be spoofed. Neighbor solicitation might
be required in order to find out the MAC address of that IP. On IPv4,
these are ARP requests broadcast to the local network.

Net tiles cannot solicit neighbors directly, so they notify the
netlink tile that neighbor solicitation is needed. (Potentially at
line rate if network configuration is part of a huge subnet)

The netlink tile will deduplicate these requests and forward them to
the kernel.

This path is the only direct 'untrusted traffic' -> 'netlink tile'
data flow, so the internal neighbor solicit message format is kept
as simple as possible for security.
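
Note on the removed data flow: the text above says the netlink tile deduplicates solicit requests before forwarding them to the kernel, and the removed metrics mention a per-host rate limit. The real mechanism is not shown in this diff. The following is a minimal sketch of one possible approach, a small direct-mapped table that suppresses repeat requests for the same IP within a time window. All names (`solicit_dedup_t`, `solicit_should_forward`, `DEDUP_SLOT_CNT`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEDUP_SLOT_CNT 256u  /* hypothetical table size; must be a power of two */

typedef struct {
  uint32_t ip4    [ DEDUP_SLOT_CNT ];  /* last IP recorded in each slot */
  uint64_t last_ts[ DEDUP_SLOT_CNT ];  /* time the last request for that IP was forwarded */
  uint64_t min_interval;               /* per-host suppression window, same unit as `now` */
} solicit_dedup_t;

/* Returns 1 if a solicit request for ip4 should be forwarded to the
   kernel, 0 if it duplicates a recent request and should be dropped.
   The table must be zero-initialized before first use. */
static int
solicit_should_forward( solicit_dedup_t * d, uint32_t ip4, uint64_t now ) {
  assert( d );
  /* Multiplicative hash, keeping the top bits as the slot index. */
  uint32_t slot = ( ( ip4 * 2654435761u ) >> 24 ) & ( DEDUP_SLOT_CNT - 1u );
  if( d->ip4[ slot ] == ip4 && now - d->last_ts[ slot ] < d->min_interval ) {
    return 0;  /* same host asked too recently */
  }
  /* New host, colliding host, or window expired: record and forward. */
  d->ip4    [ slot ] = ip4;
  d->last_ts[ slot ] = now;
  return 1;
}
```

A direct-mapped table keeps the per-request cost constant even when spoofed source IPs arrive at line rate; a hash collision only costs an extra forwarded request, never unbounded memory.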

### Neighbor discovery (ARP)

A concurrent open addressed hash table is used to store ARP entries
(henceforth called "neighbor table"). This table attempts to
continuously stay in sync with the kernel.

The netlink tile requests neighbor solicitations via the netlink
equivalent of `ip neigh add dev DEVICE IP use`.
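
Note on the neighbor table: the section describes a concurrent open addressed hash table for ARP entries. As a rough illustration of open addressing only, here is a single-threaded sketch with linear probing. It omits the concurrency control and entry removal that a real neighbor table needs, and every name in it (`neigh_tbl_t`, `neigh_upsert`, `neigh_query`) is hypothetical rather than Firedancer's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NEIGH_SLOT_CNT 64u  /* hypothetical capacity; must be a power of two */

typedef struct { uint32_t ip4; uint8_t mac[6]; uint8_t used; } neigh_entry_t;
typedef struct { neigh_entry_t slot[ NEIGH_SLOT_CNT ]; } neigh_tbl_t;

/* Multiplicative hash reduced to a slot index. */
static uint32_t
neigh_hash( uint32_t ip4 ) {
  return ( ( ip4 * 2654435761u ) >> 26 ) & ( NEIGH_SLOT_CNT - 1u );
}

/* Insert or update the MAC for ip4. Returns 1 on success, 0 if the table is full. */
static int
neigh_upsert( neigh_tbl_t * t, uint32_t ip4, uint8_t const mac[6] ) {
  assert( t && mac );
  uint32_t h = neigh_hash( ip4 );
  for( uint32_t i=0; i<NEIGH_SLOT_CNT; i++ ) {
    neigh_entry_t * e = &t->slot[ ( h + i ) & ( NEIGH_SLOT_CNT - 1u ) ];
    if( !e->used || e->ip4 == ip4 ) {  /* free slot or existing entry */
      e->ip4 = ip4;
      memcpy( e->mac, mac, 6 );
      e->used = 1;
      return 1;
    }
  }
  return 0;
}

/* Returns a pointer to the 6-byte MAC for ip4, or NULL if unknown. */
static uint8_t const *
neigh_query( neigh_tbl_t const * t, uint32_t ip4 ) {
  assert( t );
  uint32_t h = neigh_hash( ip4 );
  for( uint32_t i=0; i<NEIGH_SLOT_CNT; i++ ) {
    neigh_entry_t const * e = &t->slot[ ( h + i ) & ( NEIGH_SLOT_CNT - 1u ) ];
    if( !e->used )      return NULL;   /* probe chain ends at the first free slot */
    if( e->ip4 == ip4 ) return e->mac;
  }
  return NULL;
}
```

Open addressing stores entries inline in one flat array, which suits a table shared between tiles through shared memory because it needs no pointers or dynamic allocation.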

### Routing

The Firedancer network stack supports very simple routing tables as
1 change: 0 additions & 1 deletion book/snippets/commands/configure-check.ansi
@@ -3,5 +3,4 @@ $ fdctl configure check all
WARNING sysctl ... not configured ... kernel parameter `/proc/sys/vm/max_map_count` is too low (got 65536 but expected at least 1000000)
WARNING ethtool-channels ... not configured ... device `ens3f0` does not have right number of channels (got 1 but expected 2)
WARNING ethtool-gro ... not configured ... device `ens3f0` has generic-receive-offload enabled. Should be disabled
WARNING ethtool-loopback ... not configured ... device `lo` has tx-udp-segmentation enabled. Should be disabled
ERR  failed to configure some stages
3 changes: 0 additions & 3 deletions book/snippets/commands/configure-init.ansi
@@ -23,6 +23,3 @@ $ fdctl configure init all
NOTICE  ethtool-gro ... unconfigured ... device `ens3f0` has generic-receive-offload enabled. Should be disabled
NOTICE  ethtool-gro ... configuring
NOTICE  ethtool-gro ... RUN: `ethtool --offload ens3f0 generic-receive-offload off`
NOTICE  ethtool-loopback ... unconfigured ... device `lo` has tx-udp-segmentation enabled. Should be disabled
NOTICE  ethtool-loopback ... configuring
NOTICE  ethtool-loopback ... RUN: `ethtool --offload lo tx-udp-segmentation off`
1 change: 0 additions & 1 deletion book/snippets/configure.ansi
@@ -7,4 +7,3 @@
NOTICE  sysctl ... already valid
NOTICE  ethtool-channels ... already valid
NOTICE  ethtool-gro ... already valid
NOTICE  ethtool-loopback ... already valid
7 changes: 0 additions & 7 deletions book/snippets/ethtool-loopback.ansi

This file was deleted.

1 change: 0 additions & 1 deletion contrib/test/test_firedancer_leader.sh
@@ -85,7 +85,6 @@ echo "
sudo $FD_DIR/$OBJDIR/bin/firedancer-dev configure init kill --config $(readlink -f firedancer-dev.toml)
sudo $FD_DIR/$OBJDIR/bin/firedancer-dev configure init hugetlbfs --config $(readlink -f firedancer-dev.toml)
sudo $FD_DIR/$OBJDIR/bin/firedancer-dev configure init ethtool-channels --config $(readlink -f firedancer-dev.toml)
sudo $FD_DIR/$OBJDIR/bin/firedancer-dev configure init ethtool-gro ethtool-loopback --config $(readlink -f firedancer-dev.toml)
sudo $FD_DIR/$OBJDIR/bin/firedancer-dev configure init keys --config $(readlink -f firedancer-dev.toml)

sudo gdb -iex="set debuginfod enabled on" -ex=r --args $FD_DIR/$OBJDIR/bin/firedancer-dev dev --no-configure --log-path $(readlink -f firedancer-dev.log) --config $(readlink -f firedancer-dev.toml)
1 change: 0 additions & 1 deletion src/app/fdctl/main.c
@@ -38,7 +38,6 @@ configure_stage_t * STAGES[] = {
&fd_cfg_stage_hyperthreads,
&fd_cfg_stage_ethtool_channels,
&fd_cfg_stage_ethtool_gro,
&fd_cfg_stage_ethtool_loopback,
NULL,
};

11 changes: 7 additions & 4 deletions src/app/fdctl/topology.c
@@ -378,16 +378,19 @@ fd_topo_initialize( config_t * config ) {
}
FD_TEST( fd_pod_insertf_ulong( topo->props, poh_shred_obj->id, "poh_shred" ) );

FOR(net_tile_cnt) fd_topos_net_tile_finish( topo, i );
fd_topo_net_rx_t rx_rules = {0};
fd_topo_net_rx_rule_push( &rx_rules, DST_PROTO_SHRED, "net_shred", config->tiles.shred.shred_listen_port );
fd_topo_net_rx_rule_push( &rx_rules, DST_PROTO_TPU_QUIC, "net_quic" , config->tiles.quic.quic_transaction_listen_port );
fd_topo_net_rx_rule_push( &rx_rules, DST_PROTO_TPU_UDP, "net_quic" , config->tiles.quic.regular_transaction_listen_port );

fd_topos_net_tile_finish( topo );

for( ulong i=0UL; i<topo->tile_cnt; i++ ) {
fd_topo_tile_t * tile = &topo->tiles[ i ];

if( FD_UNLIKELY( !strcmp( tile->name, "net" ) || !strcmp( tile->name, "sock" ) ) ) {

tile->net.shred_listen_port = config->tiles.shred.shred_listen_port;
tile->net.quic_transaction_listen_port = config->tiles.quic.quic_transaction_listen_port;
tile->net.legacy_transaction_listen_port = config->tiles.quic.regular_transaction_listen_port;
tile->net.rx_rules = rx_rules;

} else if( FD_UNLIKELY( !strcmp( tile->name, "netlnk" ) ) ) {
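
/* Note on the hunk above: the three per-port fields are replaced by a
   single `rx_rules` list built with `fd_topo_net_rx_rule_push`. The
   definitions of `fd_topo_net_rx_t` and `fd_topo_net_rx_rule_push` are
   not part of this excerpt, so the following is only a sketch of how
   such a rule list could work. It deliberately uses different,
   hypothetical names (`rx_rules_t`, `rx_rule_push`, `rx_rule_find`). */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_RULE_MAX 16u  /* hypothetical upper bound on rules */

typedef struct {
  uint16_t     port;   /* UDP destination port to match */
  uint16_t     proto;  /* protocol tag handed to the consumer */
  char const * link;   /* name of the output link, e.g. "net_quic" */
} rx_rule_t;

typedef struct {
  uint32_t  cnt;
  rx_rule_t rule[ RX_RULE_MAX ];
} rx_rules_t;

/* Append one rule. The list must start zero-initialized. */
static void
rx_rule_push( rx_rules_t * r, uint16_t proto, char const * link, uint16_t port ) {
  assert( r && link && r->cnt < RX_RULE_MAX );
  r->rule[ r->cnt++ ] = (rx_rule_t){ .port = port, .proto = proto, .link = link };
}

/* Linear scan for the first rule matching dst_port; NULL if none match. */
static rx_rule_t const *
rx_rule_find( rx_rules_t const * r, uint16_t dst_port ) {
  assert( r );
  for( uint32_t i=0; i<r->cnt; i++ ) {
    if( r->rule[ i ].port == dst_port ) return &r->rule[ i ];
  }
  return NULL;
}
```

/* With a list like this, adding a listener (as gossip.c does later in
   this PR) becomes one push call instead of a new struct field. */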

1 change: 0 additions & 1 deletion src/app/fddev/main.h
@@ -48,7 +48,6 @@ configure_stage_t * STAGES[] = {
&fd_cfg_stage_hyperthreads,
&fd_cfg_stage_ethtool_channels,
&fd_cfg_stage_ethtool_gro,
&fd_cfg_stage_ethtool_loopback,
&fd_cfg_stage_keys,
&fd_cfg_stage_genesis,
&fd_cfg_stage_blockstore,
2 changes: 1 addition & 1 deletion src/app/firedancer-dev/commands/backtest.c
@@ -323,7 +323,7 @@ backtest_topo( config_t * config ) {

for( ulong i=0UL; i<topo->tile_cnt; i++ ) {
fd_topo_tile_t * tile = &topo->tiles[ i ];
if( !fd_topo_configure_tile( tile, config ) ) {
if( !fd_topo_configure_tile( tile, config, NULL ) ) {
FD_LOG_ERR(( "unknown tile name %lu `%s`", i, tile->name ));
}

4 changes: 2 additions & 2 deletions src/app/firedancer-dev/commands/gossip.c
@@ -42,7 +42,7 @@ gossip_topo( config_t * config ) {
if( net_tile_id==ULONG_MAX ) net_tile_id = fd_topo_find_tile( topo, "sock", 0UL );
if( FD_UNLIKELY( net_tile_id==ULONG_MAX ) ) FD_LOG_ERR(( "net tile not found" ));
fd_topo_tile_t * net_tile = &topo->tiles[ net_tile_id ];
net_tile->net.gossip_listen_port = config->gossip.port;
fd_topo_net_rx_rule_push( &net_tile->net.rx_rules, DST_PROTO_GOSSIP, "net_gossip", config->gossip.port );

fd_topob_wksp( topo, "gossip" );
fd_topo_tile_t * gossip_tile = fd_topob_tile( topo, "gossip", "gossip", "metric_in", 0UL, 0, 0 );
@@ -86,7 +86,7 @@ gossip_topo( config_t * config ) {
FD_TEST( fd_pod_insertf_ulong( topo->props, poh_shred_obj->id, "poh_shred" ) );
fd_topob_tile_uses( topo, gossip_tile, poh_shred_obj, FD_SHMEM_JOIN_MODE_READ_WRITE );

fd_topos_net_tile_finish( topo, 0UL );
fd_topos_net_tile_finish( topo );
fd_topob_auto_layout( topo, 0 );
topo->agave_affinity_cnt = 0;
fd_topob_finish( topo, CALLBACKS );