Skip to content

Conversation

sneginha
Copy link

@sneginha sneginha commented Mar 20, 2025

Commits

Cherry-picking 4 patches from upstream:

  1. bonding: add ESP offload features when slaves support
    This patch fixes a perf issue in the bonding driver where ESP segmentation was not being offloaded to the slave devices.

  2. bonding: Correctly support GSO ESP offload
    The fix in patch 1 is incomplete and this change fixes the ESP segmentation issue.

  3. xfrm_output: Force software GSO only in tunnel mode
    Fix in xfrm_output to only trigger software GSO in crypto offload cases for already encapsulated packets in tunnel mode.

  4. xfrm: Support ESN context update to hardware for TX
    Previously xfrm_dev_state_advance_esn() was added for RX only in commit 50bd870, but is needed for TX as well.

Build

/mnt/ciq/sneginha/kernel-src-tree
  CLEAN   arch/x86/crypto
  CLEAN   arch/x86/entry/vdso
  CLEAN   arch/x86/kernel/cpu
  CLEAN   arch/x86/kernel
  CLEAN   arch/x86/purgatory
  CLEAN   arch/x86/realmode/rm
  CLEAN   arch/x86/lib
  CLEAN   certs
  CLEAN   crypto/asymmetric_keys

<SNIP>

[TIMER]{MRPROPER}: 19s
x86_64 architecture detected, copying config
'configs/kernel-5.14.0-x86_64.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-sneginhal-fips-9-compliant_5.14.0-284.30.1-03b404b81bf2"
Making olddefconfig
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  HOSTCC  scripts/kconfig/confdata.o

<SNIP>

  BTF [M] sound/usb/usx2y/snd-usb-usx2y.ko
  BTF [M] sound/virtio/virtio_snd.ko
  BTF [M] sound/xen/snd_xen_front.ko
  BTF [M] virt/lib/irqbypass.ko
[TIMER]{BUILD}: 2333s
Making Modules
  INSTALL /lib/modules/5.14.0-sneginhal-fips-9-compliant_5.14.0-284.30.1-03b404b81bf2+/kernel/arch/x86/crypto/blake2s-x86_64.ko
  STRIP   /lib/modules/5.14.0-sneginhal-fips-9-compliant_5.14.0-284.30.1-03b404b81bf2+/kernel/arch/x86/crypto/blake2s-x86_64.ko
  SIGN    /lib/modules/5.14.0-sneginhal-fips-9-compliant_5.14.0-284.30.1-03b404b81bf2+/kernel/arch/x86/crypto/blake2s-x86_64.ko
  INSTALL /lib/modules/5.14.0-sneginhal-fips-9-compliant_5.14.0-284.30.1-03b404b81bf2+/kernel/arch/x86/crypto/blowfish-x86_64.ko
  STRIP   /lib/modules/5.14.0-sneginhal-fips-9-compliant_5.14.0-284.30.1-03b404b81bf2+/kernel/arch/x86/crypto/blowfish-x86_64.ko
  SIGN    /lib/modules/5.14.0-sneginhal-fips-9-compliant_5.14.0-284.30.1-03b404b81bf2+/kernel/arch/x86/crypto/blowfish-x86_64.ko

<SNIP>

  STRIP   /lib/modules/5.14.0-sneginhal-fips-9-compliant_5.14.0-284.30.1-03b404b81bf2+/kernel/virt/lib/irqbypass.ko
  SIGN    /lib/modules/5.14.0-sneginhal-fips-9-compliant_5.14.0-284.30.1-03b404b81bf2+/kernel/virt/lib/irqbypass.ko
  DEPMOD  /lib/modules/5.14.0-sneginhal-fips-9-compliant_5.14.0-284.30.1-03b404b81bf2+
[TIMER]{MODULES}: 46s
Making Install
sh ./arch/x86/boot/install.sh \
        6.14.0-sneginhal-fips-9-compliant_5.14.0-284.30.1-03b404b81bf2+ arch/x86/boot/bzImage \
        System.map "/boot"
[TIMER]{INSTALL}: 25s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-5.14.0-sneginhal-fips-9-compliant_5.14.0-284.30.1-00adedc58a8e+ and Index to 1
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 19s
[TIMER]{BUILD}: 2333s
[TIMER]{MODULES}: 46s
[TIMER]{INSTALL}: 25s
[TIMER]{TOTAL} 2431s
Rebooting in 10 seconds

kABI Check

Checking kABI
kABI check passed

Reboot and Verify

Linux localhost 5.14.0-sneginhal-fips-9-compliant_5.14.0-284.30.1-00adedc58a8e+ #1 SMP PREEMPT_DYNAMIC Thu Mar 20 06:03:26 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Kernel Self Tests

kselftest_output.txt

@PlaidCat
Copy link
Collaborator

PlaidCat commented Mar 20, 2025

Thanks for contributing to our kernel.

I appreciate following everything in the wiki for contribution guidelines.

I have a couple asks though:
Could you amend these commit headers to include the full 40 character SHA (there is clearly a limitation in our tool and I've created an issue to fix that)
wiki updated as well: https://github.com/ctrliq/kernel-src-tree/wiki#tldr

Do you have testing that you can publicly share that validates that thes changes do what is expected?

commit-author Jianbo Liu <[email protected]>
commit 4861333

Add NETIF_F_GSO_ESP bit to bond's gso_partial_features if all slaves
support it, such that ESP segmentation is handled by hardware if possible.

	Signed-off-by: Jianbo Liu <[email protected]>
	Reviewed-by: Boris Pismenny <[email protected]>
	Signed-off-by: Tariq Toukan <[email protected]>
Link: https://patch.msgid.link/[email protected]
	Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit 4861333)
	Signed-off-by: Srinivas Neginhal <[email protected]>
commit-author Cosmin Ratiu <[email protected]>
commit 9e6c4e6

The referenced fix is incomplete. It correctly computes
bond_dev->gso_partial_features across slaves, but unfortunately
netdev_fix_features discards gso_partial_features from the feature set
if NETIF_F_GSO_PARTIAL isn't set in bond_dev->features.

This is visible with ethtool -k bond0 | grep esp:
tx-esp-segmentation: off [requested on]
esp-hw-offload: on
esp-tx-csum-hw-offload: on

This patch reworks the bonding GSO offload support by:
- making aggregating gso_partial_features across slaves similar to the
  other feature sets (this part is a no-op).
- advertising the default partial gso features on empty bond devs, same
  as with other feature sets (also a no-op).
- adding NETIF_F_GSO_PARTIAL to hw_enc_features filtered across slaves.
- adding NETIF_F_GSO_PARTIAL to features in bond_setup()

With all of these, 'ethtool -k bond0 | grep esp' now reports:
tx-esp-segmentation: on
esp-hw-offload: on
esp-tx-csum-hw-offload: on

Fixes: 4861333 ("bonding: add ESP offload features when slaves support")
	Signed-off-by: Hangbin Liu <[email protected]>
	Signed-off-by: Cosmin Ratiu <[email protected]>
	Acked-by: Jay Vosburgh <[email protected]>
Link: https://patch.msgid.link/[email protected]
	Signed-off-by: Paolo Abeni <[email protected]>

(cherry picked from commit 9e6c4e6)
	Signed-off-by: Srinivas Neginhal <[email protected]>
commit-author Cosmin Ratiu <[email protected]>
commit 0aae286

The cited commit fixed a software GSO bug with VXLAN + IPSec in tunnel
mode. Unfortunately, it is slightly broader than necessary, as it also
severely affects performance for Geneve + IPSec transport mode over a
device capable of both HW GSO and IPSec crypto offload. In this case,
xfrm_output unnecessarily triggers software GSO instead of letting the
HW do it. In simple iperf3 tests over Geneve + IPSec transport mode over
a back-2-back pair of NICs with MTU 1500, the performance was observed
to be up to 6x worse when doing software GSO compared to leaving it to
the hardware.

This commit makes xfrm_output only trigger software GSO in crypto
offload cases for already encapsulated packets in tunnel mode, as not
doing so would then cause the inner tunnel skb->inner_networking_header
to be overwritten and break software GSO for that packet later if the
device turns out to not be capable of HW GSO.

Taking a closer look at the conditions for the original bug, to better
understand the reasons for this change:
- vxlan_build_skb -> iptunnel_handle_offloads sets inner_protocol and
  inner network header.
- then, udp_tunnel_xmit_skb -> ip_tunnel_xmit adds outer transport and
  network headers.
- later in the xmit path, xfrm_output -> xfrm_outer_mode_output ->
  xfrm4_prepare_output -> xfrm4_tunnel_encap_add overwrites the inner
  network header with the one set in ip_tunnel_xmit before adding the
  second outer header.
- __dev_queue_xmit -> validate_xmit_skb checks whether GSO segmentation
  needs to happen based on dev features. In the original bug, the hw
  couldn't segment the packets, so skb_gso_segment was invoked.
- deep in the .gso_segment callback machinery, __skb_udp_tunnel_segment
  tries to use the wrong inner network header, expecting the one set in
  iptunnel_handle_offloads but getting the one set by xfrm instead.
- a bit later, ipv6_gso_segment accesses the wrong memory based on that
  wrong inner network header.

With the new change, the original bug (or similar ones) cannot happen
again, as xfrm will now trigger software GSO before applying a tunnel.
This concern doesn't exist in packet offload mode, when the HW adds
encapsulation headers. For the non-offloaded packets (crypto in SW),
software GSO is still done unconditionally in the else branch.

	Reviewed-by: Dragos Tatulea <[email protected]>
	Reviewed-by: Yael Chemla <[email protected]>
	Reviewed-by: Leon Romanovsky <[email protected]>
Fixes: a204aef ("xfrm: call xfrm_output_gso when inner_protocol is set in xfrm_output")
	Signed-off-by: Cosmin Ratiu <[email protected]>
	Signed-off-by: Steffen Klassert <[email protected]>
(cherry picked from commit 0aae286)
	Signed-off-by: Srinivas Neginhal <[email protected]>
commit-author Jianbo Liu <[email protected]>
commit 373b79a

Previously xfrm_dev_state_advance_esn() was added for RX only. But
it's possible that ESN context also need to be synced to hardware for
TX, so call it for outbound in this patch.

	Signed-off-by: Jianbo Liu <[email protected]>
	Signed-off-by: Leon Romanovsky <[email protected]>
	Signed-off-by: Steffen Klassert <[email protected]>
(cherry picked from commit 373b79a)
	Signed-off-by: Srinivas Neginhal <[email protected]>
@sneginha sneginha force-pushed the sneginhal-fips-9-compliant/5.14.0-284.30.1 branch from 03b404b to 24b7dbb Compare March 20, 2025 21:51
@sneginha
Copy link
Author

Thanks for contributing to our kernel.

I appreciate following everything in the wiki for contribution guidelines.

I have a couple asks though: Could you amend these commit headers to include the full 40 character SHA (there is clearly a limitation in our tool and I've created an issue to fix that) wiki updated as well: https://github.com/ctrliq/kernel-src-tree/wiki#tldr

Done.

Do you have testing that you can publicly share that validates that thes changes do what is expected?

Testing done:

  1. Run iperf TCP performance tests
  2. Review packet captures on the tx side
  3. Ensure that you see large MSS sized packets going to the NIC.
  4. Also, note the increase in network performance.

Copy link
Collaborator

@PlaidCat PlaidCat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank You for the update to the commits headers.

Do you have testing that you can publicly share that validates that thes changes do what is expected?

Testing done:
1. Run iperf TCP performance tests
2. Review packet captures on the tx side
3. Ensure that you see large MSS sized packets going to the NIC.
4. Also, note the increase in network performance.

And these do not impact when offloading is off, I'm assuming everything is fine so I'm going to send my approval.

Copy link
Collaborator

@bmastbergen bmastbergen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🥌

@PlaidCat PlaidCat merged commit 636827e into ctrliq:fips-9-compliant/5.14.0-284.30.1 Mar 26, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants