shreeya-patel98

Commit Message

    net: mana: Add debug logs in MANA network driver
    
    jira LE-3889
    commit-author Erni Sri Satya Vennela <[email protected]>
    commit 47dfd7a72257e91171d56e220ea484a04df89847
    
    Add more logs to assist in debugging and monitoring
    driver behaviour, making it easier to identify potential
    issues during development and testing.
    
-----------------------------------------------------------------------------
    net: mana: Allow tso_max_size to go up-to GSO_MAX_SIZE
    
    jira LE-3885
    commit-author Shradha Gupta <[email protected]>
    commit 27315836f4bcc8e4879d50dfc1fa6eb41e7952ef
    
    Allow the max aggregated pkt size to go up to GSO_MAX_SIZE for MANA NIC.
    This patch only increases the max allowable gso/gro pkt size for MANA
    devices and does not change the defaults.
    The following are the perf benefits of increasing the pkt aggregate size
    from the legacy gso_max_size value (64K) to the newer one (up to 511K).
    
    IPv4 tests
    for i in {1..10}; do netperf -t TCP_RR  -H 10.0.0.5 -p50000 -- -r80000,80000
    -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
    
    min     p90     p99     Throughput              gso_max_size
    93      171     194     6594.25
    97      154     180     7183.74
    95      165     189     6927.86
    96      165     188     6976.04
    93      154     185     7338.05                 64K
    93      168     189     6938.03
    94      169     189     6784.93
    92      166     189     7117.56
    94      179     191     6678.44
    95      157     183     7277.81
    
    min     p90     p99     Throughput
    93      134     146     8448.75
    95      134     140     8396.54
    94      137     148     8204.12
    94      137     148     8244.41
    94      128     139     8666.52                 80K
    94      141     153     8116.86
    94      138     149     8163.92
    92      135     142     8362.72
    92      134     142     8497.57
    93      136     148     8393.23
    
    IPv6 Tests
    for i in {1..10}; do netperf -t TCP_RR  -H fd00:9013:cadd::4 -p50000 --
    -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
    
    min     p90     p99     Throughput              gso_max_size
    108     165     170     6673.2
    101     169     189     6451.69
    101     165     169     6737.65
    102     167     175     6614.64
    101     178     189     6247.13                 64K
    107     163     169     6678.63
    106     176     187     6350.86
    100     164     169     6617.36
    102     163     170     6849.21
    102     168     175     6605.7
    
    min     p90     p99     Throughput
    108     155     166     7183
    110     154     163     7268.87
    109     152     159     7434.35
    107     145     157     7569.15
    107     149     164     7496.17                 80K
    110     154     159     7245.85
    108     156     162     7266.24
    109     145     158     7526.66
    106     145     151     7785.75
    111     148     157     7246.65
    
    Tested on azure env with Accelerated Networking enabled and disabled.
    
-----------------------------------------------------------------------------
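
The throughput difference between the two IPv4 tables above can be quantified with a short calculation. The following is a minimal Python sketch and is not part of the patch: the values are copied from the Throughput column of the 64K and 80K tables in the commit message, and the helper name `percent_gain` is hypothetical.

```python
from statistics import mean

# Throughput column copied from the IPv4 tables in the commit message.
ipv4_64k = [6594.25, 7183.74, 6927.86, 6976.04, 7338.05,
            6938.03, 6784.93, 7117.56, 6678.44, 7277.81]
ipv4_80k = [8448.75, 8396.54, 8204.12, 8244.41, 8666.52,
            8116.86, 8163.92, 8362.72, 8497.57, 8393.23]


def percent_gain(before, after):
    """Relative change of mean(after) over mean(before), in percent."""
    base = mean(before)
    return (mean(after) - base) / base * 100.0


if __name__ == "__main__":
    print(f"mean throughput at 64K: {mean(ipv4_64k):.2f}")
    print(f"mean throughput at 80K: {mean(ipv4_80k):.2f}")
    print(f"relative gain: {percent_gain(ipv4_64k, ipv4_80k):.1f}%")
```

The same helper can be applied to the IPv6 tables, or to the latency columns, by swapping in the corresponding values.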

    hv_netvsc: Use VF's tso_max_size value when data path is VF
    
    jira LE-3885
    commit-author Shradha Gupta <[email protected]>
    commit 685920920e3d5f68a8c50107b97747b0f8ce050f
    
    On Azure, increasing the VF's gso/gro packet size up to GSO_MAX_SIZE
    is not possible without allowing the same for the netvsc NIC
    (as the NICs are bonded together). For bonded NICs, the min of the max
    aggregated pkt size of the members is propagated in the stack.
    
    Therefore, we use netif_set_tso_max_size() to set the max aggregated pkt
    size to the VF's packet size for netvsc too, when the data path is
    switched over to the VF.
    
    Tested on Azure env with Accelerated Networking enabled and disabled.
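
The "min of the max" behaviour described in this commit message can be illustrated with a toy model. This is a hedged Python sketch of the idea only, not the driver code: the function names are hypothetical, the sizes are illustrative values based on the "64K" and "up-to 511K" figures quoted in the commit text, and `eth0`/`eth1` stand for the netvsc and VF interfaces shown in the Testing section.

```python
# Illustrative sizes based on the figures quoted in the commit text.
LEGACY_64K = 64 * 1024
UP_TO_511K = 511 * 1024


def effective_tso_max_size(member_limits):
    """Toy model: a bonded pair can only use the smallest member limit."""
    if not member_limits:
        raise ValueError("at least one member device is required")
    return min(member_limits.values())


def switch_data_path_to_vf(limits, synthetic="eth0", vf="eth1"):
    """Return a copy of `limits` with the synthetic NIC raised to the VF's limit.

    This mirrors the idea in the commit message (netvsc adopts the VF's value
    when the data path moves to the VF); it is not the kernel implementation.
    """
    updated = dict(limits)
    updated[synthetic] = updated[vf]
    return updated


if __name__ == "__main__":
    limits = {"eth0": LEGACY_64K, "eth1": UP_TO_511K}
    print("before:", effective_tso_max_size(limits))
    print("after: ", effective_tso_max_size(switch_data_path_to_vf(limits)))
```

In this model the pair stays capped at the smaller value until the synthetic device is raised, which is why the VF change alone is not sufficient.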
    

Kernel Build

/home/rocky/workspace/kernel-src-tree
Skipping make mrproper
[TIMER]{MRPROPER}: 0s
x86_64 architecture detected, copying config
'configs/kernel-x86_64-rhel.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-HEAD-901c28ff4d62"
Making olddefconfig
#
# configuration written to .config
#
Starting Build
  SYNC    include/config/auto.conf.cmd
mkdir -p /home/rocky/workspace/kernel-src-tree/tools/objtool && make O=/home/rocky/workspace/kernel-src-tree subdir=tools/objtool --no-print-directory -C objtool 
mkdir -p /home/rocky/workspace/kernel-src-tree/tools/bpf/resolve_btfids && make O=/home/rocky/workspace/kernel-src-tree subdir=tools/bpf/resolve_btfids --no-print-directory -C bpf/resolve_btfids 
  INSTALL libsubcmd_headers
  CALL    scripts/atomic/check-atomics.sh
warning: generated include/linux/atomic/atomic-instrumented.h has been modified.
  CALL    scripts/checksyscalls.sh
  CHK     include/generated/compile.h
  CHK     kernel/kheaders_data.tar.xz
  TEST    posttest
arch/x86/tools/insn_decoder_test: success: Decoded and checked 6993310 instructions
  TEST    posttest
arch/x86/tools/insn_sanity: Success: decoded and checked 1000000 random instructions with 0 errors (seed:0xc8873efd)
Kernel: arch/x86/boot/bzImage is ready  (#1)
[TIMER]{BUILD}: 34s
Making Modules
  INSTALL /lib/modules/5.14.0-HEAD-901c28ff4d62+/kernel/arch/x86/crypto/blake2s-x86_64.ko
  INSTALL /lib/modules/5.14.0-HEAD-901c28ff4d62+/kernel/arch/x86/crypto/blowfish-x86_64.ko
  INSTALL /lib/modules/5.14.0-HEAD-901c28ff4d62+/kernel/arch/x86/crypto/camellia-aesni-avx-x86_64.ko
  INSTALL /lib/modules/5.14.0-HEAD-901c28ff4d62+/kernel/arch/x86/crypto/camellia-aesni-avx2.ko
  INSTALL /lib/modules/5.14.0-HEAD-901c28ff4d62+/kernel/
  <--snip-->
  STRIP   /lib/modules/5.14.0-HEAD-901c28ff4d62+/kernel/sound/x86/snd-hdmi-lpe-audio.ko
  STRIP   /lib/modules/5.14.0-HEAD-901c28ff4d62+/kernel/sound/xen/snd_xen_front.ko
  SIGN    /lib/modules/5.14.0-HEAD-901c28ff4d62+/kernel/sound/usb/usx2y/snd-usb-usx2y.ko
  SIGN    /lib/modules/5.14.0-HEAD-901c28ff4d62+/kernel/sound/virtio/virtio_snd.ko
  SIGN    /lib/modules/5.14.0-HEAD-901c28ff4d62+/kernel/sound/usb/snd-usb-audio.ko
  SIGN    /lib/modules/5.14.0-HEAD-901c28ff4d62+/kernel/sound/x86/snd-hdmi-lpe-audio.ko
  SIGN    /lib/modules/5.14.0-HEAD-901c28ff4d62+/kernel/sound/xen/snd_xen_front.ko
  DEPMOD  /lib/modules/5.14.0-HEAD-901c28ff4d62+
[TIMER]{MODULES}: 8s
Making Install
sh ./arch/x86/boot/install.sh 5.14.0-HEAD-901c28ff4d62+ \
	arch/x86/boot/bzImage System.map "/boot"
[TIMER]{INSTALL}: 25s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-5.14.0-HEAD-c771763a2b69+ and Index to 3
The default is /boot/loader/entries/ae477367830943118aa3354df0e829b2-5.14.0-HEAD-c771763a2b69+.conf with index 3 and kernel /boot/vmlinuz-5.14.0-HEAD-c771763a2b69+
The default is /boot/loader/entries/ae477367830943118aa3354df0e829b2-5.14.0-HEAD-c771763a2b69+.conf with index 3 and kernel /boot/vmlinuz-5.14.0-HEAD-c771763a2b69+
Generating grub configuration file ...
Adding boot menu entry for UEFI Firmware Settings ...
done
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 0s
[TIMER]{BUILD}: 34s
[TIMER]{MODULES}: 8s
[TIMER]{INSTALL}: 25s
[TIMER]{TOTAL} 69s
Rebooting in 10 seconds

kernel-build.log

Kselftest

[rocky@shreeya-scn9 workspace]$ grep '^ok ' kselftest-before.log | wc -l && grep '^ok ' kselftest-after.log | wc -l
345
347
[rocky@shreeya-scn9 workspace]$ grep '^not ok ' kselftest-before.log | wc -l && grep '^not ok ' kselftest-after.log | wc -l
82
80

kselftest-after.log
kselftest-before.log
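
The grep counts above give only net totals. A per-test comparison can show whether any individual test went from passing to failing. The following is a minimal Python sketch under stated assumptions: it assumes the logs use TAP-style `ok N <name>` / `not ok N <name>` result lines, the function names are hypothetical, and the sample log text in the usage example is invented for illustration (it is not taken from the attached logs).

```python
def parse_results(log_text):
    """Return a list of (test_name, passed) for TAP-style result lines."""
    results = []
    for line in log_text.splitlines():
        if line.startswith("not ok "):
            passed, rest = False, line[len("not ok "):]
        elif line.startswith("ok "):
            passed, rest = True, line[len("ok "):]
        else:
            continue
        # Drop the leading test number and any trailing "# ..." directive.
        _, _, name = rest.partition(" ")
        results.append((name.split(" # ")[0].strip(), passed))
    return results


def count(results):
    """Return (number ok, number not ok), like the two grep | wc -l pipelines."""
    ok = sum(1 for _, passed in results if passed)
    return ok, len(results) - ok


def newly_failing(before, after):
    """Names that passed in `before` and fail in `after`."""
    passed_before = {name for name, passed in before if passed}
    return sorted(name for name, passed in after
                  if not passed and name in passed_before)


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    before = parse_results("ok 1 selftests: net: a.sh\n"
                           "not ok 2 selftests: net: b.sh # exit=1\n")
    after = parse_results("ok 1 selftests: net: a.sh\n"
                          "ok 2 selftests: net: b.sh\n")
    print(count(before), count(after), newly_failing(before, after))
```

To use it on real logs, read `kselftest-before.log` and `kselftest-after.log` into strings and pass them to `parse_results`.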

Testing

[rocky@shreeya-scn9 kernel-src-tree]$ ip link show eth1
3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master eth0 state UP mode DEFAULT group default qlen 1000
    link/ether 60:45:bd:ef:7a:84 brd ff:ff:ff:ff:ff:ff
[rocky@shreeya-scn9 kernel-src-tree]$ ethtool -i eth1
driver: mana
version: 5.14.0-HEAD-901c28ff4d62+
firmware-version: 
expansion-rom-version: 
bus-info: 7870:00:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

[rocky@shreeya-scn9 kernel-src-tree]$ netserver -p 50001
Starting netserver with host 'IN(6)ADDR_ANY' port '50001' and family AF_UNSPEC

[rocky@shreeya-scn9 kernel-src-tree]$ for i in {1..10}; do 
    netperf -t TCP_RR -H 10.3.0.10 -p 50001 -- -r 80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT | tail -1
done
36           46           58           22672.31   
28           45           52           23121.50   
28           46           59           22866.10   
37           46           60           22802.62   
36           46           52           22803.10   
37           46           60           22545.01   
35           45           58           23082.60   
28           46           58           22936.80   
28           47           52           22923.53   
25           46           57           22962.27   


[rocky@shreeya-scn9 kernel-src-tree]$ for i in {1..10}; do 
    netperf -6 -t TCP_RR -H fe80::6245:bdff:feef:7a84%eth0 -p50001 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT | tail -1
done
35           45           54           23515.02   
26           44           51           23602.11   
20           45           56           23203.30   
34           45           56           23348.56   
37           45           52           23480.82   
36           45           58           23385.54   
34           45           51           23488.41   
34           45           56           23447.47   
34           45           56           23033.09   
37           45           58           23121.23   
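
The raw `tail -1` lines printed by the loops above can be turned into named fields for a quick summary. This is a small Python sketch, assuming each line holds exactly the four values selected with `-O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT` in that order; the field and function names are hypothetical, and the sample text is the IPv4 run shown above.

```python
FIELDS = ("min_latency", "p90_latency", "p99_latency", "throughput")

# Output of the IPv4 loop in the Testing section, copied verbatim.
IPV4_RUN = """\
36           46           58           22672.31
28           45           52           23121.50
28           46           59           22866.10
37           46           60           22802.62
36           46           52           22803.10
37           46           60           22545.01
35           45           58           23082.60
28           46           58           22936.80
28           47           52           22923.53
25           46           57           22962.27
"""


def parse_netperf_lines(text):
    """Parse whitespace-separated result lines into dicts keyed by FIELDS."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # skip prompts, blank lines, and anything unexpected
        try:
            values = [float(p) for p in parts]
        except ValueError:
            continue
        rows.append(dict(zip(FIELDS, values)))
    return rows


def summarize(rows):
    """Return run count, worst p99 latency, and mean throughput."""
    if not rows:
        raise ValueError("no result lines found")
    return {
        "runs": len(rows),
        "worst_p99": max(r["p99_latency"] for r in rows),
        "mean_throughput": sum(r["throughput"] for r in rows) / len(rows),
    }


if __name__ == "__main__":
    print(summarize(parse_netperf_lines(IPV4_RUN)))
```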

[rocky@shreeya-scn9 ~]$ sudo dmesg | grep mana
[    5.073569] mana 7870:00:00.0: enabling device (0000 -> 0002)
[    5.089473] mana 7870:00:00.0: Microsoft Azure Network Adapter protocol version: 0.1.1
[    5.106577] mana 7870:00:00.0 eth1: joined to eth0
[    5.847790] mana 7870:00:00.0 eth1: Configured vPort 0 PD 18 DB 16
[    5.855039] mana 7870:00:00.0 eth1: Configured steering vPort 0 entries 64
[    5.957674] mana 7870:00:00.0 eth1: Configured steering vPort 0 entries 64
[    6.124725] mana 7870:00:00.0 eth1: Configured vPort 0 PD 18 DB 16
[    6.132840] mana 7870:00:00.0 eth1: Configured steering vPort 0 entries 64
[14451.747342] mana 7870:00:00.0 eth1: Configured steering vPort 0 entries 64
[14844.052460] mana 7870:00:00.0: Microsoft Azure Network Adapter protocol version: 0.1.1
[14844.060744] mana 7870:00:00.0 eth1: joined to eth0
[14844.162331] mana 7870:00:00.0 eth1: Configured vPort 0 PD 18 DB 16
[14844.170880] mana 7870:00:00.0 eth1: Configured steering vPort 0 entries 64

[rocky@shreeya-scn9 ~]$ lspci
7870:00:00.0 Ethernet controller: Microsoft Corporation Device 00ba
c05b:00:00.0 Non-Volatile memory controller: Microsoft Corporation Device 00a9
db04:00:00.0 Non-Volatile memory controller: Microsoft Corporation Device b111 (rev 01)

[rocky@shreeya-scn9 ~]$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 60:45:bd:ef:7a:84 brd ff:ff:ff:ff:ff:ff
    alias Network Device
4: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master eth0 state UP mode DEFAULT group default qlen 1000
    link/ether 60:45:bd:ef:7a:84 brd ff:ff:ff:ff:ff:ff
    
[rocky@shreeya-scn9 ~]$ ethtool -S eth1 | grep -E "^[ \t]+vf"
[rocky@shreeya-scn9 ~]$ ethtool -S eth0 | grep -E "^[ \t]+vf"
     vf_rx_packets: 46229
     vf_rx_bytes: 11081944
     vf_tx_packets: 59606
     vf_tx_bytes: 14933254
     vf_tx_dropped: 0
     
[rocky@shreeya-scn9 ~]$ sudo rmmod mana_ib
[rocky@shreeya-scn9 ~]$ sudo rmmod mana

[rocky@shreeya-scn9 ~]$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 60:45:bd:ef:7a:84 brd ff:ff:ff:ff:ff:ff
    alias Network Device
    

hv_netvsc: Use VF's tso_max_size value when data path is VF

jira LE-3885
commit-author Shradha Gupta <[email protected]>
commit 6859209

On Azure, increasing the VF's gso/gro packet size up to GSO_MAX_SIZE
is not possible without allowing the same for the netvsc NIC
(as the NICs are bonded together). For bonded NICs, the min of the max
aggregated pkt size of the members is propagated in the stack.

Therefore, we use netif_set_tso_max_size() to set the max aggregated pkt
size to the VF's packet size for netvsc too, when the data path is
switched over to the VF.

Tested on Azure env with Accelerated Networking enabled and disabled.

	Signed-off-by: Shradha Gupta <[email protected]>
	Reviewed-by: Haiyang Zhang <[email protected]>
	Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 6859209)
	Signed-off-by: Shreeya Patel <[email protected]>

net: mana: Allow tso_max_size to go up-to GSO_MAX_SIZE

jira LE-3885
commit-author Shradha Gupta <[email protected]>
commit 2731583

Allow the max aggregated pkt size to go up to GSO_MAX_SIZE for MANA NIC.
This patch only increases the max allowable gso/gro pkt size for MANA
devices and does not change the defaults.
The following are the perf benefits of increasing the pkt aggregate size
from the legacy gso_max_size value (64K) to the newer one (up to 511K).

IPv4 tests
for i in {1..10}; do netperf -t TCP_RR  -H 10.0.0.5 -p50000 -- -r80000,80000
-O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done

min	p90	p99	Throughput		gso_max_size
93	171	194	6594.25
97	154	180	7183.74
95	165	189	6927.86
96	165	188	6976.04
93	154	185	7338.05			64K
93	168	189	6938.03
94	169	189	6784.93
92	166	189	7117.56
94	179	191	6678.44
95	157	183	7277.81

min	p90	p99	Throughput
93	134	146	8448.75
95	134	140	8396.54
94	137	148	8204.12
94	137	148	8244.41
94	128	139	8666.52			80K
94	141	153	8116.86
94	138	149	8163.92
92	135	142	8362.72
92	134	142	8497.57
93	136	148	8393.23

IPv6 Tests
for i in {1..10}; do netperf -t TCP_RR  -H fd00:9013:cadd::4 -p50000 --
-r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done

min	p90	p99	Throughput		gso_max_size
108	165	170	6673.2
101	169	189	6451.69
101	165	169	6737.65
102	167	175	6614.64
101	178	189	6247.13			64K
107	163	169	6678.63
106	176	187	6350.86
100	164	169	6617.36
102	163	170	6849.21
102	168	175	6605.7

min	p90	p99	Throughput
108	155	166	7183
110	154	163	7268.87
109	152	159	7434.35
107	145	157	7569.15
107	149	164	7496.17			80K
110	154	159	7245.85
108	156	162	7266.24
109	145	158	7526.66
106	145	151	7785.75
111	148	157	7246.65

Tested on azure env with Accelerated Networking enabled and disabled.

	Signed-off-by: Shradha Gupta <[email protected]>
	Reviewed-by: Haiyang Zhang <[email protected]>
	Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 2731583)
	Signed-off-by: Shreeya Patel <[email protected]>

net: mana: Add debug logs in MANA network driver

jira LE-3889
commit-author Erni Sri Satya Vennela <[email protected]>
commit 47dfd7a

Add more logs to assist in debugging and monitoring
driver behaviour, making it easier to identify potential
issues during development and testing.

	Signed-off-by: Erni Sri Satya Vennela <[email protected]>
	Reviewed-by: Haiyang Zhang <[email protected]>
Link: https://patch.msgid.link/[email protected]
	Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit 47dfd7a)
	Signed-off-by: Shreeya Patel <[email protected]>

@bmastbergen bmastbergen left a comment

🥌

@jdieter jdieter left a comment

LGTM

@shreeya-patel98 shreeya-patel98 merged commit ecf040f into sig-cloud-9/5.14.0-570.33.2.el9_6 Aug 30, 2025
4 checks passed
@shreeya-patel98 shreeya-patel98 deleted the {shreeyap}_sig-cloud-9/5.14.0-570.33.2.el9_6 branch August 30, 2025 17:28