Commit 6e98b09

Merge tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:

"Core:

 - Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the default value allows for better BIG TCP performances
 - Reduce compound page head access for zero-copy data transfers
 - RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when possible
 - Threaded NAPI improvements, adding defer skb free support and unneeded softirq avoidance
 - Address dst_entry reference count scalability issues, via false sharing avoidance and optimize refcount tracking
 - Add lockless accesses annotation to sk_err[_soft]
 - Optimize again the skb struct layout
 - Extends the skb drop reasons to make it usable by multiple subsystems
 - Better const qualifier awareness for socket casts

BPF:

 - Add skb and XDP typed dynptrs which allow BPF programs for more ergonomic and less brittle iteration through data and variable-sized accesses
 - Add a new BPF netfilter program type and minimal support to hook BPF programs to netfilter hooks such as prerouting or forward
 - Add more precise memory usage reporting for all BPF map types
 - Adds support for using {FOU,GUE} encap with an ipip device operating in collect_md mode and add a set of BPF kfuncs for controlling encap params
 - Allow BPF programs to detect at load time whether a particular kfunc exists or not, and also add support for this in light skeleton
 - Bigger batch of BPF verifier improvements to prepare for upcoming BPF open-coded iterators allowing for less restrictive looping capabilities
 - Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF programs to NULL-check before passing such pointers into kfunc
 - Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in local storage maps
 - Enable RCU semantics for task BPF kptrs and allow referenced kptr tasks to be stored in BPF maps
 - Add support for refcounted local kptrs to the verifier for allowing shared ownership, useful for adding a node to both the BPF list and rbtree
 - Add BPF verifier support for ST instructions in convert_ctx_access() which will help new -mcpu=v4 clang flag to start emitting them
 - Add ARM32 USDT support to libbpf
 - Improve bpftool's visual program dump which produces the control flow graph in a DOT format by adding C source inline annotations

Protocols:

 - IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value indicates the provenance of the IP address
 - IPv6: optimize route lookup, dropping unneeded R/W lock acquisition
 - Add the handshake upcall mechanism, allowing the user-space to implement generic TLS handshake on kernel's behalf
 - Bridge: support per-{Port, VLAN} neighbor suppression, increasing resilience to nodes failures
 - SCTP: add support for Fair Capacity and Weighted Fair Queueing schedulers
 - MPTCP: delay first subflow allocation up to its first usage. This will allow for later better LSM interaction
 - xfrm: Remove inner/outer modes from input/output path. These are not needed anymore
 - WiFi:
    - reduced neighbor report (RNR) handling for AP mode
    - HW timestamping support
    - support for randomized auth/deauth TA for PASN privacy
    - per-link debugfs for multi-link
    - TC offload support for mac80211 drivers
    - mac80211 mesh fast-xmit and fast-rx support
    - enable Wi-Fi 7 (EHT) mesh support

Netfilter:

 - Add nf_tables 'brouting' support, to force a packet to be routed instead of being bridged
 - Update bridge netfilter and ovs conntrack helpers to handle IPv6 Jumbo packets properly, i.e. fetch the packet length from hop-by-hop extension header. This is needed for BIG TCP support
 - The iptables 32bit compat interface isn't compiled in by default anymore
 - Move ip(6)tables builtin icmp matches to the udptcp one. This has the advantage that icmp/icmpv6 match doesn't load the iptables/ip6tables modules anymore when iptables-nft is used
 - Extended netlink error report for netdevice in flowtables and netdev/chains. Allow for incrementally add/delete devices to netdev basechain. Allow to create netdev chain without device

Driver API:

 - Remove redundant Device Control Error Reporting Enable, as PCI core has already error reporting enabled at enumeration time
 - Move Multicast DB netlink handlers to core, allowing devices other than bridge to use them
 - Allow the page_pool to directly recycle the pages from safely localized NAPI
 - Implement lockless TX queue stop/wake combo macros, allowing for further code de-duplication and sanitization
 - Add YNL support for user headers and struct attrs
 - Add partial YNL specification for devlink
 - Add partial YNL specification for ethtool
 - Add tc-mqprio and tc-taprio support for preemptible traffic classes
 - Add tx push buf len param to ethtool, specifies the maximum number of bytes of a transmitted packet a driver can push directly to the underlying device
 - Add basic LED support for switch/phy
 - Add NAPI documentation, stop relying on external links
 - Convert dsa_master_ioctl() to netdev notifier. This is a preparatory work to make the hardware timestamping layer selectable by user space
 - Add transceiver support and improve the error messages for CAN-FD controllers

New hardware / drivers:

 - Ethernet:
    - AMD/Pensando core device support
    - MediaTek MT7981 SoC
    - MediaTek MT7988 SoC
    - Broadcom BCM53134 embedded switch
    - Texas Instruments CPSW9G ethernet switch
    - Qualcomm EMAC3 DWMAC ethernet
    - StarFive JH7110 SoC
    - NXP CBTX ethernet PHY
 - WiFi:
    - Apple M1 Pro/Max devices
    - RealTek rtl8710bu/rtl8188gu
    - RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset
 - Bluetooth:
    - Realtek RTL8821CS, RTL8851B, RTL8852BS
    - Mediatek MT7663, MT7922
    - NXP w8997
    - Actions Semi ATS2851
    - QTI WCN6855
    - Marvell 88W8997
 - Can:
    - STMicroelectronics bxcan stm32f429

Drivers:

 - Ethernet NICs:
    - Intel (1G, igc):
       - add tracking and reporting of QBV config errors
       - add support for configuring max SDU for each Tx queue
    - Intel (100G, ice):
       - refactor mailbox overflow detection to support Scalable IOV
       - GNSS interface optimization
    - Intel (i40e):
       - support XDP multi-buffer
    - nVidia/Mellanox:
       - add the support for linux bridge multicast offload
       - enable TC offload for egress and ingress MACVLAN over bond
       - add support for VxLAN GBP encap/decap flows offload
       - extend packet offload to fully support libreswan
       - support tunnel mode in mlx5 IPsec packet offload
       - extend XDP multi-buffer support
       - support MACsec VLAN offload
       - add support for dynamic msix vectors allocation
       - drop RX page_cache and fully use page_pool
       - implement thermal zone to report NIC temperature
    - Netronome/Corigine:
       - add support for multi-zone conntrack offload
    - Solarflare/Xilinx:
       - support offloading TC VLAN push/pop actions to the MAE
       - support TC decap rules
       - support unicast PTP
 - Other NICs:
    - Broadcom (bnxt): enforce software based freq adjustments only on shared PHC NIC
    - RealTek (r8169): refactor to address ASPM issues during NAPI poll
    - Micrel (lan8841): add support for PTP_PF_PEROUT
    - Cadence (macb): enable PTP unicast
    - Engleder (tsnep): add XDP socket zero-copy support
    - virtio-net: implement exact header length guest feature
    - veth: add page_pool support for page recycling
    - vxlan: add MDB data path support
    - gve: add XDP support for GQI-QPL format
    - geneve: accept every ethertype
    - macvlan: allow some packets to bypass broadcast queue
    - mana: add support for jumbo frame
 - Ethernet high-speed switches:
    - Microchip (sparx5): Add support for TC flower templates
 - Ethernet embedded switches:
    - Broadcom (b53):
       - configure 6318 and 63268 RGMII ports
    - Marvell (mv88e6xxx):
       - faster C45 bus scan
    - Microchip:
       - lan966x:
          - add support for IS1 VCAP
          - better TX/RX from/to CPU performances
       - ksz9477: add ETS Qdisc support
       - ksz8: enhance static MAC table operations and error handling
       - sama7g5: add PTP capability
    - NXP (ocelot):
       - add support for external ports
       - add support for preemptible traffic classes
    - Texas Instruments:
       - add CPSWxG SGMII support for J7200 and J721E
 - Intel WiFi (iwlwifi):
    - preparation for Wi-Fi 7 EHT and multi-link support
    - EHT (Wi-Fi 7) sniffer support
    - hardware timestamping support for some devices/firmwares
    - TX beacon protection on newer hardware
 - Qualcomm 802.11ax WiFi (ath11k):
    - MU-MIMO parameters support
    - ack signal support for management packets
 - RealTek WiFi (rtw88):
    - SDIO bus support
    - better support for some SDIO devices (e.g. MAC address from efuse)
 - RealTek WiFi (rtw89):
    - HW scan support for 8852b
    - better support for 6 GHz scanning
    - support for various newer firmware APIs
    - framework firmware backwards compatibility
 - MediaTek WiFi (mt76):
    - P2P support
    - mesh A-MSDU support
    - EHT (Wi-Fi 7) support
    - coredump support"

* tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits)
  net: phy: hide the PHYLIB_LEDS knob
  net: phy: marvell-88x2222: remove unnecessary (void*) conversions
  tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.
  net: amd: Fix link leak when verifying config failed
  net: phy: marvell: Fix inconsistent indenting in led_blink_set
  lan966x: Don't use xdp_frame when action is XDP_TX
  tsnep: Add XDP socket zero-copy TX support
  tsnep: Add XDP socket zero-copy RX support
  tsnep: Move skb receive action to separate function
  tsnep: Add functions for queue enable/disable
  tsnep: Rework TX/RX queue initialization
  tsnep: Replace modulo operation with mask
  net: phy: dp83867: Add led_brightness_set support
  net: phy: Fix reading LED reg property
  drivers: nfc: nfcsim: remove return value check of `dev_dir`
  net: phy: dp83867: Remove unnecessary (void*) conversions
  net: ethtool: coalesce: try to make user settings stick twice
  net: mana: Check if netdev/napi_alloc_frag returns single page
  net: mana: Rename mana_refill_rxoob and remove some empty lines
  net: veth: add page_pool stats
  ...
2 parents b68ee1c + 9b78d91 commit 6e98b09

1,925 files changed, +138910 -47345 lines changed

Documentation/PCI/pci-error-recovery.rst

Lines changed: 0 additions & 1 deletion
@@ -418,7 +418,6 @@ That is, the recovery API only requires that:
  - drivers/next/e100.c
  - drivers/net/e1000
  - drivers/net/e1000e
- - drivers/net/ixgb
  - drivers/net/ixgbe
  - drivers/net/cxgb3
  - drivers/net/s2io.c

Documentation/bpf/bpf_design_QA.rst

Lines changed: 2 additions & 2 deletions
@@ -314,7 +314,7 @@ Q: What is the compatibility story for special BPF types in map values?
  Q: Users are allowed to embed bpf_spin_lock, bpf_timer fields in their BPF map
  values (when using BTF support for BPF maps). This allows to use helpers for
  such objects on these fields inside map values. Users are also allowed to embed
- pointers to some kernel types (with __kptr and __kptr_ref BTF tags). Will the
+ pointers to some kernel types (with __kptr_untrusted and __kptr BTF tags). Will the
  kernel preserve backwards compatibility for these features?

  A: It depends. For bpf_spin_lock, bpf_timer: YES, for kptr and everything else:
@@ -324,7 +324,7 @@ For struct types that have been added already, like bpf_spin_lock and bpf_timer,
  the kernel will preserve backwards compatibility, as they are part of UAPI.

  For kptrs, they are also part of UAPI, but only with respect to the kptr
- mechanism. The types that you can use with a __kptr and __kptr_ref tagged
+ mechanism. The types that you can use with a __kptr_untrusted and __kptr tagged
  pointer in your struct are NOT part of the UAPI contract. The supported types can
  and will change across kernel releases. However, operations like accessing kptr
  fields and bpf_kptr_xchg() helper will continue to be supported across kernel

Documentation/bpf/bpf_devel_QA.rst

Lines changed: 12 additions & 8 deletions
@@ -128,7 +128,8 @@ into the bpf-next tree will make their way into net-next tree. net and
  net-next are both run by David S. Miller. From there, they will go
  into the kernel mainline tree run by Linus Torvalds. To read up on the
  process of net and net-next being merged into the mainline tree, see
- the :ref:`netdev-FAQ`
+ the documentation on netdev subsystem at
+ Documentation/process/maintainer-netdev.rst.



@@ -147,7 +148,8 @@ request)::
  Q: How do I indicate which tree (bpf vs. bpf-next) my patch should be applied to?
  ---------------------------------------------------------------------------------

- A: The process is the very same as described in the :ref:`netdev-FAQ`,
+ A: The process is the very same as described in the netdev subsystem
+ documentation at Documentation/process/maintainer-netdev.rst,
  so please read up on it. The subject line must indicate whether the
  patch is a fix or rather "next-like" content in order to let the
  maintainers know whether it is targeted at bpf or bpf-next.
@@ -206,8 +208,9 @@ ii) run extensive BPF test suite and
  Once the BPF pull request was accepted by David S. Miller, then
  the patches end up in net or net-next tree, respectively, and
  make their way from there further into mainline. Again, see the
- :ref:`netdev-FAQ` for additional information e.g. on how often they are
- merged to mainline.
+ documentation for netdev subsystem at
+ Documentation/process/maintainer-netdev.rst for additional information
+ e.g. on how often they are merged to mainline.

  Q: How long do I need to wait for feedback on my BPF patches?
  -------------------------------------------------------------
@@ -230,7 +233,8 @@ Q: Are patches applied to bpf-next when the merge window is open?
  -----------------------------------------------------------------
  A: For the time when the merge window is open, bpf-next will not be
  processed. This is roughly analogous to net-next patch processing,
- so feel free to read up on the :ref:`netdev-FAQ` about further details.
+ so feel free to read up on the netdev docs at
+ Documentation/process/maintainer-netdev.rst about further details.

  During those two weeks of merge window, we might ask you to resend
  your patch series once bpf-next is open again. Once Linus released
@@ -394,7 +398,8 @@ netdev kernel mailing list in Cc and ask for the fix to be queued up:


  The process in general is the same as on netdev itself, see also the
- :ref:`netdev-FAQ`.
+ the documentation on networking subsystem at
+ Documentation/process/maintainer-netdev.rst.

  Q: Do you also backport to kernels not currently maintained as stable?
  ----------------------------------------------------------------------
@@ -410,7 +415,7 @@ Q: The BPF patch I am about to submit needs to go to stable as well
  What should I do?

  A: The same rules apply as with netdev patch submissions in general, see
- the :ref:`netdev-FAQ`.
+ the netdev docs at Documentation/process/maintainer-netdev.rst.

  Never add "``Cc: stable@vger.kernel.org``" to the patch description, but
  ask the BPF maintainers to queue the patches instead. This can be done
@@ -684,7 +689,6 @@ when:


  .. Links
- .. _netdev-FAQ: Documentation/process/maintainer-netdev.rst
  .. _selftests:
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/

Documentation/bpf/clang-notes.rst

Lines changed: 6 additions & 0 deletions
@@ -20,6 +20,12 @@ Arithmetic instructions
  For CPU versions prior to 3, Clang v7.0 and later can enable ``BPF_ALU`` support with
  ``-Xclang -target-feature -Xclang +alu32``. In CPU version 3, support is automatically included.

+ Jump instructions
+ =================
+
+ If ``-O0`` is used, Clang will generate the ``BPF_CALL | BPF_X | BPF_JMP`` (0x8d)
+ instruction, which is not supported by the Linux kernel verifier.
+
  Atomic operations
  =================
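The 0x8d value quoted in the new "Jump instructions" note is the bitwise OR of three opcode fields. A minimal sketch of that arithmetic follows, assuming the field values from the kernel UAPI headers (linux/bpf_common.h and linux/bpf.h), which are repeated here so the snippet builds without kernel headers. The names `BPF_CALL_REG_OPCODE` and `is_register_call()` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode field values, copied from the kernel UAPI headers so that
 * this sketch does not depend on them being installed. */
#define BPF_JMP  0x05 /* instruction class: 64-bit jump */
#define BPF_X    0x08 /* source operand is a register */
#define BPF_CALL 0x80 /* operation: function call */

/* Hypothetical name for the encoding the note describes:
 * a call whose target is taken from a register. */
#define BPF_CALL_REG_OPCODE (BPF_CALL | BPF_X | BPF_JMP)

/* Compile-time check that the three fields combine to 0x8d. */
static_assert(BPF_CALL_REG_OPCODE == 0x8d, "expected opcode 0x8d");

/* Hypothetical helper: returns 1 when the opcode byte is the
 * register-based call, 0 otherwise. */
static int is_register_call(uint8_t opcode)
{
	return opcode == BPF_CALL_REG_OPCODE;
}
```

For comparison, an ordinary helper call uses an immediate operand (`BPF_CALL | BPF_JMP`, with the source bit clear), which gives 0x85, so `is_register_call(0x85)` returns 0.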

Documentation/bpf/cpumasks.rst

Lines changed: 12 additions & 22 deletions
@@ -51,7 +51,7 @@ For example:
  .. code-block:: c

  	struct cpumask_map_value {
- 		struct bpf_cpumask __kptr_ref * cpumask;
+ 		struct bpf_cpumask __kptr * cpumask;
  	};

  	struct array_map {
@@ -117,18 +117,13 @@ For example:
  As mentioned and illustrated above, these ``struct bpf_cpumask *`` objects can
  also be stored in a map and used as kptrs. If a ``struct bpf_cpumask *`` is in
  a map, the reference can be removed from the map with bpf_kptr_xchg(), or
- opportunistically acquired with bpf_cpumask_kptr_get():
-
- .. kernel-doc:: kernel/bpf/cpumask.c
-    :identifiers: bpf_cpumask_kptr_get
-
- Here is an example of a ``struct bpf_cpumask *`` being retrieved from a map:
+ opportunistically acquired using RCU:

  .. code-block:: c

  	/* struct containing the struct bpf_cpumask kptr which is stored in the map. */
  	struct cpumasks_kfunc_map_value {
- 		struct bpf_cpumask __kptr_ref * bpf_cpumask;
+ 		struct bpf_cpumask __kptr * bpf_cpumask;
  	};

  	/* The map containing struct cpumasks_kfunc_map_value entries. */
@@ -144,7 +139,7 @@ Here is an example of a ``struct bpf_cpumask *`` being retrieved from a map:
  	/**
  	 * A simple example tracepoint program showing how a
  	 * struct bpf_cpumask * kptr that is stored in a map can
- 	 * be acquired using the bpf_cpumask_kptr_get() kfunc.
+ 	 * be passed to kfuncs using RCU protection.
  	 */
  	SEC("tp_btf/cgroup_mkdir")
  	int BPF_PROG(cgrp_ancestor_example, struct cgroup *cgrp, const char *path)
@@ -158,26 +153,21 @@ Here is an example of a ``struct bpf_cpumask *`` being retrieved from a map:
  		if (!v)
  			return -ENOENT;

+ 		bpf_rcu_read_lock();
  		/* Acquire a reference to the bpf_cpumask * kptr that's already stored in the map. */
- 		kptr = bpf_cpumask_kptr_get(&v->cpumask);
- 		if (!kptr)
+ 		kptr = v->cpumask;
+ 		if (!kptr) {
  			/* If no bpf_cpumask was present in the map, it's because
  			 * we're racing with another CPU that removed it with
  			 * bpf_kptr_xchg() between the bpf_map_lookup_elem()
- 			 * above, and our call to bpf_cpumask_kptr_get().
- 			 * bpf_cpumask_kptr_get() internally safely handles this
- 			 * race, and will return NULL if the cpumask is no longer
- 			 * present in the map by the time we invoke the kfunc.
+ 			 * above, and our load of the pointer from the map.
  			 */
+ 			bpf_rcu_read_unlock();
  			return -EBUSY;
+ 		}

- 		/* Free the reference we just took above. Note that the
- 		 * original struct bpf_cpumask * kptr is still in the map. It will
- 		 * be freed either at a later time if another context deletes
- 		 * it from the map, or automatically by the BPF subsystem if
- 		 * it's still present when the map is destroyed.
- 		 */
- 		bpf_cpumask_release(kptr);
+ 		bpf_cpumask_setall(kptr);
+ 		bpf_rcu_read_unlock();

  		return 0;
  	}
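
The last hunk changes the control flow of the documented example: the pointer is now read with a plain load inside a read-side critical section, so every exit path taken after bpf_rcu_read_lock() has to call bpf_rcu_read_unlock(). The sketch below mirrors only that control flow in ordinary user-space C. Everything prefixed `mock_`, and the function `use_cpumask()`, is a hypothetical stand-in written for this illustration; none of it is a BPF kfunc, and the counter is not real RCU.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins: a nesting counter instead of real RCU. */
static int mock_rcu_depth;

static void mock_rcu_read_lock(void)
{
	mock_rcu_depth++;
}

static void mock_rcu_read_unlock(void)
{
	assert(mock_rcu_depth > 0); /* catch an unbalanced unlock */
	mock_rcu_depth--;
}

struct mock_cpumask { unsigned long bits; };
struct mock_map_value { struct mock_cpumask *cpumask; };

/* Same shape as the updated example: lock, load, NULL-check, use, unlock. */
static int use_cpumask(struct mock_map_value *v)
{
	struct mock_cpumask *kptr;

	if (!v)
		return -ENOENT;

	mock_rcu_read_lock();
	kptr = v->cpumask; /* plain load, no reference is taken */
	if (!kptr) {
		mock_rcu_read_unlock(); /* the early exit must unlock too */
		return -EBUSY;
	}

	kptr->bits = ~0UL; /* stands in for bpf_cpumask_setall() */
	mock_rcu_read_unlock();

	return 0;
}
```

Because no reference is acquired, there is nothing to release on the success path, which is why the diff drops the bpf_cpumask_release() call along with bpf_cpumask_kptr_get().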
