Commit fc02cb2

Merge tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - Remove socket skb caches

   - Add a SO_RESERVE_MEM socket op to forward allocate buffer space
     and avoid memory accounting overhead on each message sent

   - Introduce managed neighbor entries - added by control plane and
     resolved by the kernel for use in acceleration paths (BPF / XDP
     right now, HW offload users will benefit as well)

   - Make neighbor eviction on link down controllable by userspace to
     work around WiFi networks with bad roaming implementations

   - vrf: Rework interaction with netfilter/conntrack

   - fq_codel: implement L4S style ce_threshold_ect1 marking

   - sch: Eliminate unnecessary RCU waits in mini_qdisc_pair_swap()

  BPF:

   - Add support for new btf kind BTF_KIND_TAG, arbitrary type tagging
     as implemented in LLVM14

   - Introduce bpf_get_branch_snapshot() to capture Last Branch Records

   - Implement variadic trace_printk helper

   - Add a new Bloomfilter map type

   - Track <8-byte scalar spill and refill

   - Access hw timestamp through BPF's __sk_buff

   - Disallow unprivileged BPF by default

   - Document BPF licensing

  Netfilter:

   - Introduce egress hook for looking at raw outgoing packets

   - Allow matching on and modifying inner headers / payload data

   - Add NFT_META_IFTYPE to match on the interface type either from
     ingress or egress

  Protocols:

   - Multi-Path TCP:
      - increase default max additional subflows to 2
      - rework forward memory allocation
      - add getsockopts: MPTCP_INFO, MPTCP_TCPINFO, MPTCP_SUBFLOW_ADDRS

   - MCTP flow support allowing lower layer drivers to configure msg
     muxing as needed

   - Automatic Multicast Tunneling (AMT) driver based on RFC7450

   - HSR support the redbox supervision frames (IEC-62439-3:2018)

   - Support for the ip6ip6 encapsulation of IOAM

   - Netlink interface for CAN-FD's Transmitter Delay Compensation

   - Support SMC-Rv2 eliminating the current same-subnet restriction,
     by exploiting the UDP encapsulation feature of RoCE adapters

   - TLS: add SM4 GCM/CCM crypto support

   - Bluetooth: initial support for link quality and audio/codec
     offload

  Driver APIs:

   - Add a batched interface for RX buffer allocation in AF_XDP buffer
     pool

   - ethtool: Add ability to control transceiver modules' power mode

   - phy: Introduce supported interfaces bitmap to express MAC
     capabilities and simplify PHY code

   - Drop rtnl_lock from DSA .port_fdb_{add,del} callbacks

  New drivers:

   - WiFi driver for Realtek 8852AE 802.11ax devices (rtw89)

   - Ethernet driver for ASIX AX88796C SPI device (x88796c)

  Drivers:

   - Broadcom PHYs
      - support 72165, 7712 16nm PHYs
      - support IDDQ-SR for additional power savings

   - PHY support for QCA8081, QCA9561 PHYs

   - NXP DPAA2: support for IRQ coalescing

   - NXP Ethernet (enetc): support for software TCP segmentation

   - Renesas Ethernet (ravb)
      - support DMAC and EMAC blocks of Gigabit-capable IP found on
        RZ/G2L SoC

   - Intel 100G Ethernet
      - support for eswitch offload of TC/OvS flow API, including
        offload of GRE, VxLAN, Geneve tunneling
      - support application device queues - ability to assign Rx and Tx
        queues to application threads
      - PTP and PPS (pulse-per-second) extensions

   - Broadcom Ethernet (bnxt)
      - devlink health reporting and device reload extensions

   - Mellanox Ethernet (mlx5)
      - offload macvlan interfaces
      - support HW offload of TC rules involving OVS internal ports
      - support HW-GRO and header/data split
      - support application device queues

   - Marvell OcteonTx2:
      - add XDP support for PF
      - add PTP support for VF

   - Qualcomm Ethernet switch (qca8k): support for QCA8328

   - Realtek Ethernet DSA switch (rtl8366rb)
      - support bridge offload
      - support STP, fast aging, disabling address learning
      - support for Realtek RTL8365MB-VC, a 4+1 port 10M/100M/1GE switch

   - Mellanox Ethernet/IB switch (mlxsw)
      - multi-level qdisc hierarchy offload (e.g. RED, prio and shaping)
      - offload root TBF qdisc as port shaper
      - support multiple routing interface MAC address prefixes
      - support for IP-in-IP with IPv6 underlay

   - MediaTek WiFi (mt76)
      - mt7921 - ASPM, 6GHz, SDIO and testmode support
      - mt7915 - LED and TWT support

   - Qualcomm WiFi (ath11k)
      - include channel rx and tx time in survey dump statistics
      - support for 80P80 and 160 MHz bandwidths
      - support channel 2 in 6 GHz band
      - spectral scan support for QCN9074
      - support for rx decapsulation offload (data frames in 802.3
        format)

   - Qualcomm phone SoC WiFi (wcn36xx)
      - enable Idle Mode Power Save (IMPS) to reduce power consumption
        during idle

   - Bluetooth driver support for MediaTek MT7922 and MT7921

   - Enable support for AOSP Bluetooth extension in Qualcomm WCN399x
     and Realtek 8822C/8852A

   - Microsoft vNIC driver (mana)
      - support hibernation and kexec

   - Google vNIC driver (gve)
      - support for jumbo frames
      - implement Rx page reuse

  Refactor:

   - Make all writes to netdev->dev_addr go thru helpers, so that we
     can add this address to the address rbtree and handle the updates

   - Various TCP cleanups and optimizations including improvements to
     CPU cache use

   - Simplify the gnet_stats, Qdisc stats' handling and remove
     qdisc->running sequence counter

   - Driver changes and API updates to address devlink locking
     deficiencies"

* tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2122 commits)
  Revert "net: avoid double accounting for pure zerocopy skbs"
  selftests: net: add arp_ndisc_evict_nocarrier
  net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter
  net: arp: introduce arp_evict_nocarrier sysctl parameter
  libbpf: Deprecate AF_XDP support
  kbuild: Unify options for BTF generation for vmlinux and modules
  selftests/bpf: Add a testcase for 64-bit bounds propagation issue.
  bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit.
  bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off.
  net: vmxnet3: remove multiple false checks in vmxnet3_ethtool.c
  net: avoid double accounting for pure zerocopy skbs
  tcp: rename sk_wmem_free_skb
  netdevsim: fix uninit value in nsim_drv_configure_vfs()
  selftests/bpf: Fix also no-alu32 strobemeta selftest
  bpf: Add missing map_delete_elem method to bloom filter map
  selftests/bpf: Add bloom map success test for userspace calls
  bpf: Add alignment padding for "map_extra" + consolidate holes
  bpf: Bloom filter map naming fixups
  selftests/bpf: Add test cases for struct_ops prog
  bpf: Add dummy BPF STRUCT_OPS for test purpose
  ...

2 parents bfc484f + 84882cf commit fc02cb2

2,296 files changed, +215137 -50034 lines changed

Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
What:           /sys/class/timecard/
Date:           September 2021
Contact:        Jonathan Lemon <[email protected]>
Description:    This directory contains files and directories
                providing a standardized interface to the ancillary
                features of the OpenCompute timecard.

What:           /sys/class/timecard/ocpN/
Date:           September 2021
Contact:        Jonathan Lemon <[email protected]>
Description:    This directory contains the attributes of the Nth timecard
                registered.

What:           /sys/class/timecard/ocpN/available_clock_sources
Date:           September 2021
Contact:        Jonathan Lemon <[email protected]>
Description:    (RO) The list of available time sources that the PHC
                uses for clock adjustments.

                ==== =================================================
                NONE no adjustments
                PPS  adjustments come from the PPS1 selector (default)
                TOD  adjustments from the GNSS/TOD module
                IRIG adjustments from external IRIG-B signal
                DCF  adjustments from external DCF signal
                ==== =================================================

What:           /sys/class/timecard/ocpN/available_sma_inputs
Date:           September 2021
Contact:        Jonathan Lemon <[email protected]>
Description:    (RO) Set of available destinations (sinks) for a SMA
                input signal.

                ===== ================================================
                10Mhz signal is used as the 10Mhz reference clock
                PPS1  signal is sent to the PPS1 selector
                PPS2  signal is sent to the PPS2 selector
                TS1   signal is sent to timestamper 1
                TS2   signal is sent to timestamper 2
                IRIG  signal is sent to the IRIG-B module
                DCF   signal is sent to the DCF module
                ===== ================================================

What:           /sys/class/timecard/ocpN/available_sma_outputs
Date:           May 2021
Contact:        Jonathan Lemon <[email protected]>
Description:    (RO) Set of available sources for a SMA output signal.

                ===== ================================================
                10Mhz output is from the 10Mhz reference clock
                PHC   output PPS is from the PHC clock
                MAC   output PPS is from the Miniature Atomic Clock
                GNSS  output PPS is from the GNSS module
                GNSS2 output PPS is from the second GNSS module
                IRIG  output is from the PHC, in IRIG-B format
                DCF   output is from the PHC, in DCF format
                ===== ================================================

What:           /sys/class/timecard/ocpN/clock_source
Date:           September 2021
Contact:        Jonathan Lemon <[email protected]>
Description:    (RW) Contains the current synchronization source used by
                the PHC.  May be changed by writing one of the listed
                values from the available_clock_sources attribute set.
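
As an illustration of the read-then-write pattern described for ``clock_source``, here is a minimal Python sketch. It is not part of the commit. The helper name ``set_clock_source`` is hypothetical, and the sketch assumes that ``available_clock_sources`` reads back as a whitespace-separated list, which the text above does not state explicitly.

```python
from pathlib import Path


def set_clock_source(card_dir, source):
    """Select a PHC synchronization source after checking the card's own list.

    card_dir is the attribute directory of one card, for example
    /sys/class/timecard/ocp0 (the index 0 is only an example).
    """
    card = Path(card_dir)
    # Assumption: the attribute is a whitespace-separated list of names.
    available = (card / "available_clock_sources").read_text().split()
    if source not in available:
        raise ValueError(f"{source!r} is not one of {available}")
    (card / "clock_source").write_text(source)
    # Read the attribute back so the caller sees what is now in effect.
    return (card / "clock_source").read_text().strip()
```

Validating against the card's own list, rather than a hard-coded one, keeps the helper usable on cards that expose only a subset of the sources in the table.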

What:           /sys/class/timecard/ocpN/gnss_sync
Date:           September 2021
Contact:        Jonathan Lemon <[email protected]>
Description:    (RO) Indicates whether a valid GNSS signal is received,
                or when the signal was lost.

What:           /sys/class/timecard/ocpN/i2c
Date:           September 2021
Contact:        Jonathan Lemon <[email protected]>
Description:    This optional attribute links to the associated i2c device.

What:           /sys/class/timecard/ocpN/irig_b_mode
Date:           September 2021
Contact:        Jonathan Lemon <[email protected]>
Description:    (RW) An integer from 0-7 indicating the timecode format
                of the IRIG-B output signal: B00<n>

What:           /sys/class/timecard/ocpN/pps
Date:           September 2021
Contact:        Jonathan Lemon <[email protected]>
Description:    This optional attribute links to the associated PPS device.

What:           /sys/class/timecard/ocpN/ptp
Date:           September 2021
Contact:        Jonathan Lemon <[email protected]>
Description:    This attribute links to the associated PTP device.

What:           /sys/class/timecard/ocpN/serialnum
Date:           September 2021
Contact:        Jonathan Lemon <[email protected]>
Description:    (RO) Provides the serial number of the timecard.

What:           /sys/class/timecard/ocpN/sma1
What:           /sys/class/timecard/ocpN/sma2
What:           /sys/class/timecard/ocpN/sma3
What:           /sys/class/timecard/ocpN/sma4
Date:           September 2021
Contact:        Jonathan Lemon <[email protected]>
Description:    (RW) These attributes specify the direction of the signal
                on the associated SMA connectors, and also the signal sink
                or source.

                The display format of the attribute is a space separated
                list of signals, prefixed by the input/output direction.

                The signal direction may be changed (if supported) by
                prefixing the signal list with either "in:" or "out:".
                If neither prefix is present, then the direction is unchanged.

                The output signal may be changed by writing one of the listed
                values from the available_sma_outputs attribute set.

                The input destinations may be changed by writing multiple
                values from the available_sma_inputs attribute set,
                separated by spaces.  If there are duplicated input
                destinations between connectors, the lowest numbered SMA
                connector is given priority.

                Note that not all input combinations may make sense.

                The 10Mhz reference clock input is currently only valid
                on SMA1 and may not be combined with other destination sinks.
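
The rules for the ``smaN`` attributes can be sketched as a small validator that builds the string to write. This is an illustrative Python sketch, not driver code. The function name ``format_sma`` is hypothetical, the signal sets are copied from the tables above (a real card may advertise fewer), and the single space after the ``in:`` or ``out:`` prefix is an assumption about the accepted format.

```python
# Signal names copied from the available_sma_inputs / available_sma_outputs
# tables; a real card should be queried instead of relying on these sets.
SMA_INPUTS = {"10Mhz", "PPS1", "PPS2", "TS1", "TS2", "IRIG", "DCF"}
SMA_OUTPUTS = {"10Mhz", "PHC", "MAC", "GNSS", "GNSS2", "IRIG", "DCF"}


def format_sma(connector, direction, signals):
    """Build the value to write to smaN for the given direction and signals."""
    if direction == "out":
        # An output carries exactly one source.
        if len(signals) != 1 or signals[0] not in SMA_OUTPUTS:
            raise ValueError("an output needs exactly one known source")
    elif direction == "in":
        # An input may feed several sinks at once.
        if not signals or any(s not in SMA_INPUTS for s in signals):
            raise ValueError("an input needs one or more known sinks")
        # The 10Mhz reference input is only valid alone and on SMA1.
        if "10Mhz" in signals and (connector != 1 or len(signals) > 1):
            raise ValueError("10Mhz input is only valid alone on SMA1")
    else:
        raise ValueError("direction must be 'in' or 'out'")
    # Assumption: one space separates the prefix from the signal list.
    return f"{direction}: {' '.join(signals)}"
```

For example, ``format_sma(3, "in", ["TS1", "TS2"])`` produces ``in: TS1 TS2``, which routes one connector to both timestampers.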

What:           /sys/class/timecard/ocpN/ts_window_adjust
Date:           September 2021
Contact:        Jonathan Lemon <[email protected]>
Description:    (RW) When retrieving the PHC with the PTP SYS_OFFSET_EXTENDED
                ioctl, a system timestamp is made before and after the PHC
                time is retrieved.  The midpoint between the two system
                timestamps is usually taken to be the SYS time associated
                with the PHC time.  This estimate may be wrong, as it depends
                on PCI latencies, and when the PHC time was latched

                The attribute value reduces the end timestamp by the given
                number of nanoseconds, so the computed midpoint matches the
                retrieved PHC time.

                The initial value is set based on measured PCI latency and
                the estimated point where the FPGA latches the PHC time.  This
                value may be changed by writing an unsigned integer.
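
The midpoint arithmetic described for ``ts_window_adjust`` can be written out as a short Python sketch. The function name and the use of integer nanoseconds are illustrative assumptions; the sketch only mirrors the description above and is not taken from the driver.

```python
def sys_time_for_phc(sys_before_ns, sys_after_ns, window_adjust_ns=0):
    """Estimate the system time that corresponds to one PHC reading.

    sys_before_ns and sys_after_ns are the system timestamps taken around
    the PHC read. window_adjust_ns pulls the end timestamp earlier, as the
    ts_window_adjust description states, before the midpoint is taken.
    """
    adjusted_after_ns = sys_after_ns - window_adjust_ns
    return (sys_before_ns + adjusted_after_ns) // 2
```

With timestamps of 1000 ns and 1600 ns, the plain midpoint is 1300 ns. An adjustment of 200 ns moves the end timestamp to 1400 ns and the midpoint to 1200 ns, so each nanosecond of adjustment shifts the estimate by half a nanosecond.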

What:           /sys/class/timecard/ocpN/ttyGNSS
What:           /sys/class/timecard/ocpN/ttyGNSS2
Date:           September 2021
Contact:        Jonathan Lemon <[email protected]>
Description:    These optional attributes link to the TTY serial ports
                associated with the GNSS devices.

What:           /sys/class/timecard/ocpN/ttyMAC
Date:           September 2021
Contact:        Jonathan Lemon <[email protected]>
Description:    This optional attribute links to the TTY serial port
                associated with the Miniature Atomic Clock.

What:           /sys/class/timecard/ocpN/ttyNMEA
Date:           September 2021
Contact:        Jonathan Lemon <[email protected]>
Description:    This optional attribute links to the TTY serial port
                which outputs the PHC time in NMEA ZDA format.

What:           /sys/class/timecard/ocpN/utc_tai_offset
Date:           September 2021
Contact:        Jonathan Lemon <[email protected]>
Description:    (RW) The DCF and IRIG output signals are in UTC, while the
                TimeCard operates on TAI.  This attribute allows setting the
                offset in seconds, which is added to the TAI timebase for
                these formats.

                The offset may be changed by writing an unsigned integer.

Documentation/bpf/bpf_licensing.rst

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
=============
BPF licensing
=============

Background
==========

* Classic BPF was BSD licensed

"BPF" was originally introduced as BSD Packet Filter in
http://www.tcpdump.org/papers/bpf-usenix93.pdf. The corresponding instruction
set and its implementation came from BSD with BSD license. That original
instruction set is now known as "classic BPF".

However an instruction set is a specification for machine-language interaction,
similar to a programming language.  It is not a code. Therefore, the
application of a BSD license may be misleading in a certain context, as the
instruction set may enjoy no copyright protection.

* eBPF (extended BPF) instruction set continues to be BSD

In 2014, the classic BPF instruction set was significantly extended. We
typically refer to this instruction set as eBPF to disambiguate it from cBPF.
The eBPF instruction set is still BSD licensed.

Implementations of eBPF
=======================

Using the eBPF instruction set requires implementing code in both kernel space
and user space.

In Linux Kernel
---------------

The reference implementations of the eBPF interpreter and various just-in-time
compilers are part of Linux and are GPLv2 licensed. The implementation of
eBPF helper functions is also GPLv2 licensed. Interpreters, JITs, helpers,
and verifiers are called eBPF runtime.

In User Space
-------------

There are also implementations of eBPF runtime (interpreter, JITs, helper
functions) under
Apache2 (https://github.com/iovisor/ubpf),
MIT (https://github.com/qmonnet/rbpf), and
BSD (https://github.com/DPDK/dpdk/blob/main/lib/librte_bpf).

In HW
-----

The HW can choose to execute eBPF instruction natively and provide eBPF runtime
in HW or via the use of implementing firmware with a proprietary license.

In other operating systems
--------------------------

Other kernels or user space implementations of eBPF instruction set and runtime
can have proprietary licenses.

Using BPF programs in the Linux kernel
======================================

Linux Kernel (while being GPLv2) allows linking of proprietary kernel modules
under these rules:
Documentation/process/license-rules.rst

When a kernel module is loaded, the linux kernel checks which functions it
intends to use. If any function is marked as "GPL only," the corresponding
module or program has to have GPL compatible license.

Loading BPF program into the Linux kernel is similar to loading a kernel
module. BPF is loaded at run time and not statically linked to the Linux
kernel. BPF program loading follows the same license checking rules as kernel
modules. BPF programs can be proprietary if they don't use "GPL only" BPF
helper functions.

Further, some BPF program types - Linux Security Modules (LSM) and TCP
Congestion Control (struct_ops), as of Aug 2021 - are required to be GPL
compatible even if they don't use "GPL only" helper functions directly. The
registration step of LSM and TCP congestion control modules of the Linux
kernel is done through EXPORT_SYMBOL_GPL kernel functions. In that sense LSM
and struct_ops BPF programs are implicitly calling "GPL only" functions.
The same restriction applies to BPF programs that call kernel functions
directly via unstable interface also known as "kfunc".
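
The licensing rules in this section can be restated as a small decision function. The Python sketch below is only a model of the prose above, not the kernel's actual check. The function name, the string labels for program types, and the caller-supplied set of "GPL only" helpers are all hypothetical.

```python
# Program types that the text says must be GPL compatible (as of Aug 2021).
# The string labels are illustrative, not kernel identifiers.
GPL_REQUIRED_PROG_TYPES = {"lsm", "struct_ops"}


def needs_gpl_compatible_license(prog_type, helpers_used, gpl_only_helpers,
                                 uses_kfunc=False):
    """Return True if, per the rules above, the program must be GPL compatible.

    gpl_only_helpers is supplied by the caller; this sketch does not know
    which real helpers carry the "GPL only" marking.
    """
    if prog_type in GPL_REQUIRED_PROG_TYPES:
        return True  # registration goes through EXPORT_SYMBOL_GPL functions
    if uses_kfunc:
        return True  # direct kernel function calls carry the same restriction
    # Otherwise only an explicit use of a "GPL only" helper triggers it.
    return any(helper in gpl_only_helpers for helper in helpers_used)
```

The order of the checks does not affect the result. It simply follows the order in which the text introduces the three conditions.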

Packaging BPF programs with user space applications
====================================================

Generally, proprietary-licensed applications and GPL licensed BPF programs
written for the Linux kernel in the same package can co-exist because they are
separate executable processes. This applies to both cBPF and eBPF programs.

Documentation/bpf/btf.rst

Lines changed: 28 additions & 1 deletion
@@ -85,6 +85,7 @@ sequentially and type id is assigned to each recognized type starting from id
     #define BTF_KIND_VAR            14      /* Variable     */
     #define BTF_KIND_DATASEC        15      /* Section      */
     #define BTF_KIND_FLOAT          16      /* Floating point       */
+    #define BTF_KIND_DECL_TAG       17      /* Decl Tag     */
 
 Note that the type section encodes debug info, not just pure types.
 ``BTF_KIND_FUNC`` is not a type, and it represents a defined subprogram.
@@ -106,7 +107,7 @@ Each type contains the following common data::
          * "size" tells the size of the type it is describing.
          *
          * "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
-         * FUNC and FUNC_PROTO.
+         * FUNC, FUNC_PROTO and DECL_TAG.
          * "type" is a type_id referring to another type.
          */
         union {
@@ -465,6 +466,32 @@ map definition.
 
 No additional type data follow ``btf_type``.
 
+2.2.17 BTF_KIND_DECL_TAG
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+``struct btf_type`` encoding requirement:
+ * ``name_off``: offset to a non-empty string
+ * ``info.kind_flag``: 0
+ * ``info.kind``: BTF_KIND_DECL_TAG
+ * ``info.vlen``: 0
+ * ``type``: ``struct``, ``union``, ``func``, ``var`` or ``typedef``
+
+``btf_type`` is followed by ``struct btf_decl_tag``.::
+
+    struct btf_decl_tag {
+        __u32   component_idx;
+    };
+
+The ``name_off`` encodes btf_decl_tag attribute string.
+The ``type`` should be ``struct``, ``union``, ``func``, ``var`` or ``typedef``.
+For ``var`` or ``typedef`` type, ``btf_decl_tag.component_idx`` must be ``-1``.
+For the other three types, if the btf_decl_tag attribute is
+applied to the ``struct``, ``union`` or ``func`` itself,
+``btf_decl_tag.component_idx`` must be ``-1``. Otherwise,
+the attribute is applied to a ``struct``/``union`` member or
+a ``func`` argument, and ``btf_decl_tag.component_idx`` should be a
+valid index (starting from 0) pointing to a member or an argument.
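
To make the ``BTF_KIND_DECL_TAG`` encoding concrete, the Python sketch below packs one ``btf_type`` record followed by its ``btf_decl_tag``. It is an illustration rather than libbpf or kernel code. The function name is hypothetical, and little-endian byte order is assumed. The ``info`` layout (``vlen`` in bits 0-15, ``kind`` starting at bit 24, ``kind_flag`` in bit 31) follows the common ``btf_type`` description elsewhere in btf.rst.

```python
import struct

BTF_KIND_DECL_TAG = 17


def encode_decl_tag(name_off, type_id, component_idx=-1):
    """Pack btf_type plus btf_decl_tag for one decl tag (little-endian)."""
    if name_off <= 0:
        # Offset 0 is the empty string; the tag needs a non-empty name.
        raise ValueError("name_off must point at a non-empty string")
    if component_idx < -1:
        raise ValueError("component_idx is -1 or a 0-based index")
    kind_flag = 0
    vlen = 0
    info = (kind_flag << 31) | (BTF_KIND_DECL_TAG << 24) | vlen
    # btf_type is name_off, info, type; btf_decl_tag is component_idx.
    # Packing -1 as a signed 32-bit value yields the bytes ff ff ff ff.
    return struct.pack("<IIIi", name_off, info, type_id, component_idx)
```

A tag applied to a whole ``struct`` uses the default ``component_idx`` of ``-1``, while a tag on its third member would pass ``component_idx=2``.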
+
 3. BTF Kernel API
 *****************

Documentation/bpf/index.rst

Lines changed: 9 additions & 0 deletions
@@ -82,6 +82,15 @@ Testing and debugging BPF
    s390
 
 
+Licensing
+=========
+
+.. toctree::
+   :maxdepth: 1
+
+   bpf_licensing
+
+
 Other
 =====

Documentation/bpf/libbpf/libbpf_naming_convention.rst

Lines changed: 40 additions & 0 deletions
@@ -150,6 +150,46 @@ mirror of the mainline's version of libbpf for a stand-alone build.
 However, all changes to libbpf's code base must be upstreamed through
 the mainline kernel tree.
 
+
+API documentation convention
+============================
+
+The libbpf API is documented via comments above definitions in
+header files. These comments can be rendered by doxygen and sphinx
+for well organized html output. This section describes the
+convention in which these comments should be formated.
+
+Here is an example from btf.h:
+
+.. code-block:: c
+
+        /**
+         * @brief **btf__new()** creates a new instance of a BTF object from the raw
+         * bytes of an ELF's BTF section
+         * @param data raw bytes
+         * @param size number of bytes passed in `data`
+         * @return new BTF object instance which has to be eventually freed with
+         * **btf__free()**
+         *
+         * On error, error-code-encoded-as-pointer is returned, not a NULL. To extract
+         * error code from such a pointer `libbpf_get_error()` should be used. If
+         * `libbpf_set_strict_mode(LIBBPF_STRICT_CLEAN_PTRS)` is enabled, NULL is
+         * returned on error instead. In both cases thread-local `errno` variable is
+         * always set to error code as well.
+         */
+
+The comment must start with a block comment of the form '/\*\*'.
+
+The documentation always starts with a @brief directive. This line is a short
+description about this API. It starts with the name of the API, denoted in bold
+like so: **api_name**. Please include an open and close parenthesis if this is a
+function. Follow with the short description of the API. A longer form description
+can be added below the last directive, at the bottom of the comment.
+
+Parameters are denoted with the @param directive, there should be one for each
+parameter. If this is a function with a non-void return, use the @return directive
+to document it.
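
The convention above is mechanical enough to check with a script. The following Python sketch is a hypothetical lint helper, not an existing libbpf tool. The function name and argument shapes are assumptions, and it checks only the rules stated in this section.

```python
import re


def check_doc_comment(comment, api_name, params, returns_value):
    """Return a list of convention violations for one API doc comment.

    api_name includes the parentheses for functions, e.g. "btf__new()".
    """
    problems = []
    if not comment.lstrip().startswith("/**"):
        problems.append("comment must start with '/**'")
    # The first directive must be @brief, followed by the bold API name.
    first = re.search(r"@(\w+)\s+(\S+)", comment)
    if first is None or first.group(1) != "brief":
        problems.append("first directive must be @brief")
    elif first.group(2) != f"**{api_name}**":
        problems.append(f"@brief must start with **{api_name}**")
    # There should be one @param directive per parameter.
    documented = set(re.findall(r"@param\s+(\w+)", comment))
    for name in params:
        if name not in documented:
            problems.append(f"missing @param for {name}")
    # A non-void return needs an @return directive.
    if returns_value and not re.search(r"@return\b", comment):
        problems.append("missing @return")
    return problems
```

An empty list means the comment follows the rules checked here. Rendering through doxygen and sphinx remains the authoritative test.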
+
 License
 -------------------
