Commit 18285ac

fwctl: Add documentation
Document the purpose and rules for the fwctl subsystem. Link in kdocs to the doc tree.

Link: https://patch.msgid.link/r/[email protected]
Nacked-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Daniel Vetter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jonathan Cameron <[email protected]>
Reviewed-by: Dave Jiang <[email protected]>
Reviewed-by: Shannon Nelson <[email protected]>
Reviewed-by: Bagas Sanjaya <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
1 parent 840cfb7 commit 18285ac

4 files changed: +298 -0 lines changed
Lines changed: 284 additions & 0 deletions
@@ -0,0 +1,284 @@
.. SPDX-License-Identifier: GPL-2.0

===============
fwctl subsystem
===============

:Author: Jason Gunthorpe

Overview
========

Modern devices contain extensive amounts of FW, and in many cases, are largely
software-defined pieces of hardware. The evolution of this approach is largely a
reaction to Moore's Law where a chip tape out is now highly expensive, and the
chip design is extremely large. Replacing fixed HW logic with a flexible and
tightly coupled FW/HW combination is an effective risk mitigation against chip
respin. Problems in the HW design can be counteracted in device FW. This is
especially true for devices which present a stable and backwards compatible
interface to the operating system driver (such as NVMe).

The FW layer in devices has grown to incredible size and devices frequently
integrate clusters of fast processors to run it. For example, mlx5 devices have
over 30MB of FW code, and big configurations operate with over 1GB of FW managed
runtime state.

The availability of such a flexible layer has created quite a variety in the
industry where single pieces of silicon are now configurable software-defined
devices and can operate in substantially different ways depending on the need.
Further, we often see cases where specific sites wish to operate devices in ways
that are highly specialized and require applications that have been tailored to
their unique configuration.

Further, devices have become multi-functional and integrated to the point they
no longer fit neatly into the kernel's division of subsystems. Modern
multi-functional devices have drivers, such as bnxt/ice/mlx5/pds, that span many
subsystems while sharing the underlying hardware using the auxiliary device
system.

Altogether this creates a challenge for the operating system, where devices
have an expansive FW environment that needs robust device-specific debugging
support, and FW-driven functionality that is not well suited to “generic”
interfaces. fwctl seeks to allow access to the full device functionality from
user space in the areas of debuggability, management, and first-boot/nth-boot
provisioning.

fwctl is aimed at the common device design pattern where the OS and FW
communicate via an RPC message layer constructed with a queue or mailbox scheme.
In this case the driver will typically have some layer to deliver RPC messages
and collect RPC responses from device FW. The in-kernel subsystem drivers that
operate the device for its primary purposes will use these RPCs to build their
drivers, but devices also usually have a set of ancillary RPCs that don't really
fit into any specific subsystem. For example, a HW RAID controller is primarily
operated by the block layer but also comes with a set of RPCs to administer the
construction of drives within the HW RAID.

In the past when devices were more single function, individual subsystems would
grow different approaches to solving some of these common problems. For instance,
monitoring device health, manipulating its FLASH, debugging the FW, and
provisioning all have various unique interfaces across the kernel.

fwctl's purpose is to define a common set of limited rules, described below,
that allow user space to securely construct and execute RPCs inside device FW.
The rules serve as an agreement between the operating system and FW on how to
correctly design the RPC interface. As a uAPI the subsystem provides a thin
layer of discovery and a generic uAPI to deliver the RPCs and collect the
response. It supports a system of user space libraries and tools which will
use this interface to control the device using the device native protocols.

Scope of Action
---------------

A fwctl driver is strictly restricted to being a way to operate the device FW.
It is not an avenue to access random kernel internals, or other operating system
SW states.

fwctl instances must operate on a well-defined device function, and the device
should have a well-defined security model for what scope within the physical
device the function is permitted to access. For instance, the most complex PCIe
device today may broadly have several function-level scopes:

1. A privileged function with full access to the on-device global state and
   configuration

2. Multiple hypervisor functions, each with control over itself and child
   functions used with VMs

3. Multiple VM functions tightly scoped within the VM

The device may create a logical parent/child relationship between these scopes.
For instance, a child VM's FW may be within the scope of the hypervisor FW. It is
quite common in the VFIO world that the hypervisor environment has a complex
provisioning/profiling/configuration responsibility for the function VFIO
assigns to the VM.

Further, within the function, devices often have RPC commands that fall within
some general scopes of action (see enum fwctl_rpc_scope):

1. Access to function & child configuration, FLASH, etc. that becomes live at a
   function reset. Access to function & child runtime configuration that is
   transparent or non-disruptive to any driver or VM.

2. Read-only access to function debug information that may report on FW objects
   in the function & child, including FW objects owned by other kernel
   subsystems.

3. Write access to function & child debug information strictly compatible with
   the principles of kernel lockdown and kernel integrity protection. Triggers
   a kernel Taint.

4. Full debug device access. Triggers a kernel Taint, requires CAP_SYS_RAWIO.

User space will provide a scope label on each RPC and the kernel must enforce the
above CAPs and taints based on that scope. A combination of kernel and FW can
enforce that RPCs are placed in the correct scope by user space.
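The scope rules above can be read as a small policy table. The following is a minimal sketch of that idea in plain C, not the subsystem's implementation: every ``ex_`` name is hypothetical, and the real scope values are the ones defined by enum fwctl_rpc_scope in the uAPI header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the four scopes listed above. */
enum ex_rpc_scope {
	EX_SCOPE_CONFIGURATION,
	EX_SCOPE_DEBUG_READ_ONLY,
	EX_SCOPE_DEBUG_WRITE,
	EX_SCOPE_DEBUG_WRITE_FULL,
	EX_SCOPE_COUNT,
};

struct ex_scope_policy {
	bool taints_kernel; /* executing the RPC triggers a kernel taint */
	bool needs_rawio;   /* caller must hold CAP_SYS_RAWIO */
};

/* Transcribed from the list: scope 3 taints, scope 4 taints and needs the CAP. */
static const struct ex_scope_policy ex_policy[EX_SCOPE_COUNT] = {
	[EX_SCOPE_CONFIGURATION]    = { false, false },
	[EX_SCOPE_DEBUG_READ_ONLY]  = { false, false },
	[EX_SCOPE_DEBUG_WRITE]      = { true,  false },
	[EX_SCOPE_DEBUG_WRITE_FULL] = { true,  true  },
};

/*
 * Returns 0 if an RPC labelled with @scope may proceed, -1 if it must be
 * rejected. On success *taint says whether the kernel should be tainted.
 */
static int ex_check_scope(unsigned int scope, bool has_rawio, bool *taint)
{
	assert(taint != NULL);
	*taint = false;
	if (scope >= EX_SCOPE_COUNT)
		return -1; /* unknown scope label */
	if (ex_policy[scope].needs_rawio && !has_rawio)
		return -1; /* missing capability */
	*taint = ex_policy[scope].taints_kernel;
	return 0;
}
```

A caller would pass the scope label supplied by user space together with the result of its own capability check, and apply the taint only after ``ex_check_scope()`` returns 0.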

Denied behavior
---------------

There are many things this interface must not allow user space to do (without a
Taint or CAP), broadly derived from the principles of kernel lockdown. Some
examples:

1. DMA to/from arbitrary memory, hang the system, compromise FW integrity with
   untrusted code, or otherwise compromise device or system security and
   integrity.

2. Provide an abnormal “back door” to kernel drivers. No manipulation of kernel
   objects owned by kernel drivers.

3. Directly configure or otherwise control kernel drivers. A subsystem kernel
   driver can react to the device configuration at function reset/driver load
   time, but otherwise must not be coupled to fwctl.

4. Operate the HW in a way that overlaps with the core purpose of another
   primary kernel subsystem, such as read/write to LBAs, send/receive of
   network packets, or operate an accelerator's data plane.

fwctl is not a replacement for device direct access subsystems like uacce or
VFIO.

Operations exposed through fwctl's non-tainting interfaces should be fully
sharable with other users of the device. For instance, exposing an RPC through
fwctl should never prevent a kernel subsystem from also concurrently using that
same RPC or hardware unit down the road. In such cases fwctl will be less
important than proper kernel subsystems that eventually emerge. Mistakes in this
area resulting in clashes will be resolved in favour of a kernel implementation.

fwctl User API
==============

.. kernel-doc:: include/uapi/fwctl/fwctl.h

sysfs Class
-----------

fwctl has a sysfs class (/sys/class/fwctl/fwctlNN/) and character devices
(/dev/fwctl/fwctlNN) with a simple numbered scheme. The character device
operates the ioctl uAPI described above.

fwctl devices can be related to driver components in other subsystems through
sysfs::

    $ ls /sys/class/fwctl/fwctl0/device/infiniband/
    ibp0s10f0

    $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
    fwctl0/

    $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
    dev  device  power  subsystem  uevent
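The naming scheme above is enough for a tool to discover devices without any fwctl-specific library. The sketch below walks the sysfs class directory and derives the matching character device path. The two paths come from the text above; the helper names are hypothetical and the code makes no ioctl calls.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Map a sysfs class entry such as "fwctl0" to "/dev/fwctl/fwctl0".
 * Returns 0 on success, -1 if @name is not of the form fwctlNN or if
 * @buf is too small.
 */
static int fwctl_name_to_devpath(const char *name, char *buf, size_t len)
{
	size_t i = strlen("fwctl");
	int n;

	assert(name != NULL && buf != NULL);
	if (strncmp(name, "fwctl", i) != 0 || name[i] == '\0')
		return -1;
	for (; name[i] != '\0'; i++)
		if (name[i] < '0' || name[i] > '9')
			return -1;
	n = snprintf(buf, len, "/dev/fwctl/%s", name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Print every fwctl device found in sysfs. Returns the number of devices,
 * or -1 if the class directory cannot be opened (for example when the
 * running kernel has no fwctl support).
 */
static int fwctl_list_devices(void)
{
	DIR *dir = opendir("/sys/class/fwctl");
	struct dirent *ent;
	char path[64];
	int count = 0;

	if (dir == NULL)
		return -1;
	while ((ent = readdir(dir)) != NULL) {
		if (fwctl_name_to_devpath(ent->d_name, path, sizeof(path)) == 0) {
			printf("%s -> %s\n", ent->d_name, path);
			count++;
		}
	}
	closedir(dir);
	return count;
}
```

A tool would then open() the returned path and follow the ``device`` symlink in sysfs to find the sibling subsystem devices shown in the listing.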

User space Community
--------------------

Drawing inspiration from nvme-cli, participating in the kernel side must come
with a user space in a common TBD git tree, at a minimum to usefully operate the
kernel driver. Providing such an implementation is a pre-condition to merging a
kernel driver.

The goal is to build a user space community around some of the shared problems
we all have, and ideally develop some common user space programs with some
starting themes of:

- Device in-field debugging

- HW provisioning

- VFIO child device profiling before VM boot

- Confidential Compute topics (attestation, secure provisioning)

that stretch across all subsystems in the kernel. fwupd is a great example of
how an excellent user space experience can emerge out of kernel-side diversity.

fwctl Kernel API
================

.. kernel-doc:: drivers/fwctl/main.c
   :export:
.. kernel-doc:: include/linux/fwctl.h

fwctl Driver design
-------------------

In many cases a fwctl driver is going to be part of a larger cross-subsystem
device possibly using the auxiliary_device mechanism. In that case several
subsystems are going to be sharing the same device and FW interface layer so the
device design must already provide for isolation and cooperation between kernel
subsystems. fwctl should fit into that same model.

Part of the driver should include a description of how its scope restrictions
and security model work. The driver and FW together must ensure that RPCs
provided by user space are mapped to the appropriate scope. If the validation is
done in the driver then the validation can read a 'command effects' report from
the device, or hardwire the enforcement. If the validation is done in the FW,
then the driver should pass the fwctl_rpc_scope to the FW along with the command.
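As an illustration of the "hardwire the enforcement" option, a driver could keep a table of the opcodes it understands and the least permissive scope each one needs. Everything in this sketch is an assumption made for the example: the opcodes, the message layout (a big-endian opcode in the first two bytes), the ``ex_`` names, and the idea that scopes are ordered from least to most permissive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scopes, ordered from least to most permissive. */
enum ex_rpc_scope {
	EX_SCOPE_CONFIGURATION,
	EX_SCOPE_DEBUG_READ_ONLY,
	EX_SCOPE_DEBUG_WRITE,
	EX_SCOPE_DEBUG_WRITE_FULL,
};

struct ex_cmd_rule {
	uint16_t opcode;             /* made-up device opcode */
	enum ex_rpc_scope min_scope; /* least permissive scope that may send it */
};

static const struct ex_cmd_rule ex_rules[] = {
	{ 0x0100, EX_SCOPE_CONFIGURATION },    /* setting applied at reset */
	{ 0x0200, EX_SCOPE_DEBUG_READ_ONLY },  /* dump FW object state */
	{ 0x0300, EX_SCOPE_DEBUG_WRITE },      /* lockdown-compatible debug write */
	{ 0x0400, EX_SCOPE_DEBUG_WRITE_FULL }, /* unrestricted debug access */
};

/* Returns true only for known opcodes sent under a sufficient scope. */
static bool ex_validate_rpc(const void *msg, size_t len, enum ex_rpc_scope scope)
{
	const uint8_t *p = msg;
	uint16_t opcode;
	size_t i;

	assert(msg != NULL);
	if (len < 2)
		return false; /* too short to carry an opcode */
	opcode = (uint16_t)(p[0] << 8 | p[1]);
	for (i = 0; i < sizeof(ex_rules) / sizeof(ex_rules[0]); i++)
		if (ex_rules[i].opcode == opcode)
			return scope >= ex_rules[i].min_scope;
	return false; /* unknown commands are rejected, not passed through */
}
```

Rejecting unknown opcodes by default keeps new FW commands out of reach until someone has decided which scope they belong to. A driver that reads a command effects report from the device would replace the static table with data from that report.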

The driver and FW must cooperate to ensure that either fwctl cannot allocate
any FW resources, or any resources it does allocate are freed on FD closure. A
driver primarily constructed around FW RPCs may find that its core PCI function
and RPC layer belongs under fwctl with auxiliary devices connecting to other
subsystems.
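One way to meet the "freed on FD closure" rule is to record every FW object an open file creates and destroy the remainder when the file is closed. The sketch below models that bookkeeping in plain C. The ``ex_`` names are hypothetical, and ``ex_fw_destroy()`` is a stand-in that only counts calls where a real driver would send a destroy RPC to the FW.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_OBJS 8

/* Hypothetical per-open-file context. */
struct ex_uctx {
	uint32_t obj_ids[EX_MAX_OBJS]; /* FW objects created through this FD */
	unsigned int nr_objs;
};

/* Stand-in for the RPC that destroys a FW object. */
static unsigned int ex_fw_destroy_calls;

static void ex_fw_destroy(uint32_t id)
{
	(void)id;
	ex_fw_destroy_calls++;
}

/* Record a FW object created on behalf of this FD. */
static int ex_track_alloc(struct ex_uctx *uctx, uint32_t id)
{
	assert(uctx != NULL);
	if (uctx->nr_objs == EX_MAX_OBJS)
		return -1; /* refuse the allocation rather than lose track of it */
	uctx->obj_ids[uctx->nr_objs++] = id;
	return 0;
}

/* Called when the FD is closed: release everything still outstanding. */
static void ex_close_uctx(struct ex_uctx *uctx)
{
	assert(uctx != NULL);
	while (uctx->nr_objs > 0)
		ex_fw_destroy(uctx->obj_ids[--uctx->nr_objs]);
}
```

The same effect can be achieved in FW if it tags each object with an owner and offers a single "destroy everything owned by X" command, in which case the driver only has to send that command on close.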

Each device type must be mindful of Linux's philosophy for stable ABI. The FW
RPC interface does not have to meet a strictly stable ABI, but it does need to
meet an expectation that userspace tools that are deployed and in significant
use don't needlessly break. FW upgrade and kernel upgrade should keep widely
deployed tooling working.

Development and debugging focused RPCs under more permissive scopes can have
less stability if the tools using them are only run under exceptional
circumstances and not for everyday use of the device. Debugging tools may even
require exact version matching as they may require something similar to DWARF
debug information from the FW binary.

Security Response
=================

The kernel remains the gatekeeper for this interface. If violations of the
scopes, security or isolation principles are found, we have options to let
devices fix them with a FW update, push a kernel patch to parse and block RPC
commands, or push a kernel patch to block entire firmware versions/devices.
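The last option, blocking entire firmware versions, amounts to a range check against a list the kernel carries. The sketch below shows one possible shape for it. The version triple, the example range, and all ``ex_`` names are invented for illustration; how a real driver identifies its FW release is device specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ex_fw_version {
	unsigned int major, minor, patch;
};

/* Three-way compare: negative, zero or positive like strcmp(). */
static int ex_fw_version_cmp(const struct ex_fw_version *a,
			     const struct ex_fw_version *b)
{
	if (a->major != b->major)
		return a->major < b->major ? -1 : 1;
	if (a->minor != b->minor)
		return a->minor < b->minor ? -1 : 1;
	if (a->patch != b->patch)
		return a->patch < b->patch ? -1 : 1;
	return 0;
}

/* Releases in the half-open range [first_bad, first_fixed) are refused. */
struct ex_fw_block {
	struct ex_fw_version first_bad;
	struct ex_fw_version first_fixed;
};

/* Made-up entry: 2.0.0 up to, but not including, 2.1.5. */
static const struct ex_fw_block ex_blocked[] = {
	{ { 2, 0, 0 }, { 2, 1, 5 } },
};

static bool ex_fw_is_blocked(const struct ex_fw_version *v)
{
	size_t i;

	assert(v != NULL);
	for (i = 0; i < sizeof(ex_blocked) / sizeof(ex_blocked[0]); i++)
		if (ex_fw_version_cmp(v, &ex_blocked[i].first_bad) >= 0 &&
		    ex_fw_version_cmp(v, &ex_blocked[i].first_fixed) < 0)
			return true;
	return false;
}
```

A driver using such a list would check it once at probe time and decline to register its fwctl device when the running FW falls inside a blocked range.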

While the kernel can always directly parse and restrict RPCs, it is expected
that the existing kernel pattern of allowing drivers to delegate validation to
FW will be a useful design.

Existing Similar Examples
=========================

The approach described in this document is not a new idea. Direct, or near
direct device access has been offered by the kernel in different areas for
decades. With more devices wanting to follow this design pattern it is becoming
clear that it is not entirely well understood and, more importantly, the
security considerations are not well defined or agreed upon.

Some examples:

- HW RAID controllers. This includes RPCs to do things like compose drives into
  a RAID volume, configure RAID parameters, monitor the HW and more.

- Baseboard managers. RPCs for configuring settings in the device and more

- NVMe vendor command capsules. nvme-cli provides access to some monitoring
  functions that different products have defined, but more exist.

- CXL also has a NVMe-like vendor command system.

- DRM allows user space drivers to send commands to the device via kernel
  mediation

- RDMA allows user space drivers to directly push commands to the device
  without kernel involvement

- Various “raw” APIs, raw HID (SDL2), raw USB, NVMe Generic Interface, etc.

The first 4 are examples of areas that fwctl intends to cover. The latter three
are examples of denied behavior as they fully overlap with the primary purpose
of a kernel subsystem.

A key lesson learned from these past efforts is the importance of having a
common user space project to use as a pre-condition for obtaining a kernel
driver. Developing a good community around useful software in user space is key
to getting companies to fund participation to enable their products.
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
.. SPDX-License-Identifier: GPL-2.0

Firmware Control (FWCTL) Userspace API
======================================

A framework that defines a common set of limited rules that allow user space
to securely construct and execute RPCs inside device firmware.

.. toctree::
   :maxdepth: 1

   fwctl

Documentation/userspace-api/index.rst

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ Devices and I/O

    accelerators/ocxl
    dma-buf-alloc-exchange
+   fwctl/index
    gpio/index
    iommufd
    media/index

MAINTAINERS

Lines changed: 1 addition & 0 deletions
@@ -9563,6 +9563,7 @@ M: Jason Gunthorpe <[email protected]>
 M: Saeed Mahameed <[email protected]>
 R: Jonathan Cameron <[email protected]>
 S: Maintained
+F: Documentation/userspace-api/fwctl/
 F: drivers/fwctl/
 F: include/linux/fwctl.h
 F: include/uapi/fwctl/
