Commit ba0f4c4

Merge tag 'nova-next-v6.17-2025-07-18' of https://gitlab.freedesktop.org/drm/nova into drm-next
Nova changes for v6.17

DMA:

- Merge topic/dma-features-2025-06-23 from alloc tree.
- Clarify wording and be consistent in 'coherent' nomenclature.
- Convert the read!() / write!() macros to return a Result.
- Add as_slice() / write() methods in CoherentAllocation.
- Fix doc-comment of dma_handle().
- Expose count() and size() in CoherentAllocation and add the corresponding type invariants.
- Implement CoherentAllocation::dma_handle_with_offset().

nova-core:

- Various register!() macro improvements.
- Custom Sleep / Delay helpers (until the actual abstractions land).
- Add DMA object abstraction.
- VBIOS

  - Image parser / iterator.
  - PMU table look up in FWSEC.
  - FWSEC ucode extraction.

- Register sysmem flush page.
- Falcon

  - Generic falcon boot code and HAL (Ampere).
  - GSP / SEC2 specific code.

- FWSEC-FRTS

  - Compute layout of FRTS region (FbLayout and HAL).
  - Load into GSP falcon and execute.

- Add Documentation for VBIOS layout, Devinit process, Fwsec operation and layout, Falcon basics.
- Update and annotate TODO list.
- Add Alexandre Courbot as co-maintainer.

Rust:

- Make ETIMEDOUT error available.
- Add size constants up to SZ_2G.

Signed-off-by: Dave Airlie <[email protected]>

From: "Danilo Krummrich" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2 parents acab5fb + 14ae91a commit ba0f4c4

32 files changed

+4320
-115
lines changed

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
.. SPDX-License-Identifier: GPL-2.0

==================================
Device Initialization (devinit)
==================================

The devinit process is complex and subject to change. This document gives a high-level,
conceptual overview of the process, using the Ampere GPU family as an example, to aid in
understanding the corresponding kernel code.

Device initialization (devinit) is a crucial sequence of register read/write operations
performed after a GPU reset. The sequence is required to configure the GPU hardware
properly before it can be used.

The devinit engine is an interpreter program that typically runs on the PMU (Power Management
Unit) microcontroller of the GPU. This interpreter executes a "script" of initialization
commands. The devinit engine itself is part of the VBIOS ROM, in the same ROM image as the
FWSEC (Firmware Security) image (see fwsec.rst and vbios.rst), and it runs before the
nova-core driver is even loaded. On an Ampere GPU, the devinit ucode is separate from the
FWSEC ucode. It is launched by FWSEC, which runs on the GSP in 'heavy-secure' mode, while
devinit runs on the PMU in 'light-secure' mode.
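
To make the "interpreter executing a script" idea concrete, the following is a minimal
sketch of a register-script interpreter. It is not the devinit engine: the opcodes
(``Op::Write``, ``Op::Modify``, ``Op::Done``), the ``RegisterSpace`` trait and the
in-memory ``FakeRegs`` are all hypothetical names invented for illustration, and the real
script format is not described in this document.

```rust
use std::collections::BTreeMap;

/// Hypothetical opcodes for illustration; not the real devinit script format.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    /// Write `value` to the register at `addr`.
    Write { addr: u32, value: u32 },
    /// Read-modify-write: replace the bits selected by `mask` with `value & mask`.
    Modify { addr: u32, mask: u32, value: u32 },
    /// Stop executing the script.
    Done,
}

/// Minimal register-space abstraction so the sketch runs without hardware.
pub trait RegisterSpace {
    fn read(&self, addr: u32) -> u32;
    fn write(&mut self, addr: u32, value: u32);
}

/// In-memory stand-in for MMIO; registers that were never written read as 0.
#[derive(Default)]
pub struct FakeRegs(pub BTreeMap<u32, u32>);

impl RegisterSpace for FakeRegs {
    fn read(&self, addr: u32) -> u32 {
        *self.0.get(&addr).unwrap_or(&0)
    }

    fn write(&mut self, addr: u32, value: u32) {
        self.0.insert(addr, value);
    }
}

/// Runs `script` in order and returns how many commands were executed
/// before reaching `Op::Done` or the end of the script.
pub fn run_script<R: RegisterSpace>(regs: &mut R, script: &[Op]) -> usize {
    let mut executed = 0;
    for op in script {
        match *op {
            Op::Write { addr, value } => regs.write(addr, value),
            Op::Modify { addr, mask, value } => {
                let old = regs.read(addr);
                regs.write(addr, (old & !mask) | (value & mask));
            }
            Op::Done => break,
        }
        executed += 1;
    }
    executed
}
```

The interpreter is generic over ``RegisterSpace`` so that the same loop could, in
principle, drive real registers or a test double.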

Key Functions of devinit
------------------------
devinit performs several critical tasks:

1. Programming VRAM memory controller timings
2. Power sequencing
3. Clock and PLL (Phase-Locked Loop) configuration
4. Thermal management

Low-level Firmware Initialization Flow
--------------------------------------
Upon reset, several microcontrollers on the GPU (such as the PMU, SEC2, and GSP) run GPU
firmware (gfw) code to set up the GPU and its core parameters. Most of the GPU is
considered unusable until this initialization process completes.

These low-level GPU firmware components are typically:

1. Located in the VBIOS ROM in the same ROM partition (see vbios.rst and fwsec.rst).
2. Executed in sequence on different microcontrollers:

   - The devinit engine typically but not necessarily runs on the PMU.
   - On an Ampere GPU, the FWSEC typically runs on the GSP (GPU System Processor) in
     heavy-secure mode.

Before the driver can proceed with further initialization, it must wait for a signal
indicating that core initialization is complete (known as GFW_BOOT). This signal is
asserted by the FWSEC running on the GSP in heavy-secure mode.
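
Waiting for such a signal usually amounts to polling with a deadline. The sketch below
shows that pattern in plain userspace Rust using ``std::time`` and ``std::thread``; it is
not the helper used by nova-core, and ``wait_on`` and ``TimedOut`` are names assumed here
for illustration. The ``poll`` closure stands in for "read the status register and decide
whether GFW_BOOT has been asserted".

```rust
use std::time::{Duration, Instant};

/// Error returned when the condition did not become true in time.
#[derive(Debug, PartialEq, Eq)]
pub struct TimedOut;

/// Calls `poll` until it returns `Some(value)` or until `timeout` has elapsed,
/// sleeping for `interval` between attempts.
pub fn wait_on<T, F>(timeout: Duration, interval: Duration, mut poll: F) -> Result<T, TimedOut>
where
    F: FnMut() -> Option<T>,
{
    let deadline = Instant::now() + timeout;
    loop {
        // Poll first, so a condition that is already true never times out.
        if let Some(value) = poll() {
            return Ok(value);
        }
        if Instant::now() >= deadline {
            return Err(TimedOut);
        }
        std::thread::sleep(interval);
    }
}
```

Polling before checking the deadline means a zero timeout still performs one read, which
is typically what a driver wants when the firmware may already have finished.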

Runtime Considerations
----------------------
The devinit sequence also needs to run during suspend/resume operations at runtime, not
just during initial boot, as it is critical to power management.

Security and Access Control
---------------------------
The initialization process involves careful privilege management. For example, before
accessing certain completion status registers, the driver must check privilege level
masks. Some registers are only accessible after secure firmware (FWSEC) lowers the
privilege level to allow CPU (LS/low-secure) access. This applies, for instance, to
reading the GFW_BOOT signal.
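
The "check the privilege mask first, then trust the status" order can be sketched as a
pure function over two raw register values. Everything about the layout below is an
assumption made for illustration (the bit position of the read-permission flag, the
8-bit progress field, and the value that means "completed"); consult regs.rs for the
actual register definitions.

```rust
/// Hypothetical layout: bit 0 of the mask register grants level-0 (CPU) reads.
const READ_ALLOWED_AT_LEVEL0: u32 = 1 << 0;
/// Hypothetical layout: the low 8 bits of the boot register hold the progress.
const PROGRESS_MASK: u32 = 0xff;
/// Hypothetical encoding of "initialization completed".
const PROGRESS_COMPLETED: u32 = 0xff;

#[derive(Debug, PartialEq, Eq)]
pub enum BootStatus {
    /// The CPU is not yet allowed to read the status, so its value is meaningless.
    NotReadable,
    /// The status is readable, but initialization is still in progress.
    InProgress(u8),
    /// Initialization has completed.
    Completed,
}

/// Decodes the boot status, refusing to interpret `boot_reg` unless
/// `priv_level_mask` says the CPU may read it.
pub fn gfw_boot_status(priv_level_mask: u32, boot_reg: u32) -> BootStatus {
    if (priv_level_mask & READ_ALLOWED_AT_LEVEL0) == 0 {
        return BootStatus::NotReadable;
    }
    let progress = boot_reg & PROGRESS_MASK;
    if progress == PROGRESS_COMPLETED {
        BootStatus::Completed
    } else {
        BootStatus::InProgress(progress as u8)
    }
}
```

Returning a distinct ``NotReadable`` state, rather than treating an unreadable register
as "not done yet", keeps the two conditions apart for callers that want to log them.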
Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
.. SPDX-License-Identifier: GPL-2.0

==============================
Falcon (FAst Logic Controller)
==============================

The following sections describe the Falcon core and the ucode running on it.
The descriptions are based on the Ampere GPU or earlier designs. They should
mostly apply to future designs as well, but everything is subject to change.
The overview provided here is mainly tailored towards understanding the
interactions of the nova-core driver with the Falcon.

NVIDIA GPUs embed small RISC-like microcontrollers called Falcon cores, which
handle secure firmware tasks, initialization, and power management. Modern
NVIDIA GPUs may have multiple such Falcon instances, e.g., the GSP (GPU System
Processor) and SEC2 (the security engine), and may also integrate a RISC-V core.
This core is capable of running both RISC-V and Falcon code.

The code running on the Falcon cores is also called 'ucode', and will be
referred to as such in the following sections.

Falcons have separate instruction and data memories (IMEM/DMEM) and provide a
small DMA engine (accessed via the FBIF, the "Framebuffer Interface") to load
code from system memory. The nova-core driver must reset and configure the
Falcon, load its firmware via DMA, and start its CPU.
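
The required order (reset, configure, load via DMA, start) can be modelled with Rust's
typestate pattern so that out-of-order calls fail to compile. This is only a sketch of the
ordering constraint: the ``Falcon<State>`` type, its state markers and its methods are
hypothetical and do not mirror the nova-core API, and no hardware is touched.

```rust
use std::marker::PhantomData;

// Marker types for each stage of the bring-up sequence (hypothetical names).
pub struct Reset;
pub struct Configured;
pub struct Loaded;
pub struct Running;

/// A Falcon handle whose type parameter records how far bring-up has progressed.
pub struct Falcon<State> {
    steps: Vec<&'static str>,
    _state: PhantomData<State>,
}

impl<S> Falcon<S> {
    /// Records `step` and moves the handle to the next state.
    fn advance<T>(mut self, step: &'static str) -> Falcon<T> {
        self.steps.push(step);
        Falcon { steps: self.steps, _state: PhantomData }
    }

    /// Steps performed so far, in order.
    pub fn steps(&self) -> &[&'static str] {
        &self.steps
    }
}

impl Falcon<Reset> {
    /// Bring-up always begins with a reset.
    pub fn reset() -> Self {
        Falcon { steps: vec!["reset"], _state: PhantomData }
    }

    pub fn configure(self) -> Falcon<Configured> {
        self.advance("configure")
    }
}

impl Falcon<Configured> {
    /// Stands in for the DMA transfer of the firmware image; rejects empty images.
    pub fn dma_load(self, firmware: &[u8]) -> Result<Falcon<Loaded>, &'static str> {
        if firmware.is_empty() {
            return Err("empty firmware image");
        }
        Ok(self.advance("dma_load"))
    }
}

impl Falcon<Loaded> {
    pub fn start(self) -> Falcon<Running> {
        self.advance("start")
    }
}
```

Because ``start()`` exists only on ``Falcon<Loaded>``, an attempt to start a Falcon that
was merely reset is rejected by the compiler rather than at runtime.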

Falcon security levels
======================
Falcons can run in Non-secure (NS), Light Secure (LS), or Heavy Secure (HS)
modes.

Heavy Secure (HS), also known as Privilege Level 3 (PL3)
--------------------------------------------------------
HS ucode is the most trusted code and has access to nearly everything on the
chip. The HS binary includes a signature that is verified at boot. This
signature verification is done by the hardware itself, thus establishing a
root of trust. For example, the FWSEC-FRTS command (see fwsec.rst) runs on the
GSP in HS mode. FRTS, which involves setting up and loading content into the WPR
(Write Protect Region), has to be done by the HS ucode and cannot be done by the
host CPU or LS ucode.

Light Secure (LS or PL2) and Non-secure (NS or PL0)
-----------------------------------------------------
These modes are less secure than HS. Like HS ucode, an LS or NS ucode binary
typically also includes a signature. To load firmware in LS or NS mode onto a
Falcon, another Falcon needs to be running in HS mode, which also establishes the
root of trust. For example, on an Ampere GPU, the CPU runs the "Booter" ucode
in HS mode on the SEC2 Falcon, which then authenticates and runs the run-time
GSP binary (GSP-RM) in LS mode on the GSP Falcon. Similarly, after reset on an
Ampere GPU, FWSEC runs on the GSP, which then loads the devinit engine onto the
PMU in LS mode.

Root of trust establishment
---------------------------
To establish a root of trust, the first code to run on a Falcon must be immutable
and hardwired into a read-only memory (ROM). This follows industry norms for
verification of firmware. This code is called the Boot ROM (BROM). The nova-core
driver on the CPU communicates with the Falcon's Boot ROM through various Falcon
registers prefixed with "BROM" (see regs.rs).

After the nova-core driver reads the necessary ucode from the VBIOS, it programs
the BROM and DMA registers to trigger the Falcon to load the HS ucode from system
memory into the Falcon's IMEM/DMEM. Once the HS ucode is loaded, it is verified
by the Falcon's Boot ROM.

Once the verified HS code is running on a Falcon, it can verify and load other
LS/NS ucode binaries onto other Falcons and start them. The signature
verification process is the same as for HS ucode, except that the signature is
computed by the HS ucode rather than by the hardware (BROM).

The root of trust is therefore established as follows::

    Hardware (Boot ROM running on the Falcon) -> HS ucode -> LS/NS ucode.

On an Ampere GPU, for example, the boot verification flow is::

    Hardware (Boot ROM running on the SEC2) ->
    HS ucode (Booter running on the SEC2) ->
    LS ucode (GSP-RM running on the GSP)
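
The chain above can be expressed as a small rule table. The sketch below encodes only
what this document states: the Boot ROM verifies HS ucode, HS ucode verifies LS/NS ucode,
and the privilege levels are PL3, PL2 and PL0. The type and function names (``Mode``,
``Verifier``, ``may_verify``) are hypothetical, and treating every other combination as
disallowed is a simplifying assumption.

```rust
/// Falcon security modes as described in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    NonSecure,
    LightSecure,
    HeavySecure,
}

impl Mode {
    /// Privilege level associated with each mode (NS = PL0, LS = PL2, HS = PL3).
    pub fn privilege_level(self) -> u8 {
        match self {
            Mode::NonSecure => 0,
            Mode::LightSecure => 2,
            Mode::HeavySecure => 3,
        }
    }
}

/// Who is attempting to authenticate a ucode image (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verifier {
    /// The Falcon's hardwired Boot ROM.
    BootRom,
    /// Ucode already running on a Falcon in the given mode.
    Ucode(Mode),
    /// The host CPU (for example, the nova-core driver).
    HostCpu,
}

/// Returns whether `verifier` may authenticate ucode that will run in `target`
/// mode, following the chain Boot ROM -> HS ucode -> LS/NS ucode.
pub fn may_verify(verifier: Verifier, target: Mode) -> bool {
    match (verifier, target) {
        (Verifier::BootRom, Mode::HeavySecure) => true,
        (Verifier::Ucode(Mode::HeavySecure), Mode::LightSecure | Mode::NonSecure) => true,
        // Simplifying assumption: nothing else is part of the trust chain.
        _ => false,
    }
}
```

In this model the host CPU never appears as a verifier: it can place HS ucode in memory
and trigger the load, but the verification itself is attributed to the Boot ROM.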

.. note::
   While the CPU can load HS ucode onto a Falcon microcontroller and have it
   verified by the hardware and run, the CPU itself typically does not load
   LS or NS ucode and run it. Loading of LS or NS ucode is done mainly by the
   HS ucode. For example, on an Ampere GPU, after the Booter ucode runs on the
   SEC2 in HS mode and loads the GSP-RM binary onto the GSP, the SEC2 needs to
   run the "SEC2-RTOS" ucode at runtime. This presents a problem: there is no
   component to load the SEC2-RTOS ucode onto the SEC2. The CPU cannot load
   LS code, and GSP-RM must run in LS mode. To overcome this, the GSP is
   temporarily made to run HS ucode (which is itself loaded by the CPU via
   the nova-core driver using a "GSP-provided sequencer"), which then loads
   the SEC2-RTOS ucode onto the SEC2 in LS mode. The GSP then resumes
   running its own GSP-RM LS ucode.

Falcon memory subsystem and DMA engine
======================================
Falcons have separate instruction and data memories (IMEM/DMEM) and contain a
small DMA engine called FBDMA (Framebuffer DMA), which performs DMA transfers
between the IMEM/DMEM inside the Falcon and external memory via the FBIF
(Framebuffer Interface).

DMA transfers are possible between the Falcon's memory and both the system
memory and the framebuffer memory (VRAM).

To perform a DMA via the FBDMA, the FBIF is configured to decide how the memory
is accessed (also known as the aperture type). In the nova-core driver, this is
determined by the ``FalconFbifTarget`` enum.
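
As an illustration of what an aperture-type enum might look like, here is a small
stand-alone sketch. It is deliberately named ``FbifTarget`` rather than
``FalconFbifTarget``: the variants and their numeric encodings are assumptions made for
this example, so consult the driver source for the real definition.

```rust
use std::convert::TryFrom;

/// Hypothetical aperture types; variants and encodings are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FbifTarget {
    /// The GPU's own framebuffer memory (VRAM).
    LocalFramebuffer = 0,
    /// System memory accessed coherently with the CPU caches.
    CoherentSysmem = 1,
    /// System memory accessed without cache coherency.
    NoncoherentSysmem = 2,
}

impl FbifTarget {
    /// Returns true if the aperture points at system memory rather than VRAM.
    pub fn is_system_memory(self) -> bool {
        !matches!(self, FbifTarget::LocalFramebuffer)
    }
}

impl TryFrom<u8> for FbifTarget {
    /// The rejected raw value is handed back to the caller.
    type Error = u8;

    fn try_from(raw: u8) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(FbifTarget::LocalFramebuffer),
            1 => Ok(FbifTarget::CoherentSysmem),
            2 => Ok(FbifTarget::NoncoherentSysmem),
            other => Err(other),
        }
    }
}
```

Using ``TryFrom`` instead of a plain cast means that an unexpected register field value
surfaces as an error instead of being silently reinterpreted.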

The IO-PMP (Input/Output Physical Memory Protection) unit in the Falcon
controls access by the FBDMA to the external memory.

A conceptual (not exact) diagram of the Falcon and its memory subsystem is as follows::

               External Memory (Framebuffer / System DRAM)
                              ^  |
                              |  |
                              |  v
     +-----------------------------------------------------+
     |                           |                         |
     |   +---------------+       |                         |
     |   |     FBIF      |-------+                         |  FALCON
     |   | (FrameBuffer  |   Memory Interface              |  PROCESSOR
     |   |  InterFace)   |                                 |
     |   |  Apertures    |                                 |
     |   |  Configures   |                                 |
     |   |  mem access   |                                 |
     |   +-------^-------+                                 |
     |           |                                         |
     |           | FBDMA uses configured FBIF apertures    |
     |           | to access External Memory
     |           |
     |   +-------v--------+      +---------------+
     |   |    FBDMA       |  cfg |     RISC      |
     |   | (FrameBuffer   |<---->|     CORE      |----->. Direct Core Access
     |   |  DMA Engine)   |      |               |      |
     |   | - Master dev.  |      | (can run both |      |
     |   +-------^--------+      | Falcon and    |      |
     |           |        cfg--->| RISC-V code)  |      |
     |           |        /      |               |      |
     |           |        |      +---------------+      |    +------------+
     |           |        |                             |    |   BROM     |
     |           |        |                             <--->| (Boot ROM) |
     |           |       /                              |    +------------+
     |           |      v                               |
     |   +---------------+                              |
     |   |    IO-PMP     | Controls access by FBDMA     |
     |   | (IO Physical  | and other IO Masters         |
     |   | Memory Protect)                              |
     |   +-------^-------+                              |
     |           |                                      |
     |           | Protected Access Path for FBDMA      |
     |           v                                      |
     |   +---------------------------------------+      |
     |   |       Memory                          |      |
     |   |   +---------------+  +------------+   |      |
     |   |   |    IMEM       |  |    DMEM    |   |<-----+
     |   |   | (Instruction  |  |   (Data    |   |
     |   |   |  Memory)      |  |   Memory)  |   |
     |   |   +---------------+  +------------+   |
     |   +---------------------------------------+
     +-----------------------------------------------------+
