Commit c7d6cb4

Merge tag 'drm-misc-next-2024-12-05' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
[airlied: handle module ns conflict]

drm-misc-next for 6.14:

UAPI Changes:

Cross-subsystem Changes:

Core Changes:
- Remove driver date from drm_driver

Driver Changes:
- amdxdna: New driver!
- ivpu: Fix qemu crash when using passthrough
- nouveau: expose GSP-RM logging buffers via debugfs
- panfrost: Add MT8188 Mali-G57 MC3 support
- panthor: misc improvements
- rockchip: Gamma LUT support
- tidss: Misc improvements
- virtio: convert to helpers, add prime support for scanout buffers
- v3d: Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL
- vc4: Add support for BCM2712
- vkms: Improvements all across the board
- panels:
  - Introduce backlight quirks infrastructure
  - New panels: KDB KD116N2130B12

Signed-off-by: Dave Airlie <[email protected]>
From: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20241205-agile-straight-pegasus-aca7f4@houat
2 parents fac04ef + cb2e1c2 commit c7d6cb4

249 files changed (+14126 / -1203 lines)

Documentation/accel/amdxdna/amdnpu.rst

Lines changed: 281 additions & 0 deletions
@@ -0,0 +1,281 @@
.. SPDX-License-Identifier: GPL-2.0-only

.. include:: <isonum.txt>

=========
AMD NPU
=========

:Copyright: |copy| 2024 Advanced Micro Devices, Inc.
:Author: Sonal Santan <[email protected]>

Overview
========

AMD NPU (Neural Processing Unit) is a multi-user AI inference accelerator
integrated into AMD client APUs. The NPU enables efficient execution of
Machine Learning applications like CNNs, LLMs, etc. The NPU is based on the
`AMD XDNA Architecture`_ and is managed by the **amdxdna** driver.


Hardware Description
====================

AMD NPU consists of the following hardware components:

AMD XDNA Array
--------------

AMD XDNA Array comprises a 2D array of compute and memory tiles built with
`AMD AI Engine Technology`_. Each column has 4 rows of compute tiles and 1
row of memory tiles. Each compute tile contains a VLIW processor with its own
dedicated program and data memory. The memory tile acts as L2 memory. The 2D
array can be partitioned at a column boundary, creating a spatially isolated
partition which can be bound to a workload context.

Each column also has dedicated DMA engines to move data between host DDR and
the memory tile.

AMD Phoenix and AMD Hawk Point client NPUs have a 4x5 topology, i.e., 4 rows
of compute tiles arranged into 5 columns. AMD Strix Point client APUs have a
4x8 topology, i.e., 4 rows of compute tiles arranged into 8 columns.
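
To make the partitioning rules concrete, the short C sketch below encodes the
4x5 and 4x8 topologies and the column-boundary constraint described above. The
types and helper are illustrative only; they are not structures from the
amdxdna driver.

.. code-block:: c

   #include <linux/types.h>

   /* Illustrative only: not a type from the amdxdna driver. */
   struct npu_array_geometry {
           unsigned int compute_rows;  /* compute-tile rows per column (4) */
           unsigned int mem_rows;      /* memory-tile rows per column (1)  */
           unsigned int columns;       /* total columns in the 2D array    */
   };

   /* Partitions are cut at column boundaries. */
   struct npu_partition {
           unsigned int first_column;
           unsigned int num_columns;
   };

   static const struct npu_array_geometry phoenix_npu = { 4, 1, 5 }; /* 4x5 */
   static const struct npu_array_geometry strix_npu   = { 4, 1, 8 }; /* 4x8 */

   /* A spatial partition is valid if it fits within the array's columns. */
   static bool npu_partition_valid(const struct npu_array_geometry *geo,
                                   const struct npu_partition *part)
   {
           return part->num_columns &&
                  part->first_column + part->num_columns <= geo->columns;
   }
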

Shared L2 Memory
----------------

The single row of memory tiles creates a pool of software-managed on-chip L2
memory. DMA engines are used to move data between host DDR and the memory
tiles. AMD Phoenix and AMD Hawk Point NPUs have a total of 2560 KB of L2
memory. AMD Strix Point NPU has a total of 4096 KB of L2 memory.

Microcontroller
---------------

A microcontroller runs the NPU Firmware, which is responsible for command
processing, XDNA Array partition setup, XDNA Array configuration, workload
context management and workload orchestration.

NPU Firmware uses a dedicated instance of an isolated non-privileged context
called ERT to service each workload context. ERT is also used to execute
user-provided ``ctrlcode`` associated with the workload context.

NPU Firmware uses a single isolated privileged context called MERT to service
management commands from the amdxdna driver.

Mailboxes
---------

The microcontroller and the amdxdna driver use a privileged channel for
management tasks like setting up contexts, telemetry, queries, error handling,
setting up user channels, etc. As mentioned before, privileged channel
requests are serviced by MERT. The privileged channel is bound to a single
mailbox.

The microcontroller and the amdxdna driver use a dedicated user channel per
workload context. The user channel is primarily used for submitting work to
the NPU. As mentioned before, user channel requests are serviced by an
instance of ERT. Each user channel is bound to its own dedicated mailbox.
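
Such a channel is typically a ring buffer in device SRAM plus head/tail
registers that both sides poll or get interrupted on. The sketch below shows
the general pattern only; the register offsets, message framing and error
handling are assumptions for illustration, not the amdxdna driver's actual
mailbox protocol.

.. code-block:: c

   #include <linux/errno.h>
   #include <linux/io.h>
   #include <linux/types.h>

   /* Hypothetical register layout, for illustration only. */
   #define MBOX_REG_HEAD       0x0  /* consumer index, advanced by firmware */
   #define MBOX_REG_TAIL       0x4  /* producer index, advanced by the host */
   #define MBOX_REG_DOORBELL   0x8  /* write to notify the other side       */

   struct npu_mailbox {
           void __iomem *regs;      /* mapped from the Mailbox BAR */
           void __iomem *ring;      /* mapped from the SRAM BAR    */
           u32 ring_size;           /* ring size in bytes          */
   };

   /* Post one message and ring the doorbell so MERT/ERT picks it up.
    * Wrap-around of a message across the end of the ring is ignored here.
    */
   static int npu_mailbox_post(struct npu_mailbox *mbox, const void *msg,
                               u32 len)
   {
           u32 head = readl(mbox->regs + MBOX_REG_HEAD);
           u32 tail = readl(mbox->regs + MBOX_REG_TAIL);

           if (len > mbox->ring_size - (tail - head))
                   return -EBUSY;   /* firmware has not drained the ring yet */

           memcpy_toio(mbox->ring + (tail % mbox->ring_size), msg, len);
           writel(tail + len, mbox->regs + MBOX_REG_TAIL);
           writel(1, mbox->regs + MBOX_REG_DOORBELL);
           return 0;
   }
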

PCIe EP
-------

The NPU is visible to the x86 host CPU as a PCIe device with multiple BARs and
several MSI-X interrupt vectors. The NPU uses a dedicated high-bandwidth
SoC-level fabric for reading from and writing to host memory. Each instance of
ERT gets its own dedicated MSI-X interrupt. MERT gets a single MSI-X
interrupt.

The number of PCIe BARs varies depending on the specific device. Based on
their functions, PCIe BARs can generally be categorized into the following
types:

* PSP BAR: Exposes the AMD PSP (Platform Security Processor) function
* SMU BAR: Exposes the AMD SMU (System Management Unit) function
* SRAM BAR: Exposes ring buffers for the mailboxes
* Mailbox BAR: Exposes the mailbox control registers (head, tail and ISR
  registers, etc.)
* Public Register BAR: Exposes public registers

On specific devices, the above-mentioned BAR types might be combined into a
single physical PCIe BAR, or a module might require two physical PCIe BARs to
be fully functional. For example (see the illustrative sketch after this
list):

* On the AMD Phoenix device, the PSP, SMU and Public Register BARs are on PCIe
  BAR index 0.
* On the AMD Strix Point device, the Mailbox and Public Register BARs are on
  PCIe BAR index 0. The PSP has some registers in PCIe BAR index 0 (Public
  Register BAR) and PCIe BAR index 4 (PSP BAR).
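
One way a driver copes with this per-device variation is a small table of BAR
roles selected at probe time. The sketch below only restates the two examples
above; entries not specified in the text are left at -1, and the structure
itself is an assumption, not an amdxdna definition.

.. code-block:: c

   /* Illustrative only: which PCIe BAR index hosts which function. */
   struct npu_bar_map {
           int psp_bar;
           int smu_bar;
           int sram_bar;
           int mailbox_bar;
           int pubreg_bar;          /* Public Register BAR */
   };

   /* AMD Phoenix: PSP, SMU and Public Register BARs share BAR index 0.
    * BAR indices not given in the text above are left as -1 (unknown here).
    */
   static const struct npu_bar_map phoenix_bars = {
           .psp_bar = 0, .smu_bar = 0, .pubreg_bar = 0,
           .sram_bar = -1, .mailbox_bar = -1,
   };

   /* AMD Strix Point: Mailbox and Public Register BARs on index 0,
    * PSP registers split across index 0 and index 4.
    */
   static const struct npu_bar_map strix_bars = {
           .mailbox_bar = 0, .pubreg_bar = 0, .psp_bar = 4,
           .smu_bar = -1, .sram_bar = -1,
   };
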

Process Isolation Hardware
--------------------------

As explained before, the XDNA Array can be dynamically divided into isolated
spatial partitions, each of which may have one or more columns. The spatial
partition is set up by the microcontroller programming the column isolation
registers. Each spatial partition is associated with a PASID, which is also
programmed by the microcontroller. Hence multiple spatial partitions in the
NPU can make concurrent host accesses protected by PASID.

The NPU FW itself uses microcontroller-MMU-enforced isolated contexts for
servicing user and privileged channel requests.


Mixed Spatial and Temporal Scheduling
=====================================

The AMD XDNA architecture supports mixed spatial and temporal (time sharing)
scheduling of the 2D array. This means that spatial partitions may be set up
and torn down dynamically to accommodate various workloads. A *spatial*
partition may be *exclusively* bound to one workload context, while another
partition may be *temporarily* bound to more than one workload context. The
microcontroller updates the PASID for a temporarily shared partition to match
the context that has been bound to the partition at any given moment.

Resource Solver
---------------

The Resource Solver component of the amdxdna driver manages the allocation of
the 2D array among various workloads. Every workload describes, in its
metadata, the number of columns required to run its NPU binary. The Resource
Solver component uses hints passed by the workload and its own heuristics to
decide the 2D array (re)partition strategy and the mapping of workloads for
spatial and temporal sharing of columns. The FW enforces the
context-to-column(s) resource binding decisions made by the Resource Solver.

AMD Phoenix and AMD Hawk Point client NPUs can support 6 concurrent workload
contexts. AMD Strix Point can support 16 concurrent workload contexts.
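
The core decision the Resource Solver has to make, finding a run of free
adjacent columns for a context, can be illustrated with a simple first-fit
search. This sketch ignores the workload hints, temporal sharing and
repartitioning mentioned above and is not the heuristic amdxdna actually
implements.

.. code-block:: c

   #include <linux/bits.h>
   #include <linux/types.h>

   /* Illustrative first-fit column allocation; one bit per column. */
   struct npu_column_pool {
           unsigned long busy;
           unsigned int total;  /* 5 on Phoenix/Hawk Point, 8 on Strix Point */
   };

   /* Find 'want' adjacent free columns; returns the first column or -1. */
   static int npu_alloc_columns(struct npu_column_pool *pool, unsigned int want)
   {
           unsigned int start, col;

           if (!want || want > pool->total)
                   return -1;

           for (start = 0; start + want <= pool->total; start++) {
                   for (col = start; col < start + want; col++)
                           if (pool->busy & BIT(col))
                                   break;
                   if (col == start + want) {
                           pool->busy |= GENMASK(start + want - 1, start);
                           return start;
                   }
           }
           return -1;   /* no free partition; fall back to temporal sharing */
   }
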

Application Binaries
====================

An NPU application workload consists of two separate binaries which are
generated by the NPU compiler.

1. AMD XDNA Array overlay, which is used to configure an NPU spatial partition.
   The overlay contains instructions for setting up the stream switch
   configuration and ELF for the compute tiles. The overlay is loaded on the
   spatial partition bound to the workload by the associated ERT instance.
   Refer to the
   `Versal Adaptive SoC AIE-ML Architecture Manual (AM020)`_ for more details.

2. ``ctrlcode``, used for orchestrating the overlay loaded on the spatial
   partition. ``ctrlcode`` is executed by the ERT running in protected mode on
   the microcontroller in the context of the workload. ``ctrlcode`` is made up
   of a sequence of opcodes named ``XAie_TxnOpcode``. Refer to the
   `AI Engine Run Time`_ for more details.


Special Host Buffers
====================

Per-context Instruction Buffer
------------------------------

Every workload context uses a host-resident 64 MB buffer which is memory
mapped into the ERT instance created to service the workload. The ``ctrlcode``
used by the workload is copied into this special memory. This buffer is
protected by PASID like all other input/output buffers used by that workload.
The instruction buffer is also mapped into the user space of the workload.
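
A hedged sketch of the bookkeeping such a per-context buffer implies is shown
below; the 64 MB size comes from the text above, while the structure, field
names and helper are assumptions for illustration.

.. code-block:: c

   #include <linux/errno.h>
   #include <linux/sizes.h>
   #include <linux/string.h>
   #include <linux/types.h>

   #define NPU_INSTR_BUF_SIZE  SZ_64M   /* size stated in the text above */

   /* Illustrative only: per-context instruction buffer state. */
   struct npu_instr_buf {
           void       *vaddr;    /* kernel mapping, also mmapped to userspace */
           dma_addr_t  dev_addr; /* address as seen by the ERT instance       */
           u32         pasid;    /* PASID protecting this workload's buffers  */
   };

   /* Copy a workload's ctrlcode into the buffer before submission. */
   static int npu_instr_buf_fill(struct npu_instr_buf *buf,
                                 const void *ctrlcode, size_t len)
   {
           if (len > NPU_INSTR_BUF_SIZE)
                   return -EINVAL;
           memcpy(buf->vaddr, ctrlcode, len);
           return 0;
   }
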

Global Privileged Buffer
------------------------

In addition, the driver also allocates a single buffer for maintenance tasks
like recording errors from MERT. This global buffer uses the global IOMMU
domain and is only accessible by MERT.


High-level Use Flow
===================

Here are the steps to run a workload on AMD NPU (a sketch of the
submit-and-wait portion, steps 7-11, follows the list):

1. Compile the workload into an overlay and a ``ctrlcode`` binary.
2. Userspace opens a context in the driver and provides the overlay.
3. The driver checks with the Resource Solver for provisioning a set of columns
   for the workload.
4. The driver then asks MERT to create a context on the device with the desired
   columns.
5. MERT then creates an instance of ERT. MERT also maps the Instruction Buffer
   into ERT memory.
6. Userspace then copies the ``ctrlcode`` to the Instruction Buffer.
7. Userspace then creates a command buffer with pointers to the input, output
   and instruction buffers; it then submits the command buffer to the driver
   and goes to sleep waiting for completion.
8. The driver sends the command over the Mailbox to ERT.
9. ERT *executes* the ``ctrlcode`` in the instruction buffer.
10. Execution of the ``ctrlcode`` kicks off DMAs to and from the host DDR while
    the AMD XDNA Array is running.
11. When ERT reaches the end of the ``ctrlcode``, it raises an MSI-X interrupt
    to send a completion signal to the driver, which then wakes up the waiting
    workload.
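
Steps 7-11 describe a classic submit-and-wait pattern. The sketch below shows
one common way the "goes to sleep" and "wakes up the waiting workload" parts
are implemented in a driver, using a ``struct completion`` signalled from the
MSI-X handler; the surrounding structures are illustrative, not amdxdna's.

.. code-block:: c

   #include <linux/completion.h>
   #include <linux/interrupt.h>

   /* Illustrative submit-and-wait skeleton for steps 7-11 above. */
   struct npu_job {
           struct completion done;   /* signalled from the MSI-X handler */
   };

   /* Steps 7-8: queue the command to ERT via the mailbox, then sleep. */
   static int npu_submit_and_wait(struct npu_job *job)
   {
           init_completion(&job->done);
           /* ... write the command buffer reference into the user channel ... */
           wait_for_completion(&job->done);
           return 0;
   }

   /* Step 11: ERT raised its MSI-X; wake the waiting workload. */
   static irqreturn_t npu_ert_irq(int irq, void *data)
   {
           struct npu_job *job = data;

           complete(&job->done);
           return IRQ_HANDLED;
   }
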

Boot Flow
=========

The amdxdna driver uses the PSP to securely load the signed NPU FW and kick
off the boot of the NPU microcontroller. The amdxdna driver then waits for the
alive signal in a special location on BAR 0. The NPU is switched off during
SoC suspend and turned on again after resume, at which point the NPU FW is
reloaded and the handshake is performed again.
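
Waiting for an alive signal at a fixed location is usually a bounded polling
loop. In the sketch below the offset and the expected value are made-up
placeholders; only the pattern is meant to match the handshake described
above.

.. code-block:: c

   #include <linux/io.h>
   #include <linux/iopoll.h>
   #include <linux/types.h>

   /* Placeholder values for illustration; not the real offset or magic. */
   #define NPU_ALIVE_REG     0x100         /* offset into BAR 0           */
   #define NPU_ALIVE_MAGIC   0xaa55aa55    /* hypothetical "alive" marker */

   /* Poll BAR 0 until the firmware reports it is alive, or time out. */
   static int npu_wait_fw_alive(void __iomem *bar0)
   {
           u32 val;

           /* Poll every 100 us; give up after one second. */
           return readl_poll_timeout(bar0 + NPU_ALIVE_REG, val,
                                     val == NPU_ALIVE_MAGIC, 100, 1000000);
   }
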

Userspace components
====================

Compiler
--------

Peano is an LLVM-based open-source compiler for the AMD XDNA Array compute
tiles, available at:
https://github.com/Xilinx/llvm-aie

The open-source IREE compiler supports graph compilation of ML models for AMD
NPU and uses Peano underneath. It is available at:
https://github.com/nod-ai/iree-amd-aie

Usermode Driver (UMD)
---------------------

The open-source XRT runtime stack interfaces with the amdxdna kernel driver.
XRT can be found at:
https://github.com/Xilinx/XRT

The open-source XRT shim for the NPU can be found at:
https://github.com/amd/xdna-driver

DMA Operation
=============

DMA operation instructions are encoded in the ``ctrlcode`` as the
``XAIE_IO_BLOCKWRITE`` opcode. When ERT executes ``XAIE_IO_BLOCKWRITE``, DMA
operations between host DDR and L2 memory are carried out.
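
As a rough picture of how such an opcode stream is laid out (an opcode, a
target address and an inline payload that the DMA engines move), consider the
hypothetical record below. The real ``XAie_TxnOpcode`` encoding is defined by
the AI Engine Run Time, not by this sketch.

.. code-block:: c

   #include <linux/types.h>

   /* Hypothetical shape of one blockwrite record in the ctrlcode stream;
    * the actual encoding is defined by the AI Engine Run Time (aie-rt).
    */
   struct npu_blockwrite_record {
           u32 opcode;      /* value standing for XAIE_IO_BLOCKWRITE     */
           u32 size;        /* total record size in bytes                */
           u64 dst;         /* destination address in the partition/L2   */
           u32 payload[];   /* data moved between host DDR and L2 by DMA */
   } __packed;
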

Error Handling
==============

When MERT detects an error in the AMD XDNA Array, it pauses execution for that
workload context and sends an asynchronous message to the driver over the
privileged channel. The driver then sends a buffer pointer to MERT to capture
the register states of the partition bound to the faulting workload context.
The driver then decodes the error by reading the contents of that buffer.


Telemetry
=========

MERT can report various kinds of telemetry information, like the following:

* L1 interrupt counter
* DMA counter
* Deep Sleep counter
* etc.


References
==========

- `AMD XDNA Architecture <https://www.amd.com/en/technologies/xdna.html>`_
- `AMD AI Engine Technology <https://www.xilinx.com/products/technology/ai-engine.html>`_
- `Peano <https://github.com/Xilinx/llvm-aie>`_
- `Versal Adaptive SoC AIE-ML Architecture Manual (AM020) <https://docs.amd.com/r/en-US/am020-versal-aie-ml>`_
- `AI Engine Run Time <https://github.com/Xilinx/aie-rt/tree/release/main_aig>`_

Documentation/accel/amdxdna/index.rst

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
.. SPDX-License-Identifier: GPL-2.0-only

=====================================
accel/amdxdna NPU driver
=====================================

The accel/amdxdna driver supports the AMD NPU (Neural Processing Unit).

.. toctree::

   amdnpu

Documentation/accel/index.rst

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ Compute Accelerators
    :maxdepth: 1
 
    introduction
+   amdxdna/index
    qaic/index
 
 .. only:: subproject and html

Documentation/devicetree/bindings/display/brcm,bcm2711-hdmi.yaml

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,8 @@ properties:
     enum:
       - brcm,bcm2711-hdmi0
       - brcm,bcm2711-hdmi1
+      - brcm,bcm2712-hdmi0
+      - brcm,bcm2712-hdmi1
 
   reg:
     items:

Documentation/devicetree/bindings/display/brcm,bcm2835-hvs.yaml

Lines changed: 4 additions & 1 deletion
@@ -13,6 +13,7 @@ properties:
   compatible:
     enum:
       - brcm,bcm2711-hvs
+      - brcm,bcm2712-hvs
       - brcm,bcm2835-hvs
 
   reg:
@@ -36,7 +37,9 @@ if:
   properties:
     compatible:
       contains:
-        const: brcm,bcm2711-hvs
+        enum:
+          - brcm,bcm2711-hvs
+          - brcm,bcm2712-hvs
 
 then:
   required:

Documentation/devicetree/bindings/display/brcm,bcm2835-pixelvalve0.yaml

Lines changed: 3 additions & 0 deletions
@@ -20,6 +20,9 @@ properties:
       - brcm,bcm2711-pixelvalve2
       - brcm,bcm2711-pixelvalve3
       - brcm,bcm2711-pixelvalve4
+      - brcm,bcm2712-pixelvalve0
+      - brcm,bcm2712-pixelvalve1
+      - brcm,bcm2712-pixelvalve2
 
   reg:
     maxItems: 1

Documentation/devicetree/bindings/display/brcm,bcm2835-txp.yaml

Lines changed: 4 additions & 1 deletion
@@ -11,7 +11,10 @@ maintainers:
 
 properties:
   compatible:
-    const: brcm,bcm2835-txp
+    enum:
+      - brcm,bcm2712-mop
+      - brcm,bcm2712-moplet
+      - brcm,bcm2835-txp
 
   reg:
     maxItems: 1

Documentation/devicetree/bindings/display/brcm,bcm2835-vc4.yaml

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ properties:
   compatible:
     enum:
       - brcm,bcm2711-vc5
+      - brcm,bcm2712-vc6
       - brcm,bcm2835-vc4
       - brcm,cygnus-vc4

Documentation/devicetree/bindings/display/panel/samsung,atna33xc20.yaml

Lines changed: 2 additions & 0 deletions
@@ -23,6 +23,8 @@ properties:
           - samsung,atna45af01
           # Samsung 14.5" 3K (2944x1840 pixels) eDP AMOLED panel
           - samsung,atna45dc02
+          # Samsung 15.6" 3K (2880x1620 pixels) eDP AMOLED panel
+          - samsung,atna56ac03
       - const: samsung,atna33xc20
 
   enable-gpios: true

Documentation/gpu/drm-kms-helpers.rst

Lines changed: 3 additions & 0 deletions
@@ -221,6 +221,9 @@ Panel Helper Reference
 .. kernel-doc:: drivers/gpu/drm/drm_panel_orientation_quirks.c
    :export:
 
+.. kernel-doc:: drivers/gpu/drm/drm_panel_backlight_quirks.c
+   :export:
+
 Panel Self Refresh Helper Reference
 ===================================