Commit e4528b9

Gregory Price authored and davejiang committed
cxl: docs/platform/bios-and-efi documentation
Add some docs on CXL configurations done in bios/efi that affect linux configuration - information vendors may care to consider.

Signed-off-by: Gregory Price <[email protected]>
Reviewed-by: Dave Jiang <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Dave Jiang <[email protected]>
1 parent 750d662 commit e4528b9

2 files changed: +268 -0 lines changed


Documentation/driver-api/cxl/index.rst

Lines changed: 6 additions & 0 deletions
@@ -21,6 +21,12 @@ that have impacts on each other. The docs here break up configurations steps.
2121

2222
devices/device-types
2323

24+
.. toctree::
25+
:maxdepth: 2
26+
:caption: Platform Configuration
27+
28+
platform/bios-and-efi
29+
2430
.. toctree::
2531
:maxdepth: 1
2632
:caption: Linux Kernel Configuration
Lines changed: 262 additions & 0 deletions
@@ -0,0 +1,262 @@
1+
.. SPDX-License-Identifier: GPL-2.0
2+
3+
======================
4+
BIOS/EFI Configuration
5+
======================
6+
7+
BIOS and EFI are largely responsible for configuring static information about
8+
devices (or potential future devices) such that Linux can build the appropriate
9+
logical representations of these devices.
10+
11+
At a high level, this is what occurs during this phase of configuration.
12+
13+
* The bootloader starts the BIOS/EFI.
14+
15+
* BIOS/EFI performs an early device probe to determine static configuration.
16+
17+
* BIOS/EFI creates ACPI Tables that describe static config for the OS
18+
19+
* BIOS/EFI creates the system memory map (EFI Memory Map, E820, etc.).
20+
21+
* BIOS/EFI calls :code:`start_kernel` and begins the Linux Early Boot process.
22+
23+
Much of what this section is concerned with is ACPI Table production and
24+
static memory map configuration. More detail on these tables can be found
25+
under Platform Configuration -> ACPI Table Reference.
26+
27+
.. note::
28+
Platform vendors should read carefully, as this section has recommendations
29+
on physical memory region size and alignment, memory holes, HDM interleave,
30+
and what Linux expects of HDM decoders trying to work with these features.
31+
32+
UEFI Settings
33+
=============
34+
If your platform supports it, the :code:`uefisettings` command can be used to
35+
read/write EFI settings. Changes will be reflected on the next reboot. Kexec
36+
is not a sufficient reboot.
37+
38+
One notable configuration here is the EFI_MEMORY_SP (Specific Purpose) bit.
39+
When this is enabled, this bit tells Linux to defer management of a memory
40+
region to a driver (in this case, the CXL driver). Otherwise, the memory is
41+
treated as "normal memory", and is exposed to the page allocator during
42+
:code:`__init`.
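
One way to check the effect of this setting after boot is to look for
:code:`Soft Reserved` entries in :code:`/proc/iomem`, which is how x86 kernels
built with :code:`CONFIG_EFI_SOFT_RESERVE` label :code:`EFI_MEMORY_SP` ranges.
The following is a minimal Python sketch under that assumption. The helper
name and the sample text are hypothetical and only illustrate the parsing.

```python
import re

# Matches one /proc/iomem line, e.g. "100000000-1bfffffff : Soft Reserved".
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+)$")


def soft_reserved_ranges(iomem_text):
    """Return inclusive (start, end) pairs for every 'Soft Reserved' resource."""
    ranges = []
    for line in iomem_text.splitlines():
        m = IOMEM_LINE.match(line)
        if m and m.group(3).strip() == "Soft Reserved":
            ranges.append((int(m.group(1), 16), int(m.group(2), 16)))
    return ranges


# Hypothetical sample text. On a real system, read /proc/iomem as root,
# because unprivileged readers see zeroed addresses.
SAMPLE = """\
00001000-0009ffff : System RAM
100000000-1bfffffff : Soft Reserved
200000000-23fffffff : System RAM
"""

for start, end in soft_reserved_ranges(SAMPLE):
    print(f"{start:#x}-{end:#x}")
```

If the range that backs the CXL device shows up as :code:`System RAM` instead,
the memory was most likely handed to the page allocator as normal memory.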
43+
44+
uefisettings examples
45+
---------------------
46+
47+
:code:`uefisettings identify` ::
48+
49+
uefisettings identify
50+
51+
bios_vendor: xxx
52+
bios_version: xxx
53+
bios_release: xxx
54+
bios_date: xxx
55+
product_name: xxx
56+
product_family: xxx
57+
product_version: xxx
58+
59+
On some AMD platforms, the :code:`EFI_MEMORY_SP` bit is set via the :code:`CXL
60+
Memory Attribute` field. This may be called something else on your platform.
61+
62+
:code:`uefisettings get "CXL Memory Attribute"` ::
63+
64+
selector: xxx
65+
...
66+
question: Question {
67+
name: "CXL Memory Attribute",
68+
answer: "Enabled",
69+
...
70+
}
71+
72+
Physical Memory Map
73+
===================
74+
75+
Physical Address Region Alignment
76+
---------------------------------
77+
78+
As of Linux v6.14, the hotplug memory system requires memory regions to be
79+
uniform in size and alignment. While the CXL specification allows for memory
80+
regions as small as 256MB, the supported memory block size and alignment for
81+
hotplugged memory is architecture-defined.
82+
83+
A Linux memory block may be as small as 128MB and increase in powers of two.
84+
85+
* On ARM, the default block size and alignment is either 128MB or 256MB.
86+
87+
* On x86, the default block size is 256MB, and increases to 2GB as the
88+
capacity of the system increases up to 64GB.
89+
90+
For best support across versions, platform vendors should place CXL memory at
91+
a 2GB aligned base address, and region sizes should be multiples of 2GB. This also helps
92+
prevent the creation of thousands of memory devices (one per block).
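
The recommendation above can be sketched as a small check. This is an
illustrative Python sketch, not kernel code: the function names are
hypothetical, and the 2GB default simply encodes the recommendation in this
section.

```python
MIB = 1 << 20
GIB = 1 << 30
RECOMMENDED_ALIGN = 2 * GIB  # the 2GB recommendation from the text above


def check_region(base, size, align=RECOMMENDED_ALIGN):
    """Return a list of human-readable alignment problems (empty if none)."""
    problems = []
    if base % align:
        problems.append(f"base {base:#x} is not {align:#x} aligned")
    if size == 0 or size % align:
        problems.append(f"size {size:#x} is not a non-zero multiple of {align:#x}")
    return problems


def memory_block_count(size, block_size):
    """Number of memory block devices needed to cover a region."""
    return size // block_size


# A 3GB region at 4GB: the base is fine, the size is not a multiple of 2GB.
print(check_region(0x100000000, 0xC0000000))

# Block count for a 4TB region with 128MB blocks versus 2GB blocks.
print(memory_block_count(4096 * GIB, 128 * MIB))
print(memory_block_count(4096 * GIB, 2 * GIB))
```

The block count comparison shows why larger blocks matter: the same capacity
needs sixteen times fewer memory devices with 2GB blocks than with 128MB
blocks.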
93+
94+
Memory Holes
95+
------------
96+
97+
Holes in the memory map are tricky. Consider a 4GB device located at base
98+
address 0x100000000, but with the following memory map ::
99+
100+
---------------------
101+
| 0x100000000 |
102+
| CXL |
103+
| 0x1BFFFFFFF |
104+
---------------------
105+
| 0x1C0000000 |
106+
| MEMORY HOLE |
107+
| 0x1FFFFFFFF |
108+
---------------------
109+
| 0x200000000 |
110+
| CXL CONT. |
111+
| 0x23FFFFFFF |
112+
---------------------
113+
114+
There are two issues to consider:
115+
116+
* decoder programming, and
117+
* memory block alignment.
118+
119+
If your architecture requires 2GB uniform size and aligned memory blocks, the
120+
only capacity Linux is capable of mapping (as of v6.14) would be the capacity
121+
from :code:`0x100000000-0x180000000`. The remaining capacity will be stranded, as
122+
it is not of 2GB aligned length.
123+
124+
Assuming your architecture and memory configuration allows 1GB memory blocks,
125+
this memory map is supported and should be presented as multiple CFMWS entries
126+
in the CEDT that describe each side of the memory hole separately - along with
127+
matching decoders.
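
The arithmetic behind the two cases above can be worked through in a short
Python sketch. The helper name is hypothetical, and the segments are the two
CXL ranges from the memory map above, written with exclusive end addresses.

```python
GIB = 1 << 30

# The two CXL ranges from the memory map above, as [start, end) pairs.
SEGMENTS = [
    (0x100000000, 0x1C0000000),
    (0x200000000, 0x240000000),
]


def aligned_subrange(start, end, block):
    """Largest block-aligned [lo, hi) inside [start, end), or None if empty."""
    lo = (start + block - 1) // block * block  # round start up
    hi = end // block * block                  # round end down
    return (lo, hi) if hi > lo else None


def mappable(segments, block):
    """Per-segment mappable range (or None) for a given memory block size."""
    return [aligned_subrange(start, end, block) for start, end in segments]


for block in (2 * GIB, 1 * GIB):
    print(f"block size {block // GIB}GB:")
    for usable in mappable(SEGMENTS, block):
        if usable is None:
            print("  stranded")
        else:
            print(f"  {usable[0]:#x}-{usable[1]:#x}")
```

With 2GB blocks only the first 2GB of the lower range can be mapped and the
rest is stranded. With 1GB blocks both ranges are fully usable.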
128+
129+
Multiple decoders can (and should) be used to manage such a memory hole (see
130+
below), but each chunk of a memory hole should be aligned to a reasonable block
131+
size (larger alignment is always better). If you intend to have memory holes
132+
in the memory map, expect to use one decoder per contiguous chunk of host
133+
physical memory.
134+
135+
As of v6.14, Linux does not provide support for memory hotplug of multiple
136+
physical memory regions separated by a memory hole described by a single
137+
HDM decoder.
138+
139+
140+
Decoder Programming
141+
===================
142+
If BIOS/EFI intends to program the decoders to be statically configured,
143+
there are a few things to consider to avoid major pitfalls that will
144+
prevent Linux compatibility. Some of these recommendations are not
145+
required "per the specification", but Linux makes no guarantees of support
146+
otherwise.
147+
148+
149+
Translation Point
150+
-----------------
151+
Per the specification, the only decoders which **TRANSLATE** Host Physical
152+
Address (HPA) to Device Physical Address (DPA) are the **Endpoint Decoders**.
153+
All other decoders in the fabric are intended to route accesses without
154+
translating the addresses.
155+
156+
This is heavily implied by the specification, see: ::
157+
158+
CXL Specification 3.1
159+
8.2.4.20: CXL HDM Decoder Capability Structure
160+
- Implementation Note: CXL Host Bridge and Upstream Switch Port Decoder Flow
161+
- Implementation Note: Device Decoder Logic
162+
163+
Given this, Linux makes a strong assumption that decoders between CPU and
164+
endpoint will all be programmed with address ranges that are subsets of
165+
their parent decoder.
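
The subset assumption can be expressed as a simple containment check over a
decoder tree. This is a minimal Python sketch for illustration only: the
:code:`Decoder` class and :code:`find_violations` function are hypothetical
and do not mirror the kernel's data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Decoder:
    """Hypothetical model of one decoder and the decoders below it."""
    name: str
    base: int
    size: int
    children: list = field(default_factory=list)

    @property
    def end(self):
        # Exclusive end of the programmed address range.
        return self.base + self.size


def find_violations(parent):
    """Return (parent, child) name pairs where the child range escapes the parent."""
    violations = []
    for child in parent.children:
        if child.base < parent.base or child.end > parent.end:
            violations.append((parent.name, child.name))
        violations.extend(find_violations(child))
    return violations


endpoint = Decoder("ep-decoder-0", 0x100000000, 0xC0000000)
bridge = Decoder("HB-decoder-0", 0x100000000, 0xC0000000, [endpoint])
root = Decoder("root-decoder-0", 0x100000000, 0xC0000000, [bridge])
print(find_violations(root))
```

A platform that translates addresses above the endpoint would typically fail
a check like this, because the child ranges no longer sit inside the parent
ranges.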
166+
167+
Due to some ambiguity in how Architecture, ACPI, PCI, and CXL specifications
168+
"hand off" responsibility between domains, some early adopting platforms
169+
attempted to do translation at the originating memory controller or host
170+
bridge. This configuration requires a platform specific extension to the
171+
driver and is not officially endorsed - despite being supported.
172+
173+
It is *highly recommended* **NOT** to do this; otherwise, you are on your own
174+
to implement driver support for your platform.
175+
176+
Interleave and Configuration Flexibility
177+
----------------------------------------
178+
If providing cross-host-bridge interleave, a CFMWS entry in the CEDT must be
179+
presented with target host-bridges for the interleaved device sets (there may
180+
be multiple behind each host bridge).
181+
182+
If providing intra-host-bridge interleaving, only 1 CFMWS entry in the CEDT is
183+
required for that host bridge - if it covers the entire capacity of the devices
184+
behind the host bridge.
185+
186+
If intending to provide users flexibility in programming decoders beyond the
187+
root, you may want to provide multiple CFMWS entries in the CEDT intended for
188+
different purposes. For example, you may want to consider adding:
189+
190+
1) A CFMWS entry to cover all interleavable host bridges.
191+
2) A CFMWS entry to cover all devices on a single host bridge.
192+
3) A CFMWS entry to cover each device.
193+
194+
A platform may choose to add all of these, or change the mode based on a BIOS
195+
setting. For each CFMWS entry, Linux expects the SRAT to describe the
196+
corresponding memory regions so that it can determine the number of NUMA nodes
197+
to reserve during early boot / init.
198+
199+
As of v6.14, Linux will create a NUMA node for each CEDT CFMWS entry, even if
200+
a matching SRAT entry does not exist; however, this is not guaranteed in the
201+
future and such a configuration should be avoided.
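
A firmware validation step could flag CFMWS windows that have no matching SRAT
description. The Python sketch below is one possible check under simplifying
assumptions: windows and SRAT memory ranges are plain tuples, a window counts
as covered only if a single SRAT range contains it, and all names are
hypothetical.

```python
def uncovered_windows(cfmws_windows, srat_ranges):
    """Return CFMWS (base, size) windows not contained in any SRAT range.

    cfmws_windows: list of (base, size)
    srat_ranges:   list of (base, size, proximity_domain)
    """
    missing = []
    for base, size in cfmws_windows:
        end = base + size
        covered = any(
            s_base <= base and end <= s_base + s_size
            for s_base, s_size, _pxm in srat_ranges
        )
        if not covered:
            missing.append((base, size))
    return missing


# Hypothetical tables: two windows, only the first described by the SRAT.
WINDOWS = [(0x100000000, 0xC0000000), (0x200000000, 0x40000000)]
SRAT = [(0x100000000, 0xC0000000, 1)]

for base, size in uncovered_windows(WINDOWS, SRAT):
    print(f"no SRAT entry covers {base:#x} (+{size:#x})")
```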
202+
203+
Memory Holes
204+
------------
205+
If your platform includes memory holes interspersed between your CXL memory, it
206+
is recommended to utilize multiple decoders to cover these regions of memory,
207+
rather than try to program the decoders to accept the entire range and expect
208+
Linux to manage the overlap.
209+
210+
For example, consider the Memory Hole described above ::
211+
212+
---------------------
213+
| 0x100000000 |
214+
| CXL |
215+
| 0x1BFFFFFFF |
216+
---------------------
217+
| 0x1C0000000 |
218+
| MEMORY HOLE |
219+
| 0x1FFFFFFFF |
220+
---------------------
221+
| 0x200000000 |
222+
| CXL CONT. |
223+
| 0x23FFFFFFF |
224+
---------------------
225+
226+
Assuming this is provided by a single device attached directly to a host bridge,
227+
Linux would expect the following decoder programming ::
228+
229+
----------------------- -----------------------
230+
| root-decoder-0 | | root-decoder-1 |
231+
| base: 0x100000000 | | base: 0x200000000 |
232+
| size: 0xC0000000 | | size: 0x40000000 |
233+
----------------------- -----------------------
234+
| |
235+
----------------------- -----------------------
236+
| HB-decoder-0 | | HB-decoder-1 |
237+
| base: 0x100000000 | | base: 0x200000000 |
238+
| size: 0xC0000000 | | size: 0x40000000 |
239+
----------------------- -----------------------
240+
| |
241+
----------------------- -----------------------
242+
| ep-decoder-0 | | ep-decoder-1 |
243+
| base: 0x100000000 | | base: 0x200000000 |
244+
| size: 0xC0000000 | | size: 0x40000000 |
245+
----------------------- -----------------------
246+
247+
This is paired with a CEDT containing two CFMWS entries that describe the above root decoders.
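
The base and size values in the diagram follow directly from the memory map:
one window per contiguous chunk of host physical memory. A minimal Python
sketch of that derivation is shown here. The function name is hypothetical and
the segments use exclusive end addresses.

```python
def decoder_windows(segments):
    """Merge touching [start, end) segments into (base, size) windows.

    Each returned window corresponds to one contiguous chunk of host
    physical memory, i.e. one decoder (and one CFMWS) per chunk.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Adjacent or overlapping: extend the current chunk.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(lo, hi - lo) for lo, hi in merged]


# The two CXL ranges on either side of the memory hole.
SEGMENTS = [
    (0x100000000, 0x1C0000000),
    (0x200000000, 0x240000000),
]

for base, size in decoder_windows(SEGMENTS):
    print(f"base: {base:#x} size: {size:#x}")
```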
248+
249+
Linux makes no guarantee of support for strange memory hole situations.
250+
251+
Multi-Media Devices
252+
-------------------
253+
The CFMWS field of the CEDT has special restriction bits which describe whether
254+
the described memory region allows volatile or persistent memory (or both). If
255+
the platform intends to support either:
256+
257+
1) A device with multiple media types, or
258+
2) Using a persistent memory device as normal memory
259+
260+
the platform may wish to create multiple CEDT CFMWS entries to describe the same
261+
memory, with the intent of allowing the end user flexibility in how that memory
262+
is configured. Linux does not presently have strong requirements in this area.
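
As an illustration of the restriction bits, the Python sketch below decodes
the volatile and persistent flags from a CFMWS restrictions value. The bit
positions are an assumption based on the :code:`ACPI_CEDT_CFMWS_RESTRICT_*`
constants in the kernel ACPI headers and should be verified against the CXL
specification. The function name and sample values are hypothetical.

```python
# Assumed bit positions; verify against the CXL specification before use.
CFMWS_RESTRICT_VOLATILE = 1 << 2
CFMWS_RESTRICT_PMEM = 1 << 3


def allowed_media(restrictions):
    """Return the media types a CFMWS window permits, per the assumed bits."""
    media = []
    if restrictions & CFMWS_RESTRICT_VOLATILE:
        media.append("volatile")
    if restrictions & CFMWS_RESTRICT_PMEM:
        media.append("persistent")
    return media


# Two hypothetical windows over the same memory: one volatile-only and one
# persistent-only, giving the end user a choice of configuration.
for value in (CFMWS_RESTRICT_VOLATILE, CFMWS_RESTRICT_PMEM):
    print(f"{value:#x}: {allowed_media(value)}")
```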
