Commit 2c70677

cxl: Add documentation to explain the shared link bandwidth calculation
Create kernel documentation to describe how the CXL shared upstream link
bandwidth is calculated.

Suggested-by: Dan Williams <[email protected]>
Reviewed-by: Alison Schofield <[email protected]>
Acked-by: Dan Williams <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Dave Jiang <[email protected]>
1 parent a5ab0de commit 2c70677

2 files changed: +92 -0 lines changed
Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

==================================
CXL Access Coordinates Computation
==================================

Shared Upstream Link Calculation
================================
When a CXL region is constructed from endpoints behind CXL switches (SW) or
Root Ports (RP), the total bandwidth of all the endpoints behind a switch can
exceed the bandwidth of the switch upstream link. A similar situation can
occur within the host, upstream of the root ports. After all the targets have
arrived for a region, the CXL driver performs an additional pass to
recalculate the bandwidths, taking into account that an upstream link may be
the limiting factor.

The algorithm assumes a symmetric topology, as that maximizes performance.
When an asymmetric topology is detected, the calculation is aborted. Asymmetry
is detected during the topology walk when the number of RPs detected as a
grandparent is not equal to the number of devices iterated in the same loop.
The algorithm further assumes that subtle asymmetry in properties does not
occur and that all paths to the endpoints (EPs) are equal.

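The symmetry check described above can be sketched as follows. This is a minimal illustration, not the driver code: the function name ``is_symmetric`` and the boolean-list input are hypothetical, and treating a level with no RP grandparents as "still walking through switches" is an assumption of the sketch.

```python
def is_symmetric(grandparent_is_rp):
    """Return False when the iterated devices sit at mixed depths.

    grandparent_is_rp: one bool per device iterated in the current loop,
    True when that device's grandparent is a root port (hypothetical input).
    """
    dev_count = len(grandparent_is_rp)
    gp_count = sum(1 for flag in grandparent_is_rp if flag)

    # Assumption: no RP grandparents at all means every device still has
    # another switch level above it, which is treated as symmetric here.
    if gp_count == 0:
        return True

    # Some devices reached an RP and others did not: asymmetric topology.
    return gp_count == dev_count


all_at_rp = is_symmetric([True, True])
mixed_depth = is_symmetric([True, False])
all_below_switch = is_symmetric([False, False])
```

In this sketch only the mixed-depth case is rejected, which corresponds to the calculation being aborted.
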
There can be multiple switches under an RP. There can be multiple RPs under
a CXL Host Bridge (HB). There can be multiple HBs under a CXL Fixed Memory
Window Structure (CFMWS).

An example hierarchy:

>                CFMWS 0
>                  |
>         _________|_________
>        |                   |
>    ACPI0017-0          ACPI0017-1
> GP0/HB0/ACPI0016-0   GP1/HB1/ACPI0016-1
>    |          |        |           |
>   RP0        RP1      RP2         RP3
>    |          |        |           |
>  SW 0       SW 1     SW 2        SW 3
>  |   |      |   |    |   |       |   |
> EP0 EP1    EP2 EP3  EP4  EP5    EP6 EP7

Computation for the example hierarchy:

Min (GP0 to CPU BW,
     Min(SW 0 Upstream Link to RP0 BW,
         Min(SW0SSLBIS for SW0DSP0 (EP0), EP0 DSLBIS, EP0 Upstream Link) +
         Min(SW0SSLBIS for SW0DSP1 (EP1), EP1 DSLBIS, EP1 Upstream Link)) +
     Min(SW 1 Upstream Link to RP1 BW,
         Min(SW1SSLBIS for SW1DSP0 (EP2), EP2 DSLBIS, EP2 Upstream Link) +
         Min(SW1SSLBIS for SW1DSP1 (EP3), EP3 DSLBIS, EP3 Upstream Link))) +
Min (GP1 to CPU BW,
     Min(SW 2 Upstream Link to RP2 BW,
         Min(SW2SSLBIS for SW2DSP0 (EP4), EP4 DSLBIS, EP4 Upstream Link) +
         Min(SW2SSLBIS for SW2DSP1 (EP5), EP5 DSLBIS, EP5 Upstream Link)) +
     Min(SW 3 Upstream Link to RP3 BW,
         Min(SW3SSLBIS for SW3DSP0 (EP6), EP6 DSLBIS, EP6 Upstream Link) +
         Min(SW3SSLBIS for SW3DSP1 (EP7), EP7 DSLBIS, EP7 Upstream Link))))

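To make the nested Min/sum structure concrete, here is a small worked example in Python. All bandwidth figures are hypothetical and chosen only for illustration (read them as MB/s); the helper names are not kernel functions. For brevity every endpoint, switch and host bridge in the example hierarchy is given the same values.

```python
def endpoint_bw(sslbis, dslbis, ep_link):
    # Innermost term: Min(SSLBIS for the switch DSP, EP DSLBIS, EP upstream link)
    return min(sslbis, dslbis, ep_link)


def switch_bw(sw_uplink, endpoint_bws):
    # Min(switch upstream link to RP, sum of the endpoints behind the switch)
    return min(sw_uplink, sum(endpoint_bws))


def host_bridge_bw(gp_bw, switch_bws):
    # Min(GP to CPU bandwidth, sum of the switches below the host bridge)
    return min(gp_bw, sum(switch_bws))


# Hypothetical numbers, identical for every branch of the example hierarchy.
ep = endpoint_bw(sslbis=32000, dslbis=30000, ep_link=28000)  # 28000
sw = switch_bw(sw_uplink=48000, endpoint_bws=[ep, ep])       # min(48000, 56000)
hb = host_bridge_bw(gp_bw=80000, switch_bws=[sw, sw])        # min(80000, 96000)

# The two host bridge terms are added together for the region.
region_bw = hb + hb
print(region_bw)
```

With these numbers both the switch upstream link and the GP act as limiting factors, so the region ends up at 160000 rather than the 224000 that the eight endpoints could deliver on their own.
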
The calculation starts at cxl_region_shared_upstream_perf_update(). An xarray
is created to collect all the endpoint bandwidths via the
cxl_endpoint_gather_bandwidth() function. The min() of the bandwidth from the
endpoint CDAT and the upstream link bandwidth is calculated. If the endpoint
has a CXL switch as a parent, then the min() of the calculated bandwidth and
the bandwidth from the SSLBIS for the switch downstream port that is
associated with the endpoint is calculated. The final bandwidth is stored in a
'struct cxl_perf_ctx' in the xarray indexed by a device pointer. If the
endpoint is directly attached to a root port (RP), the device pointer is the
RP device. If the endpoint is behind a switch, the device pointer is the
upstream device of the parent switch.

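The endpoint gathering step can be sketched in Python as below. This is a simplified model under stated assumptions, not the driver implementation: a plain dict stands in for the xarray, string names stand in for device pointers, the field names (``dslbis``, ``link``, ``sslbis``, ``index_dev``) are hypothetical, and summing endpoints that share an index follows the example computation rather than a statement in the text.

```python
from collections import defaultdict


def gather_endpoints(endpoints):
    """Collect per-endpoint bandwidth, keyed by the device used as index."""
    ctx = defaultdict(int)
    for ep in endpoints:
        # min() of the endpoint CDAT (DSLBIS) bandwidth and its upstream link
        bw = min(ep["dslbis"], ep["link"])
        # Behind a switch: also limited by the SSLBIS of the downstream port
        if ep.get("sslbis") is not None:
            bw = min(bw, ep["sslbis"])
        # Endpoints sharing the same index device are added together
        ctx[ep["index_dev"]] += bw
    return dict(ctx)


endpoints = [
    {"dslbis": 30000, "link": 28000, "sslbis": 32000, "index_dev": "SW0"},
    {"dslbis": 30000, "link": 28000, "sslbis": 20000, "index_dev": "SW0"},
    # Directly attached endpoint: no SSLBIS, indexed by its root port
    {"dslbis": 30000, "link": 32000, "sslbis": None, "index_dev": "RP4"},
]
gathered = gather_endpoints(endpoints)
```
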
At the next stage, the code walks through one or more switches if they exist
in the topology. For endpoints directly attached to RPs, this step is skipped.
If there is another switch upstream, the code takes the min() of the currently
gathered bandwidth and the upstream link bandwidth, and then also the min()
with the bandwidth from the SSLBIS of the upstream switch.

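One level of the switch walk might look like the following sketch. It is an interpretation of the paragraph above rather than the kernel code: the dict-based inputs, the field names (``uplink``, ``upstream_sslbis``, ``parent``) and the function name are hypothetical, and accumulating results under the next upstream device is an assumption taken from the example computation.

```python
def walk_switch_level(ctx, switches):
    """Move gathered bandwidth one level up, through each switch."""
    out = {}
    for dev, bw in ctx.items():
        sw = switches[dev]
        # Limited by the upstream link of this switch
        bw = min(bw, sw["uplink"])
        # If another switch sits upstream, its SSLBIS can limit further
        if sw.get("upstream_sslbis") is not None:
            bw = min(bw, sw["upstream_sslbis"])
        # Accumulate under the next device upstream (switch or root port)
        out[sw["parent"]] = out.get(sw["parent"], 0) + bw
    return out


switches = {
    "SW0": {"uplink": 48000, "upstream_sslbis": None, "parent": "RP0"},
    "SW1": {"uplink": 48000, "upstream_sslbis": None, "parent": "RP1"},
}
per_rp = walk_switch_level({"SW0": 56000, "SW1": 40000}, switches)
```

Here SW0 is capped by its 48000 upstream link while SW1 passes its 40000 through unchanged. For a deeper topology the same function would be applied once per switch level.
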
Once the topology walk reaches the RP, whether from directly attached
endpoints or after walking through the switch(es), cxl_rp_gather_bandwidth()
is called. At this point all the bandwidths are aggregated per host bridge,
which is also the index for the resulting xarray.

The next step is to take the min() of the per host bridge bandwidth and the
bandwidth from the Generic Port (GP). The bandwidths for the GP are retrieved
via the ACPI tables SRAT/HMAT. The min bandwidths are aggregated under the
same ACPI0017 device to form a new xarray.

Finally, cxl_region_update_bandwidth() is called and the aggregated bandwidth
from all the members of the last xarray is written to the access coordinates
residing in the cxl region (cxlr) context.
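
The last three stages (aggregation per host bridge, the Generic Port limit, and the final region total) can be illustrated with the sketch below. The numbers, the dict-based data model and the helper names are all hypothetical; the sketch only mirrors the order of operations described above.

```python
def aggregate_by_parent(bws, parent_of):
    """Sum bandwidths of devices that share the same parent device."""
    out = {}
    for dev, bw in bws.items():
        parent = parent_of[dev]
        out[parent] = out.get(parent, 0) + bw
    return out


def cap_by_generic_port(hb_bws, gp_bws):
    """min() of each host bridge bandwidth and its Generic Port bandwidth."""
    return {hb: min(bw, gp_bws[hb]) for hb, bw in hb_bws.items()}


# Hypothetical per root port results from the earlier stages.
rp_bws = {"RP0": 48000, "RP1": 40000, "RP2": 48000, "RP3": 48000}
rp_to_hb = {"RP0": "HB0", "RP1": "HB0", "RP2": "HB1", "RP3": "HB1"}

hb_bws = aggregate_by_parent(rp_bws, rp_to_hb)
# Hypothetical Generic Port bandwidths (from SRAT/HMAT on a real system).
capped = cap_by_generic_port(hb_bws, {"HB0": 80000, "HB1": 100000})

# The region bandwidth is the sum over all members of the last collection.
region_bw = sum(capped.values())
```

In this example HB0 is limited by its Generic Port (88000 capped to 80000) while HB1 is not, giving a region total of 176000.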

Documentation/driver-api/cxl/index.rst

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ Compute Express Link
    :maxdepth: 1

    memory-devices
+   access-coordinates

    maturity-map
