Commit aa8c3db

Merge tag 'x86_cache_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:

 - Add support for a new AMD feature called slow memory bandwidth
   allocation. Its goal is to control resource allocation in external slow
   memory which is connected to the machine like for example through CXL
   devices, accelerators etc

* tag 'x86_cache_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Fix a silly -Wunused-but-set-variable warning
  Documentation/x86: Update resctrl.rst for new features
  x86/resctrl: Add interface to write mbm_local_bytes_config
  x86/resctrl: Add interface to write mbm_total_bytes_config
  x86/resctrl: Add interface to read mbm_local_bytes_config
  x86/resctrl: Add interface to read mbm_total_bytes_config
  x86/resctrl: Support monitor configuration
  x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()
  x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
  x86/resctrl: Include new features in command line options
  x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
  x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
  x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()

2 parents 1adce1b + 793207b commit aa8c3db

12 files changed: +559 -41 lines changed

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 1 addition & 1 deletion
@@ -5221,7 +5221,7 @@
 	rdt=		[HW,X86,RDT]
 			Turn on/off individual RDT features. List is:
 			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
-			mba.
+			mba, smba, bmec.
 			E.g. to turn on cmt and turn off mba use:
 				rdt=cmt,!mba
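The `rdt=` syntax documented above (a comma-separated list where a leading `!` turns a feature off) can be illustrated with a small user-space sketch. This is not the kernel's `set_rdt_options()`; the struct, table, and function names below are hypothetical and exist only to show the parsing idea.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option table; the names mirror the documented rdt= list. */
struct rdt_opt_sketch {
	const char *name;
	bool force_on;
	bool force_off;
};

static struct rdt_opt_sketch rdt_opts_sketch[] = {
	{ "cmt" }, { "mbmtotal" }, { "mbmlocal" }, { "l3cat" }, { "l3cdp" },
	{ "l2cat" }, { "l2cdp" }, { "mba" }, { "smba" }, { "bmec" },
};

#define NUM_OPTS_SKETCH (sizeof(rdt_opts_sketch) / sizeof(rdt_opts_sketch[0]))

/*
 * Parse a list such as "cmt,!mba". The string is modified in place.
 * Returns the number of tokens that matched no known option.
 */
static int parse_rdt_sketch(char *str)
{
	int unknown = 0;
	char *tok;

	for (tok = strtok(str, ","); tok; tok = strtok(NULL, ",")) {
		bool off = (*tok == '!');
		size_t i;

		if (off)
			tok++;	/* skip the '!' prefix */

		for (i = 0; i < NUM_OPTS_SKETCH; i++) {
			if (strcmp(tok, rdt_opts_sketch[i].name) == 0) {
				rdt_opts_sketch[i].force_on = !off;
				rdt_opts_sketch[i].force_off = off;
				break;
			}
		}
		if (i == NUM_OPTS_SKETCH)
			unknown++;
	}
	return unknown;
}
```

With this sketch, parsing `"smba,!bmec"` would mark `smba` as forced on and `bmec` as forced off, leaving every other entry untouched.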

Documentation/x86/resctrl.rst

Lines changed: 145 additions & 2 deletions
@@ -17,14 +17,21 @@ AMD refers to this feature as AMD Platform Quality of Service(AMD QoS).
 This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86 /proc/cpuinfo
 flag bits:

-=============================================   ================================
+=============================================== ================================
 RDT (Resource Director Technology) Allocation   "rdt_a"
 CAT (Cache Allocation Technology)               "cat_l3", "cat_l2"
 CDP (Code and Data Prioritization)              "cdp_l3", "cdp_l2"
 CQM (Cache QoS Monitoring)                      "cqm_llc", "cqm_occup_llc"
 MBM (Memory Bandwidth Monitoring)               "cqm_mbm_total", "cqm_mbm_local"
 MBA (Memory Bandwidth Allocation)               "mba"
-=============================================   ================================
+SMBA (Slow Memory Bandwidth Allocation)         ""
+BMEC (Bandwidth Monitoring Event Configuration) ""
+=============================================== ================================
+
+Historically, new features were made visible by default in /proc/cpuinfo. This
+resulted in the feature flags becoming hard to parse by humans. Adding a new
+flag to /proc/cpuinfo should be avoided if user space can obtain information
+about the feature from resctrl's info directory.

 To use the feature mount the file system::

@@ -161,6 +168,83 @@ with the following files:
 "mon_features":
 		Lists the monitoring events if
 		monitoring is enabled for the resource.
+		Example::
+
+			# cat /sys/fs/resctrl/info/L3_MON/mon_features
+			llc_occupancy
+			mbm_total_bytes
+			mbm_local_bytes
+
+		If the system supports Bandwidth Monitoring Event
+		Configuration (BMEC), then the bandwidth events will
+		be configurable. The output will be::
+
+			# cat /sys/fs/resctrl/info/L3_MON/mon_features
+			llc_occupancy
+			mbm_total_bytes
+			mbm_total_bytes_config
+			mbm_local_bytes
+			mbm_local_bytes_config
+
+"mbm_total_bytes_config", "mbm_local_bytes_config":
+	Read/write files containing the configuration for the mbm_total_bytes
+	and mbm_local_bytes events, respectively, when the Bandwidth
+	Monitoring Event Configuration (BMEC) feature is supported.
+	The event configuration settings are domain specific and affect
+	all the CPUs in the domain. When either event configuration is
+	changed, the bandwidth counters for all RMIDs of both events
+	(mbm_total_bytes as well as mbm_local_bytes) are cleared for that
+	domain. The next read for every RMID will report "Unavailable"
+	and subsequent reads will report the valid value.
+
+	Following are the types of events supported:
+
+	====    ========================================================
+	Bits    Description
+	====    ========================================================
+	6       Dirty Victims from the QOS domain to all types of memory
+	5       Reads to slow memory in the non-local NUMA domain
+	4       Reads to slow memory in the local NUMA domain
+	3       Non-temporal writes to non-local NUMA domain
+	2       Non-temporal writes to local NUMA domain
+	1       Reads to memory in the non-local NUMA domain
+	0       Reads to memory in the local NUMA domain
+	====    ========================================================
+
+	By default, the mbm_total_bytes configuration is set to 0x7f to count
+	all the event types and the mbm_local_bytes configuration is set to
+	0x15 to count all the local memory events.
+
+	Examples:
+
+	* To view the current configuration::
+	  ::
+
+	    # cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
+	    0=0x7f;1=0x7f;2=0x7f;3=0x7f
+
+	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
+	    0=0x15;1=0x15;3=0x15;4=0x15
+
+	* To change the mbm_total_bytes to count only reads on domain 0,
+	  the bits 0, 1, 4 and 5 needs to be set, which is 110011b in binary
+	  (in hexadecimal 0x33):
+	  ::
+
+	    # echo "0=0x33" > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
+
+	    # cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
+	    0=0x33;1=0x7f;2=0x7f;3=0x7f
+
+	* To change the mbm_local_bytes to count all the slow memory reads on
+	  domain 0 and 1, the bits 4 and 5 needs to be set, which is 110000b
+	  in binary (in hexadecimal 0x30):
+	  ::
+
+	    # echo "0=0x30;1=0x30" > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
+
+	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
+	    0=0x30;1=0x30;3=0x15;4=0x15

 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
@@ -464,6 +548,25 @@ Memory bandwidth domain is L3 cache.

 	MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...

+Slow Memory Bandwidth Allocation (SMBA)
+---------------------------------------
+AMD hardware supports Slow Memory Bandwidth Allocation (SMBA).
+CXL.memory is the only supported "slow" memory device. With the
+support of SMBA, the hardware enables bandwidth allocation on
+the slow memory devices. If there are multiple such devices in
+the system, the throttling logic groups all the slow sources
+together and applies the limit on them as a whole.
+
+The presence of SMBA (with CXL.memory) is independent of slow memory
+devices presence. If there are no such devices on the system, then
+configuring SMBA will have no impact on the performance of the system.
+
+The bandwidth domain for slow memory is L3 cache. Its schemata file
+is formatted as:
+::
+
+	SMBA:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
+
 Reading/writing the schemata file
 ---------------------------------
 Reading the schemata file will show the state of all resources
@@ -479,6 +582,46 @@ which you wish to change. E.g.
     L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
     L3CODE:0=fffff;1=fffff;2=fffff;3=fffff

+Reading/writing the schemata file (on AMD systems)
+--------------------------------------------------
+Reading the schemata file will show the current bandwidth limit on all
+domains. The allocated resources are in multiples of one eighth GB/s.
+When writing to the file, you need to specify what cache id you wish to
+configure the bandwidth limit.
+
+For example, to allocate 2GB/s limit on the first cache id:
+
+::
+
+  # cat schemata
+    MB:0=2048;1=2048;2=2048;3=2048
+    L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+  # echo "MB:1=16" > schemata
+  # cat schemata
+    MB:0=2048;1= 16;2=2048;3=2048
+    L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+Reading/writing the schemata file (on AMD systems) with SMBA feature
+--------------------------------------------------------------------
+Reading and writing the schemata file is the same as without SMBA in
+above section.
+
+For example, to allocate 8GB/s limit on the first cache id:
+
+::
+
+  # cat schemata
+    SMBA:0=2048;1=2048;2=2048;3=2048
+      MB:0=2048;1=2048;2=2048;3=2048
+      L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+  # echo "SMBA:1=64" > schemata
+  # cat schemata
+    SMBA:0=2048;1= 64;2=2048;3=2048
+      MB:0=2048;1=2048;2=2048;3=2048
+      L3:0=ffff;1=ffff;2=ffff;3=ffff
+
 Cache Pseudo-Locking
 ====================
 CAT enables a user to specify the amount of cache space that an
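The BMEC bit table in the documentation hunk above maps directly onto the hexadecimal values used in its examples. The following sketch shows how those masks are composed; the enum and macro names are invented for illustration and do not appear in the kernel sources.

```c
#include <stdint.h>

/* Bit positions taken from the "Bits / Description" table in resctrl.rst. */
enum bmec_bit_sketch {
	BMEC_READS_LOCAL       = 1u << 0, /* reads, local NUMA domain */
	BMEC_READS_REMOTE      = 1u << 1, /* reads, non-local NUMA domain */
	BMEC_NT_WRITES_LOCAL   = 1u << 2, /* non-temporal writes, local */
	BMEC_NT_WRITES_REMOTE  = 1u << 3, /* non-temporal writes, non-local */
	BMEC_SLOW_READS_LOCAL  = 1u << 4, /* slow memory reads, local */
	BMEC_SLOW_READS_REMOTE = 1u << 5, /* slow memory reads, non-local */
	BMEC_DIRTY_VICTIMS     = 1u << 6, /* dirty victims to all memory */
};

/* Documented default for mbm_total_bytes_config: every event type. */
static const uint32_t bmec_total_default =
	BMEC_READS_LOCAL | BMEC_READS_REMOTE |
	BMEC_NT_WRITES_LOCAL | BMEC_NT_WRITES_REMOTE |
	BMEC_SLOW_READS_LOCAL | BMEC_SLOW_READS_REMOTE |
	BMEC_DIRTY_VICTIMS;

/* Documented default for mbm_local_bytes_config: local events only. */
static const uint32_t bmec_local_default =
	BMEC_READS_LOCAL | BMEC_NT_WRITES_LOCAL | BMEC_SLOW_READS_LOCAL;

/* "Count only reads": bits 0, 1, 4 and 5. */
static const uint32_t bmec_all_reads =
	BMEC_READS_LOCAL | BMEC_READS_REMOTE |
	BMEC_SLOW_READS_LOCAL | BMEC_SLOW_READS_REMOTE;

/* "Count all slow memory reads": bits 4 and 5. */
static const uint32_t bmec_slow_reads =
	BMEC_SLOW_READS_LOCAL | BMEC_SLOW_READS_REMOTE;
```

These four constants evaluate to 0x7f, 0x15, 0x33 and 0x30 respectively, matching the values written to the `*_config` files in the examples.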

arch/x86/include/asm/cpufeatures.h

Lines changed: 2 additions & 0 deletions
@@ -307,6 +307,8 @@
 #define X86_FEATURE_SGX_EDECCSSA	(11*32+18) /* "" SGX EDECCSSA user leaf function */
 #define X86_FEATURE_CALL_DEPTH		(11*32+19) /* "" Call depth tracking for RSB stuffing */
 #define X86_FEATURE_MSR_TSX_CTRL	(11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */
+#define X86_FEATURE_SMBA		(11*32+21) /* "" Slow Memory Bandwidth Allocation */
+#define X86_FEATURE_BMEC		(11*32+22) /* "" Bandwidth Monitoring Event Configuration */

 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
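The `(word*32 + bit)` encoding used by these defines packs a capability word index and a bit position into one integer. A minimal sketch of decoding it is shown below; the helper names and the `caps` array are hypothetical and are not the kernel's `cpu_has()` machinery.

```c
#include <stdint.h>

/* Values copied from the diff above; the _SKETCH suffix marks local copies. */
#define FEATURE_SMBA_SKETCH (11*32+21)
#define FEATURE_BMEC_SKETCH (11*32+22)

/* Index of the 32-bit capability word that holds the feature. */
static unsigned int feature_word_sketch(unsigned int feature)
{
	return feature / 32;
}

/* Bit position of the feature inside its capability word. */
static unsigned int feature_bit_sketch(unsigned int feature)
{
	return feature % 32;
}

/* Test a feature against a caller-supplied array of capability words. */
static int feature_test_sketch(const uint32_t *caps, unsigned int feature)
{
	return (caps[feature_word_sketch(feature)] >>
		feature_bit_sketch(feature)) & 1;
}
```

Both new flags therefore live in word 11, at bits 21 and 22.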

arch/x86/include/asm/msr-index.h

Lines changed: 2 additions & 0 deletions
@@ -1084,6 +1084,8 @@

 /* - AMD: */
 #define MSR_IA32_MBA_BW_BASE		0xc0000200
+#define MSR_IA32_SMBA_BW_BASE		0xc0000280
+#define MSR_IA32_EVT_CFG_BASE		0xc0000400

 /* MSR_IA32_VMX_MISC bits */
 #define MSR_IA32_VMX_MISC_INTEL_PT	(1ULL << 14)
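The new SMBA base address sits 0x80 above the MBA base. Elsewhere in this commit, core.c assigns the same `mba_wrmsr_amd` update routine to both resources, so it seems reasonable to assume that SMBA registers are indexed by CLOSID in the same way as MBA registers. The sketch below only computes addresses under that assumption; the helper name is hypothetical and nothing here touches hardware.

```c
#include <stdint.h>

/* Local copies of the bases from the diff above. */
#define MBA_BW_BASE_SKETCH  0xc0000200u
#define SMBA_BW_BASE_SKETCH 0xc0000280u

/*
 * Assumption: one bandwidth register per CLOSID, laid out contiguously
 * from the base address.
 */
static uint32_t bw_msr_addr_sketch(uint32_t base, uint32_t closid)
{
	return base + closid;
}
```

Under this assumption, the SMBA limit for CLOSID 3 would live at 0xc0000283.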

arch/x86/kernel/cpu/cpuid-deps.c

Lines changed: 2 additions & 0 deletions
@@ -68,6 +68,8 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_CQM_OCCUP_LLC, X86_FEATURE_CQM_LLC },
 	{ X86_FEATURE_CQM_MBM_TOTAL, X86_FEATURE_CQM_LLC },
 	{ X86_FEATURE_CQM_MBM_LOCAL, X86_FEATURE_CQM_LLC },
+	{ X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_TOTAL },
+	{ X86_FEATURE_BMEC, X86_FEATURE_CQM_MBM_LOCAL },
 	{ X86_FEATURE_AVX512_BF16, X86_FEATURE_AVX512VL },
 	{ X86_FEATURE_AVX512_FP16, X86_FEATURE_AVX512BW },
 	{ X86_FEATURE_ENQCMD, X86_FEATURE_XSAVES },
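Each `{ feature, depends }` pair says that the first feature cannot be used without the second. The sketch below illustrates how such a table can propagate a disabled feature to its dependents. It is a simplified user-space model with invented names, not the kernel's actual clearing code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Tiny stand-in feature set; the real table uses X86_FEATURE_* values. */
enum feat_sketch {
	F_CQM_LLC,
	F_CQM_MBM_TOTAL,
	F_CQM_MBM_LOCAL,
	F_BMEC,
	F_COUNT,
};

struct dep_sketch {
	enum feat_sketch feature;
	enum feat_sketch depends;
};

static const struct dep_sketch deps_sketch[] = {
	{ F_CQM_MBM_TOTAL, F_CQM_LLC },
	{ F_CQM_MBM_LOCAL, F_CQM_LLC },
	{ F_BMEC,          F_CQM_MBM_TOTAL },
	{ F_BMEC,          F_CQM_MBM_LOCAL },
};

/* Disable f, then keep disabling features whose dependency is now off. */
static void clear_feature_sketch(bool enabled[F_COUNT], enum feat_sketch f)
{
	const size_t n = sizeof(deps_sketch) / sizeof(deps_sketch[0]);
	bool changed;
	size_t i;

	enabled[f] = false;
	do {
		changed = false;
		for (i = 0; i < n; i++) {
			if (!enabled[deps_sketch[i].depends] &&
			    enabled[deps_sketch[i].feature]) {
				enabled[deps_sketch[i].feature] = false;
				changed = true;
			}
		}
	} while (changed);
}
```

Because BMEC is listed against both MBM events, turning off either `F_CQM_MBM_TOTAL` or `F_CQM_MBM_LOCAL` in this model also turns off `F_BMEC`.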

arch/x86/kernel/cpu/resctrl/core.c

Lines changed: 51 additions & 3 deletions
@@ -100,6 +100,18 @@ struct rdt_hw_resource rdt_resources_all[] = {
 			.fflags = RFTYPE_RES_MB,
 		},
 	},
+	[RDT_RESOURCE_SMBA] =
+	{
+		.r_resctrl = {
+			.rid = RDT_RESOURCE_SMBA,
+			.name = "SMBA",
+			.cache_level = 3,
+			.domains = domain_init(RDT_RESOURCE_SMBA),
+			.parse_ctrlval = parse_bw,
+			.format_str = "%d=%*u",
+			.fflags = RFTYPE_RES_MB,
+		},
+	},
 };

 /*
@@ -150,6 +162,13 @@ bool is_mba_sc(struct rdt_resource *r)
 	if (!r)
 		return rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl.membw.mba_sc;

+	/*
+	 * The software controller support is only applicable to MBA resource.
+	 * Make sure to check for resource type.
+	 */
+	if (r->rid != RDT_RESOURCE_MBA)
+		return false;
+
 	return r->membw.mba_sc;
 }

@@ -213,9 +232,15 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	union cpuid_0x10_3_eax eax;
 	union cpuid_0x10_x_edx edx;
-	u32 ebx, ecx;
+	u32 ebx, ecx, subleaf;
+
+	/*
+	 * Query CPUID_Fn80000020_EDX_x01 for MBA and
+	 * CPUID_Fn80000020_EDX_x02 for SMBA
+	 */
+	subleaf = (r->rid == RDT_RESOURCE_SMBA) ? 2 : 1;

-	cpuid_count(0x80000020, 1, &eax.full, &ebx, &ecx, &edx.full);
+	cpuid_count(0x80000020, subleaf, &eax.full, &ebx, &ecx, &edx.full);
 	hw_res->num_closid = edx.split.cos_max + 1;
 	r->default_ctrl = MAX_MBA_BW_AMD;

@@ -647,6 +672,8 @@ enum {
 	RDT_FLAG_L2_CAT,
 	RDT_FLAG_L2_CDP,
 	RDT_FLAG_MBA,
+	RDT_FLAG_SMBA,
+	RDT_FLAG_BMEC,
 };

 #define RDT_OPT(idx, n, f) \
@@ -670,6 +697,8 @@ static struct rdt_options rdt_options[] __initdata = {
 	RDT_OPT(RDT_FLAG_L2_CAT, "l2cat", X86_FEATURE_CAT_L2),
 	RDT_OPT(RDT_FLAG_L2_CDP, "l2cdp", X86_FEATURE_CDP_L2),
 	RDT_OPT(RDT_FLAG_MBA, "mba", X86_FEATURE_MBA),
+	RDT_OPT(RDT_FLAG_SMBA, "smba", X86_FEATURE_SMBA),
+	RDT_OPT(RDT_FLAG_BMEC, "bmec", X86_FEATURE_BMEC),
 };
 #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)

@@ -699,7 +728,7 @@ static int __init set_rdt_options(char *str)
 }
 __setup("rdt", set_rdt_options);

-static bool __init rdt_cpu_has(int flag)
+bool __init rdt_cpu_has(int flag)
 {
 	bool ret = boot_cpu_has(flag);
 	struct rdt_options *o;
@@ -734,6 +763,19 @@ static __init bool get_mem_config(void)
 	return false;
 }

+static __init bool get_slow_mem_config(void)
+{
+	struct rdt_hw_resource *hw_res = &rdt_resources_all[RDT_RESOURCE_SMBA];
+
+	if (!rdt_cpu_has(X86_FEATURE_SMBA))
+		return false;
+
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+		return __rdt_get_mem_config_amd(&hw_res->r_resctrl);
+
+	return false;
+}
+
 static __init bool get_rdt_alloc_resources(void)
 {
 	struct rdt_resource *r;
@@ -764,6 +806,9 @@ static __init bool get_rdt_alloc_resources(void)
 	if (get_mem_config())
 		ret = true;

+	if (get_slow_mem_config())
+		ret = true;
+
 	return ret;
 }

@@ -853,6 +898,9 @@ static __init void rdt_init_res_defs_amd(void)
 		} else if (r->rid == RDT_RESOURCE_MBA) {
 			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
 			hw_res->msr_update = mba_wrmsr_amd;
+		} else if (r->rid == RDT_RESOURCE_SMBA) {
+			hw_res->msr_base = MSR_IA32_SMBA_BW_BASE;
+			hw_res->msr_update = mba_wrmsr_amd;
 		}
 	}
 }
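The change to `__rdt_get_mem_config_amd()` reuses one function for both resources by picking the CPUID subleaf from the resource id, then derives the CLOSID count from `cos_max`. The sketch below restates those two steps as pure functions. The names are hypothetical, and the assumption that `cos_max` occupies the low 16 bits of EDX is mine; the diff only shows the `edx.split.cos_max` field access.

```c
#include <stdint.h>

/* Stand-ins for the two bandwidth resource ids handled by the function. */
enum bw_res_sketch {
	BW_RES_MBA_SKETCH,
	BW_RES_SMBA_SKETCH,
};

/* Subleaf of CPUID 0x80000020 to query: 1 for MBA, 2 for SMBA. */
static uint32_t qos_subleaf_sketch(enum bw_res_sketch rid)
{
	return (rid == BW_RES_SMBA_SKETCH) ? 2 : 1;
}

/*
 * Number of CLOSIDs, computed as cos_max + 1.
 * Assumption: cos_max is the low 16 bits of EDX.
 */
static uint32_t num_closid_sketch(uint32_t edx)
{
	return (edx & 0xffffu) + 1;
}
```

For example, an EDX value whose low 16 bits are 0x000f would yield 16 CLOSIDs under this assumption.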

arch/x86/kernel/cpu/resctrl/ctrlmondata.c

Lines changed: 4 additions & 9 deletions
@@ -209,7 +209,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
 	unsigned long dom_id;

 	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
-	    r->rid == RDT_RESOURCE_MBA) {
+	    (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)) {
 		rdt_last_cmd_puts("Cannot pseudo-lock MBA resource\n");
 		return -EINVAL;
 	}
@@ -310,7 +310,6 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
 	enum resctrl_conf_type t;
 	cpumask_var_t cpu_mask;
 	struct rdt_domain *d;
-	int cpu;
 	u32 idx;

 	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
@@ -341,13 +340,9 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)

 	if (cpumask_empty(cpu_mask))
 		goto done;
-	cpu = get_cpu();
-	/* Update resource control msr on this CPU if it's in cpu_mask. */
-	if (cpumask_test_cpu(cpu, cpu_mask))
-		rdt_ctrl_update(&msr_param);
-	/* Update resource control msr on other CPUs. */
-	smp_call_function_many(cpu_mask, rdt_ctrl_update, &msr_param, 1);
-	put_cpu();
+
+	/* Update resource control msr on all the CPUs. */
+	on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1);

 done:
 	free_cpumask_var(cpu_mask);
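`parse_line()` walks the `id=value` pairs of a schemata line such as `SMBA:0=2048;1=64`. The user-space sketch below shows that tokenising step and the unit conversion described in resctrl.rst (values are multiples of one eighth GB/s). It is an approximation written with standard C library calls; the struct and function names are hypothetical and the kernel's own parser differs.

```c
#include <stdlib.h>
#include <string.h>

struct bw_entry_sketch {
	unsigned long dom_id;	/* cache id on the left of '=' */
	unsigned long bw;	/* raw value, in units of 1/8 GB/s */
};

/*
 * Parse the text after the "SMBA:" or "MB:" prefix, e.g. "0=2048;1=64".
 * The string is modified in place. Returns the number of entries parsed,
 * or -1 on malformed input or when more than max entries are present.
 */
static int parse_bw_list_sketch(char *list, struct bw_entry_sketch *out,
				int max)
{
	char *tok, *end;
	int n = 0;

	for (tok = strtok(list, ";"); tok; tok = strtok(NULL, ";")) {
		if (n == max)
			return -1;

		out[n].dom_id = strtoul(tok, &end, 10);
		if (end == tok || *end != '=')
			return -1;

		tok = end + 1;
		out[n].bw = strtoul(tok, &end, 10);
		if (end == tok || *end != '\0')
			return -1;
		n++;
	}
	return n;
}

/* Convert a raw schemata value to GB/s. */
static double bw_to_gbps_sketch(unsigned long bw)
{
	return bw / 8.0;
}
```

With this conversion, the documented examples work out as expected: writing 16 corresponds to 2 GB/s and writing 64 corresponds to 8 GB/s.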
