@@ -158,32 +158,38 @@ init process will block waiting for the parent to finish setup.
 ### IntelRdt
 
 Intel platforms with new Xeon CPU support Resource Director Technology (RDT).
-Cache Allocation Technology (CAT) and Memory Bandwidth Allocation (MBA) are
-two sub-features of RDT.
+Cache Allocation Technology (CAT), Cache Monitoring Technology (CMT),
+Memory Bandwidth Allocation (MBA) and Memory Bandwidth Monitoring (MBM) are
+four sub-features of RDT.
 
 Cache Allocation Technology (CAT) provides a way for the software to restrict
 cache allocation to a defined 'subset' of L3 cache which may be overlapping
 with other 'subsets'. The different subsets are identified by class of
 service (CLOS) and each CLOS has a capacity bitmask (CBM).
 
+Cache Monitoring Technology (CMT) supports monitoring of the last-level cache
+(LLC) occupancy for each running thread simultaneously.
+
 Memory Bandwidth Allocation (MBA) provides indirect and approximate throttle
 over memory bandwidth for the software. A user controls the resource by
-indicating the percentage of maximum memory bandwidth or memory bandwidth limit
-in MBps unit if MBA Software Controller is enabled.
+indicating the percentage of maximum memory bandwidth or memory bandwidth
+limit in MBps unit if MBA Software Controller is enabled.
+
+Memory Bandwidth Monitoring (MBM) supports monitoring of total and local
+memory bandwidth for each running thread simultaneously.
 
-It can be used to handle L3 cache and memory bandwidth resources allocation
-for containers if hardware and kernel support Intel RDT CAT and MBA features.
+More details about Intel RDT can be found in sections 17.18 and 17.19 of
+Volume 3 of the Intel Software Developer's Manual:
+https://software.intel.com/en-us/articles/intel-sdm
 
-In Linux 4.10 kernel or newer, the interface is defined and exposed via
+About the Intel RDT kernel interface:
+In Linux kernel 4.14 or newer, the interface is defined and exposed via
 "resource control" filesystem, which is a "cgroup-like" interface.
 
 Comparing with cgroups, it has similar process management lifecycle and
 interfaces in a container. But unlike cgroups' hierarchy, it has single level
 filesystem layout.
 
-CAT and MBA features are introduced in Linux 4.10 and 4.12 kernel via
-"resource control" filesystem.
-
 Intel RDT "resource control" filesystem hierarchy:
 ```
 mount -t resctrl resctrl /sys/fs/resctrl
@@ -194,25 +200,46 @@ tree /sys/fs/resctrl
 |   |   |-- cbm_mask
 |   |   |-- min_cbm_bits
 |   |   |-- num_closids
+|   |-- L3_MON
+|   |   |-- max_threshold_occupancy
+|   |   |-- mon_features
+|   |   |-- num_rmids
 |   |-- MB
 |       |-- bandwidth_gran
 |       |-- delay_linear
 |       |-- min_bandwidth
 |       |-- num_closids
-|-- ...
+|-- mon_groups
+    |-- <rmid>
+        |-- ...
+        |-- mon_data
+            |-- mon_L3_00
+                |-- llc_occupancy
+                |-- mbm_local_bytes
+                |-- mbm_total_bytes
+            |-- ...
+        |-- tasks
 |-- schemata
 |-- tasks
-|-- <container_id>
+|-- <clos>
     |-- ...
-    |-- schemata
+    |-- mon_data
+        |-- mon_L3_00
+            |-- llc_occupancy
+            |-- mbm_local_bytes
+            |-- mbm_total_bytes
+        |-- ...
     |-- tasks
+    |-- schemata
+    |-- ...
 ```
 
 For runc, we can make use of `tasks` and `schemata` configuration for L3
-cache and memory bandwidth resources constraints.
+cache and memory bandwidth resource constraints, and of the `mon_data`
+directory for CMT and MBM statistics.
 
 The file `tasks` has a list of tasks that belongs to this group (e.g.,
-"<container_id>" group). Tasks can be added to a group by writing the task ID
+"<clos>" group). Tasks can be added to a group by writing the task ID
 to the "tasks" file (which will automatically remove them from the previous
 group to which they belonged). New tasks created by fork(2) and clone(2) are
 added to the same group as their parent.
@@ -224,7 +251,7 @@ L3 cache schema:
 It has allocation bitmasks/values for L3 cache on each socket, which
 contains L3 cache id and capacity bitmask (CBM).
 ```
-Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
+Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
 ```
 For example, on a two-socket machine, the schema line could be "L3:0=ff;1=c0"
 which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0.
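A schema line consists of a resource name followed by `;`-separated `<cache_id>=<value>` pairs. The Go sketch below parses one such line. It assumes the line is well formed and that CBMs are hexadecimal, as in the example. `parseSchema` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSchema parses one schemata line such as "L3:0=ff;1=c0" into a map
// from cache ID to value; base is 16 for hexadecimal capacity bitmasks.
func parseSchema(line, resource string, base int) (map[int]uint64, error) {
	prefix := resource + ":"
	if !strings.HasPrefix(line, prefix) {
		return nil, fmt.Errorf("schema %q does not start with %q", line, prefix)
	}
	out := make(map[int]uint64)
	for _, entry := range strings.Split(strings.TrimPrefix(line, prefix), ";") {
		id, val, ok := strings.Cut(entry, "=")
		if !ok {
			return nil, fmt.Errorf("malformed entry %q", entry)
		}
		cacheID, err := strconv.Atoi(strings.TrimSpace(id))
		if err != nil {
			return nil, err
		}
		v, err := strconv.ParseUint(strings.TrimSpace(val), base, 64)
		if err != nil {
			return nil, err
		}
		out[cacheID] = v
	}
	return out, nil
}

func main() {
	cbms, err := parseSchema("L3:0=ff;1=c0", "L3", 16)
	fmt.Println(cbms[0], cbms[1], err) // 255 192 <nil>
}
```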
@@ -240,7 +267,7 @@ Memory bandwidth schema:
 It has allocation values for memory bandwidth on each socket, which contains
 L3 cache id and memory bandwidth.
 ```
-Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
+Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
 ```
 For example, on a two-socket machine, the schema line could be "MB:0=20;1=70"
 
@@ -251,8 +278,10 @@ that is allocated is also dependent on the CPU model and can be looked up at
 min_bw + N * bw_gran. Intermediate values are rounded to the next control
 step available on the hardware.
 
-If MBA Software Controller is enabled through mount option "-o mba_MBps"
+If the MBA Software Controller is enabled through the mount option "-o mba_MBps":
+```
 mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
+```
 We could specify memory bandwidth in "MBps" (Mega Bytes per second) unit
 instead of "percentages". The kernel underneath would use a software feedback
 mechanism or a "Software Controller" which reads the actual bandwidth using
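The control-step rule `min_bw + N * bw_gran` can be made concrete with the Go sketch below. It rests on three assumptions:

* "Rounded to the next control step" means rounding up.
* Requests below `min_bw` are rejected rather than raised.
* The rule applies only to percentage mode, not `mba_MBps`.

`roundMBA` is a hypothetical helper.

```go
package main

import "fmt"

// roundMBA maps a requested bandwidth percentage onto the control steps
// min_bw + N*bw_gran, rounding intermediate values up to the next step.
// minBW and gran would come from info/MB/min_bandwidth and
// info/MB/bandwidth_gran.
func roundMBA(requested, minBW, gran uint) (uint, error) {
	if gran == 0 {
		return 0, fmt.Errorf("bandwidth granularity must be non-zero")
	}
	if requested < minBW || requested > 100 {
		return 0, fmt.Errorf("bandwidth %d%% outside [%d%%, 100%%]", requested, minBW)
	}
	steps := (requested - minBW + gran - 1) / gran // ceiling division
	v := minBW + steps*gran
	if v > 100 {
		v = 100
	}
	return v, nil
}

func main() {
	v, _ := roundMBA(25, 10, 10)
	fmt.Println(v) // 30
}
```

With the example parameters used later in this section (minimum 10%, granularity 10%), a request of 25% lands on the 30% step.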
@@ -263,11 +292,12 @@ For example, on a two-socket machine, the schema line could be
 "MB:0=5000;1=7000" which means 5000 MBps memory bandwidth limit on socket 0
 and 7000 MBps memory bandwidth limit on socket 1.
 
-For more information about Intel RDT kernel interface:
+For more information about the Intel RDT kernel interface:
 https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt
 
-```
+
 An example for runc:
+```
 Consider a two-socket machine with two L3 caches where the default CBM is
 0x7ff and the max CBM length is 11 bits, and minimum memory bandwidth of 10%
 with a memory bandwidth granularity of 10%.
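The example's masks can be checked mechanically: `0x7f0` should select 7 of the 11 ways in `0x7ff`, and `0x1f` should select 5. The sketch below uses a hypothetical `checkCBM` helper. It assumes the hardware requires the set bits of a CBM to be contiguous, as is typical for Intel CAT, and that the default mask covers every way.

```go
package main

import (
	"fmt"
	"math/bits"
)

// checkCBM returns how many ways a capacity bitmask selects out of the
// ways in defaultMask, rejecting masks that are empty, fall outside the
// default mask, or have non-contiguous set bits.
func checkCBM(cbm, defaultMask uint64) (ways, total int, err error) {
	total = bits.OnesCount64(defaultMask)
	if cbm == 0 || cbm&^defaultMask != 0 {
		return 0, total, fmt.Errorf("CBM %#x is empty or outside the default mask %#x", cbm, defaultMask)
	}
	// Shift the lowest set bit down to bit 0; a contiguous run of 1s
	// then has the form 2^k - 1, so adding 1 clears every bit of it.
	run := cbm >> uint(bits.TrailingZeros64(cbm))
	if run&(run+1) != 0 {
		return 0, total, fmt.Errorf("CBM %#x is not contiguous", cbm)
	}
	return bits.OnesCount64(cbm), total, nil
}

func main() {
	for _, cbm := range []uint64{0x7f0, 0x1f} {
		n, total, err := checkCBM(cbm, 0x7ff)
		fmt.Printf("%#x -> %d/%d %v\n", cbm, n, total, err)
	}
	// Prints:
	// 0x7f0 -> 7/11 <nil>
	// 0x1f -> 5/11 <nil>
}
```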
@@ -281,7 +311,17 @@ maximum memory bandwidth of 20% on socket 0 and 70% on socket 1.
 		"closID": "guaranteed_group",
 		"l3CacheSchema": "L3:0=7f0;1=1f",
 		"memBwSchema": "MB:0=20;1=70"
-	}
+	}
+}
+```
+Another example:
+```
+We only want to monitor memory bandwidth and LLC occupancy.
+"linux": {
+	"intelRdt": {
+		"enableMBM": true,
+		"enableCMT": true
+	}
 }
 ```
 