The descriptions of Intel RDT/MBA features, use cases, and the Linux kernel interface are
heavily based on the Intel RDT documentation of the Linux kernel:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt
Thanks to the authors of the kernel patches:
* Vikas Shivappa
* Fenghua Yu
* Tony Luck
Status: Intel RDT/MBA support for OCI and Docker software stack
Intel RDT/MBA support in OCI (merged PRs):
1. Intel RDT/MBA support in OCI/runtime-spec
opencontainers/runtime-spec#932
2. Intel RDT/MBA support in OCI/runc
3. Intel RDT/MBA Software Controller support in OCI/runtime-spec
opencontainers/runtime-spec#992
4. Intel RDT/MBA Software Controller support in OCI/runc
TODO list - Intel RDT/MBA support in Docker:
3. Intel RDT/MBA support in containerd
4. Intel RDT/MBA support in Docker Engine (moby/moby)
5. Intel RDT/MBA support in Docker CLI
What is Intel RDT/MBA:
Memory Bandwidth Allocation (MBA) is a resource allocation sub-feature of Intel Resource Director Technology (RDT); Cache Allocation Technology (CAT) is another. For details of Intel RDT and CAT support in runc and Docker, see #433.
MBA hardware details can be found in section 17.18 of the Intel Software Developer Manual and on the Intel RDT homepage.
MBA provides indirect and approximate throttling of memory bandwidth (b/w) for software. A user controls the resource by specifying a percentage of the maximum memory bandwidth, or a memory bandwidth limit in MBps if the MBA Software Controller is enabled (#1919).
Linux kernel interface for Intel RDT/MBA:
In Linux 4.12 and newer, Intel RDT/MBA is supported on some Intel Xeon platforms with the kernel config option CONFIG_INTEL_RDT. In Linux 5.1 and newer, the corresponding kernel config option is CONFIG_X86_CPU_RESCTRL.
To check whether MBA is enabled:
$ cat /proc/cpuinfo
Check whether the output has the 'rdt_a' and 'mba' flags.
The Intel RDT kernel interface is documented at the link below; MBA and CAT use the same interface.
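The flag check above can also be scripted. The following is a minimal sketch, assuming the contents of /proc/cpuinfo are passed in as a string; the function name `has_mba_flags` and the sample text are hypothetical and used only for illustration.

```python
def has_mba_flags(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line lists both 'rdt_a' and 'mba'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only inspect lines whose key is exactly "flags".
        if sep and key.strip() == "flags":
            flags = set(value.split())
            if {"rdt_a", "mba"} <= flags:
                return True
    return False


# Made-up sample text; on a real host, pass the contents of /proc/cpuinfo.
sample = "processor\t: 0\nflags\t\t: fpu vme rdt_a cat_l3 mba\n"
print(has_mba_flags(sample))
```

Matching whole flag names rather than substrings avoids false positives from other fields that happen to contain the same letters.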
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt
Intel RDT "resource control" filesystem hierarchy:
mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
|   |-- L3
|   |   |-- cbm_mask
|   |   |-- min_cbm_bits
|   |   |-- num_closids
|   |-- MB
|       |-- bandwidth_gran
|       |-- delay_linear
|       |-- min_bandwidth
|       |-- num_closids
|-- ...
|-- schemata
|-- tasks
|-- <container_id>
    |-- ...
    |-- schemata
    |-- tasks
For MBA support in runc, we will reuse the infrastructure and code base of Intel RDT/CAT, which was implemented in #1279. We can also use the tasks and schemata configuration for memory b/w resource constraints.
The file tasks has a list of tasks that belong to this group (e.g., the "<container_id>" group). Tasks can be added to a group by writing the task ID to the "tasks" file (which will automatically remove them from the previous group to which they belonged). New tasks created by fork(2) and clone(2) are added to the same group as their parent.
The file schemata has a list of all the resources available to this group. Each resource (L3 cache, memory b/w) has its own line and format.
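Adding a task to a group, as described above, amounts to writing its ID to the group's tasks file. The sketch below assumes the resctrl filesystem is mounted at the path given in `root`; the helper names `add_task_to_group` and `list_tasks` are hypothetical, and this is not how runc itself implements it.

```python
import os
from typing import List


def add_task_to_group(pid: int, group: str, root: str = "/sys/fs/resctrl") -> None:
    """Write one task ID to <root>/<group>/tasks."""
    tasks_path = os.path.join(root, group, "tasks")
    # On resctrl, the kernel removes the task from its previous group.
    with open(tasks_path, "w") as f:
        f.write(str(pid))


def list_tasks(group: str, root: str = "/sys/fs/resctrl") -> List[int]:
    """Return the task IDs currently listed in <root>/<group>/tasks."""
    tasks_path = os.path.join(root, group, "tasks")
    with open(tasks_path) as f:
        return [int(token) for token in f.read().split()]
```

Writing to the real filesystem typically requires root privileges; passing a different `root` lets the helpers be exercised against an ordinary directory.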
Memory b/w is per L3 cache domain. The schema format:
Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
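To make the format concrete, here is a minimal sketch of a parser for one MB schema line. The function name `parse_mb_schema` is hypothetical, and the strictness (rejecting empty entries) is an assumption of this sketch rather than documented kernel behavior.

```python
def parse_mb_schema(line: str) -> dict:
    """Parse 'MB:<cache_id0>=bw0;<cache_id1>=bw1;...' into {cache_id: bw}."""
    resource, sep, body = line.strip().partition(":")
    if not sep or resource.strip() != "MB":
        raise ValueError(f"not an MB schema line: {line!r}")
    result = {}
    for entry in body.split(";"):
        cache_id, eq, bandwidth = entry.partition("=")
        if not eq:
            raise ValueError(f"malformed entry: {entry!r}")
        # int() raises ValueError for non-numeric IDs or bandwidth values.
        result[int(cache_id)] = int(bandwidth)
    return result


print(parse_mb_schema("MB:0=20;1=70"))
```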
Examples for runc:
Consider a two-socket machine with two L3 caches, where the minimum memory b/w is 10%
and the memory b/w granularity is 10%. Tasks inside the container may use a maximum memory
b/w of 20% on socket 0 and 70% on socket 1.
"linux": {
    "intelRdt": {
        "memBwSchema": "MB:0=20;1=70"
    }
}
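The percentages in this example can be sanity-checked against the minimum b/w and granularity before they are written. The sketch below assumes the available control steps are min_bandwidth + N * bandwidth_gran, as the kernel resctrl documentation describes for the files under info/MB. The function name `check_percent_values` is hypothetical, and the kernel may round a value to a control step instead of rejecting it, so this is only a pre-check.

```python
def check_percent_values(schema: dict, min_bw: int = 10, gran: int = 10) -> list:
    """Return human-readable problems found in {cache_id: percent}."""
    problems = []
    for cache_id, bw in sorted(schema.items()):
        if not min_bw <= bw <= 100:
            problems.append(f"cache {cache_id}: {bw}% is outside [{min_bw}, 100]")
        elif (bw - min_bw) % gran != 0:
            # Assumed control steps: min_bw, min_bw + gran, min_bw + 2 * gran, ...
            problems.append(f"cache {cache_id}: {bw}% is not on a control step")
    return problems


print(check_percent_values({0: 20, 1: 70}))
```

In practice, `min_bw` and `gran` would be read from info/MB/min_bandwidth and info/MB/bandwidth_gran rather than hard-coded.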
If MBA Software Controller is enabled through mount option "-o mba_MBps":
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
We can then specify memory bandwidth in "MBps" (megabytes per second) instead of "percentages". The kernel uses a software feedback mechanism, the "Software Controller", which reads the actual bandwidth using MBM counters and adjusts the memory bandwidth percentages to ensure: "actual memory bandwidth < user specified memory bandwidth".
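The feedback idea can be illustrated with a toy controller step. This is a simplified sketch and not the kernel's actual algorithm: the function name `next_percentage`, the linear-scaling estimate, and the default `min_bw` and `gran` values are all assumptions made for illustration.

```python
def next_percentage(current_pct: int, measured_mbps: float, limit_mbps: float,
                    min_bw: int = 10, gran: int = 10) -> int:
    """Return the throttle percentage to use for the next interval."""
    if measured_mbps >= limit_mbps:
        # Over the limit: throttle one step harder, but not below min_bw.
        return max(min_bw, current_pct - gran)
    # Assumption: bandwidth scales roughly linearly with the percentage.
    projected = measured_mbps * (current_pct + gran) / current_pct
    if projected < limit_mbps:
        # Enough headroom: relax the throttle by one step, capped at 100.
        return min(100, current_pct + gran)
    return current_pct


# Toy simulation: 20000 MBps at 100%, user limit 5000 MBps.
pct = 100
for _ in range(20):
    pct = next_percentage(pct, 20000 * pct / 100, 5000)
print(pct)
```

Raising the percentage only when the projected bandwidth stays under the limit keeps the toy loop from oscillating between two adjacent steps.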
For example, on a two-socket machine, the schema line could be "MB:0=5000;1=7000" which means 5000 MBps memory bandwidth limit on socket 0 and 7000 MBps memory bandwidth limit on socket 1.
"linux": {
    "intelRdt": {
        "memBwSchema": "MB:0=5000;1=7000"
    }
}
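A configuration fragment like the one above can be generated from per-socket limits. This is a minimal sketch; the helper names `mem_bw_schema` and `intel_rdt_fragment` are hypothetical, and the result is wrapped in an outer JSON object so that it serializes on its own.

```python
import json


def mem_bw_schema(limits: dict) -> str:
    """Build 'MB:<id0>=<bw0>;<id1>=<bw1>' from {cache_id: bandwidth}."""
    entries = ";".join(f"{cache_id}={bw}" for cache_id, bw in sorted(limits.items()))
    return "MB:" + entries


def intel_rdt_fragment(limits: dict) -> dict:
    """Wrap the schema string in the 'linux.intelRdt' structure shown above."""
    return {"linux": {"intelRdt": {"memBwSchema": mem_bw_schema(limits)}}}


print(json.dumps(intel_rdt_fragment({0: 5000, 1: 7000}), indent=4))
```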