Commit 775a366

Paweł Szulik authored and Creatone committed
libcontainer/intelrdt: add basic "MON" groups support.
More info: https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

Signed-off-by: Paweł Szulik <[email protected]>
1 parent 1198389 commit 775a366

12 files changed: 338 additions & 72 deletions

libcontainer/SPEC.md

Lines changed: 61 additions & 21 deletions
@@ -158,32 +158,38 @@ init process will block waiting for the parent to finish setup.
 ### IntelRdt
 
 Intel platforms with new Xeon CPU support Resource Director Technology (RDT).
-Cache Allocation Technology (CAT) and Memory Bandwidth Allocation (MBA) are
-two sub-features of RDT.
+Cache Allocation Technology (CAT), Cache Monitoring Technology (CMT),
+Memory Bandwidth Allocation (MBA) and Memory Bandwidth Monitoring (MBM) are
+four sub-features of RDT.
 
 Cache Allocation Technology (CAT) provides a way for the software to restrict
 cache allocation to a defined 'subset' of L3 cache which may be overlapping
 with other 'subsets'. The different subsets are identified by class of
 service (CLOS) and each CLOS has a capacity bitmask (CBM).
 
+Cache Monitoring Technology (CMT) supports monitoring of the last-level cache (LLC) occupancy
+for each running thread simultaneously.
+
 Memory Bandwidth Allocation (MBA) provides indirect and approximate throttle
 over memory bandwidth for the software. A user controls the resource by
-indicating the percentage of maximum memory bandwidth or memory bandwidth limit
-in MBps unit if MBA Software Controller is enabled.
+indicating the percentage of maximum memory bandwidth or memory bandwidth
+limit in MBps unit if MBA Software Controller is enabled.
+
+Memory Bandwidth Monitoring (MBM) supports monitoring of total and local memory bandwidth
+for each running thread simultaneously.
 
-It can be used to handle L3 cache and memory bandwidth resources allocation
-for containers if hardware and kernel support Intel RDT CAT and MBA features.
+More details about Intel RDT CAT and MBA can be found in the section 17.18 and 17.19, Volume 3
+of Intel Software Developer Manual:
+https://software.intel.com/en-us/articles/intel-sdm
 
-In Linux 4.10 kernel or newer, the interface is defined and exposed via
+About Intel RDT kernel interface:
+In Linux 4.14 kernel or newer, the interface is defined and exposed via
 "resource control" filesystem, which is a "cgroup-like" interface.
 
 Comparing with cgroups, it has similar process management lifecycle and
 interfaces in a container. But unlike cgroups' hierarchy, it has single level
 filesystem layout.
 
-CAT and MBA features are introduced in Linux 4.10 and 4.12 kernel via
-"resource control" filesystem.
-
 Intel RDT "resource control" filesystem hierarchy:
 ```
 mount -t resctrl resctrl /sys/fs/resctrl
@@ -194,25 +200,46 @@ tree /sys/fs/resctrl
 |   |   |-- cbm_mask
 |   |   |-- min_cbm_bits
 |   |   |-- num_closids
+|   |-- L3_MON
+|   |   |-- max_threshold_occupancy
+|   |   |-- mon_features
+|   |   |-- num_rmids
 |   |-- MB
 |       |-- bandwidth_gran
 |       |-- delay_linear
 |       |-- min_bandwidth
 |       |-- num_closids
-|-- ...
+|-- mon_groups
+    |-- <rmid>
+        |-- ...
+        |-- mon_data
+            |-- mon_L3_00
+                |-- llc_occupancy
+                |-- mbm_local_bytes
+                |-- mbm_total_bytes
+            |-- ...
+        |-- tasks
 |-- schemata
 |-- tasks
-|-- <container_id>
+|-- <clos>
     |-- ...
-    |-- schemata
+    |-- mon_data
+        |-- mon_L3_00
+            |-- llc_occupancy
+            |-- mbm_local_bytes
+            |-- mbm_total_bytes
+        |-- ...
     |-- tasks
+    |-- schemata
+|-- ...
 ```
 
 For runc, we can make use of `tasks` and `schemata` configuration for L3
-cache and memory bandwidth resources constraints.
+cache and memory bandwidth resources constraints, `mon_data` directory for
+CMT and MBM statistics.
 
 The file `tasks` has a list of tasks that belongs to this group (e.g.,
-<container_id>" group). Tasks can be added to a group by writing the task ID
+"<clos>" group). Tasks can be added to a group by writing the task ID
 to the "tasks" file (which will automatically remove them from the previous
 group to which they belonged). New tasks created by fork(2) and clone(2) are
 added to the same group as their parent.
@@ -224,7 +251,7 @@ L3 cache schema:
 It has allocation bitmasks/values for L3 cache on each socket, which
 contains L3 cache id and capacity bitmask (CBM).
 ```
-Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
+Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
 ```
 For example, on a two-socket machine, the schema line could be "L3:0=ff;1=c0"
 which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0.
@@ -240,7 +267,7 @@ Memory bandwidth schema:
 It has allocation values for memory bandwidth on each socket, which contains
 L3 cache id and memory bandwidth.
 ```
-Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
+Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
 ```
 For example, on a two-socket machine, the schema line could be "MB:0=20;1=70"
 
@@ -251,8 +278,10 @@ that is allocated is also dependent on the CPU model and can be looked up at
 min_bw + N * bw_gran. Intermediate values are rounded to the next control
 step available on the hardware.
 
-If MBA Software Controller is enabled through mount option "-o mba_MBps"
+If MBA Software Controller is enabled through mount option "-o mba_MBps":
+```
 mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
+```
 We could specify memory bandwidth in "MBps" (Mega Bytes per second) unit
 instead of "percentages". The kernel underneath would use a software feedback
 mechanism or a "Software Controller" which reads the actual bandwidth using
@@ -263,11 +292,12 @@ For example, on a two-socket machine, the schema line could be
 "MB:0=5000;1=7000" which means 5000 MBps memory bandwidth limit on socket 0
 and 7000 MBps memory bandwidth limit on socket 1.
 
-For more information about Intel RDT kernel interface:
+For more information about Intel RDT kernel interface:
 https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt
 
-```
+
 An example for runc:
+```
 Consider a two-socket machine with two L3 caches where the default CBM is
 0x7ff and the max CBM length is 11 bits, and minimum memory bandwidth of 10%
 with a memory bandwidth granularity of 10%.
@@ -281,7 +311,17 @@ maximum memory bandwidth of 20% on socket 0 and 70% on socket 1.
 		"closID": "guaranteed_group",
 		"l3CacheSchema": "L3:0=7f0;1=1f",
 		"memBwSchema": "MB:0=20;1=70"
-	}
+	}
+}
+```
+Another example:
+```
+We only want to monitor memory bandwidth and llc occupancy.
+"linux": {
+	"intelRdt": {
+		"enableMBM": true,
+		"enableCMT": true
+	}
 }
 ```
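
The schema formats described in SPEC.md (`L3:<cache_id0>=<cbm0>;...` and `MB:<cache_id0>=bandwidth0;...`) can be illustrated with a small parser. This is a minimal sketch and not code from this commit: `parseSchema` is a hypothetical helper that only splits a line into its resource name and per-cache-ID values. It does not check bitmasks against `cbm_mask` or bandwidth values against `min_bandwidth` and `bandwidth_gran`.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSchema is a hypothetical helper (not part of runc). It splits a
// resctrl schema line such as "L3:0=ff;1=c0" into the resource name ("L3")
// and a map from cache ID to the raw value string.
func parseSchema(line string) (string, map[string]string, error) {
	resource, rest, ok := strings.Cut(strings.TrimSpace(line), ":")
	if !ok || resource == "" {
		return "", nil, fmt.Errorf("missing resource prefix in %q", line)
	}

	values := make(map[string]string)
	for _, entry := range strings.Split(rest, ";") {
		id, val, ok := strings.Cut(entry, "=")
		if !ok || id == "" || val == "" {
			return "", nil, fmt.Errorf("malformed entry %q in %q", entry, line)
		}
		values[strings.TrimSpace(id)] = strings.TrimSpace(val)
	}
	return resource, values, nil
}

func main() {
	// The two example lines are taken from the SPEC.md text.
	for _, line := range []string{"L3:0=ff;1=c0", "MB:0=20;1=70"} {
		resource, values, err := parseSchema(line)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(resource, values)
	}
}
```

The values are kept as strings on purpose: an L3 value is a hexadecimal bitmask, while an MB value is a percentage or an MBps figure depending on the mount option, so interpreting them is left to the caller.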

libcontainer/configs/config.go

Lines changed: 1 addition & 1 deletion
@@ -197,7 +197,7 @@ type Config struct {
 	NoNewKeyring bool `json:"no_new_keyring"`
 
 	// IntelRdt specifies settings for Intel RDT group that the container is placed into
-	// to limit the resources (e.g., L3 cache, memory bandwidth) the container has available
+	// to limit the resources (e.g., L3 cache, memory bandwidth) the container has available.
 	IntelRdt *IntelRdt `json:"intel_rdt,omitempty"`
 
 	// RootlessEUID is set when the runc was launched with non-zero EUID.

libcontainer/configs/intelrdt.go

Lines changed: 8 additions & 0 deletions
@@ -13,4 +13,12 @@ type IntelRdt struct {
 	// The unit of memory bandwidth is specified in "percentages" by
 	// default, and in "MBps" if MBA Software Controller is enabled.
 	MemBwSchema string `json:"memBwSchema,omitempty"`
+
+	// The flag to indicate if Intel RDT CMT is enabled. CMT (Cache Monitoring Technology) supports monitoring of
+	// the last-level cache (LLC) occupancy for the container.
+	EnableCMT bool `json:"enableCMT,omitempty"`
+
+	// The flag to indicate if Intel RDT MBM is enabled. MBM (Memory Bandwidth Monitoring) supports monitoring of
+	// total and local memory bandwidth for the container.
+	EnableMBM bool `json:"enableMBM,omitempty"`
 }
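
Because the two new flags carry `omitempty`, a `false` value is dropped when the struct is serialized. The sketch below shows this with a local `intelRdt` type that mirrors only the three fields visible in this hunk. It is an illustrative stand-in, not the real `configs.IntelRdt`, and `encode` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// intelRdt is a local stand-in that copies the JSON tags of the fields
// shown in the hunk; it is not the real configs.IntelRdt type.
type intelRdt struct {
	MemBwSchema string `json:"memBwSchema,omitempty"`
	EnableCMT   bool   `json:"enableCMT,omitempty"`
	EnableMBM   bool   `json:"enableMBM,omitempty"`
}

// encode is a hypothetical helper that returns the JSON form as a string.
func encode(cfg intelRdt) (string, error) {
	b, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encode(intelRdt{EnableMBM: true})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // prints {"enableMBM":true}
}
```

One consequence is that a serialized config cannot distinguish "flag explicitly set to false" from "flag absent"; both decode to `false`.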
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+package configs_test
+
+import (
+	"encoding/json"
+	"reflect"
+	"testing"
+
+	"github.com/opencontainers/runc/libcontainer/configs"
+)
+
+func TestUnmarshalIntelRDT(t *testing.T) {
+	testCases := []struct {
+		JSON     string
+		Expected configs.IntelRdt
+	}{
+		{
+			"{\"enableMBM\": true}",
+			configs.IntelRdt{EnableMBM: true, EnableCMT: false},
+		},
+		{
+			"{\"enableMBM\": true,\"enableCMT\": false}",
+			configs.IntelRdt{EnableMBM: true, EnableCMT: false},
+		},
+		{
+			"{\"enableMBM\": false,\"enableCMT\": true}",
+			configs.IntelRdt{EnableMBM: false, EnableCMT: true},
+		},
+	}
+
+	for _, tc := range testCases {
+		got := configs.IntelRdt{}
+
+		err := json.Unmarshal([]byte(tc.JSON), &got)
+		if err != nil {
+			t.Fatal(err)
+		}
+
+		if !reflect.DeepEqual(tc.Expected, got) {
+			t.Errorf("expected unmarshalled IntelRDT config %+v, got %+v", tc.Expected, got)
+		}
+	}
+}

libcontainer/configs/validate/validator.go

Lines changed: 8 additions & 2 deletions
@@ -219,12 +219,18 @@ func intelrdtCheck(config *configs.Config) error {
 			return fmt.Errorf("invalid intelRdt.ClosID %q", config.IntelRdt.ClosID)
 		}
 
-		if !intelrdt.IsCATEnabled() && config.IntelRdt.L3CacheSchema != "" {
+		if config.IntelRdt.L3CacheSchema != "" && !intelrdt.IsCATEnabled() {
 			return errors.New("intelRdt.l3CacheSchema is specified in config, but Intel RDT/CAT is not enabled")
 		}
-		if !intelrdt.IsMBAEnabled() && config.IntelRdt.MemBwSchema != "" {
+		if config.IntelRdt.MemBwSchema != "" && !intelrdt.IsMBAEnabled() {
 			return errors.New("intelRdt.memBwSchema is specified in config, but Intel RDT/MBA is not enabled")
 		}
+		if config.IntelRdt.EnableCMT && !intelrdt.IsCMTEnabled() {
+			return errors.New("intelRdt.enableCMT is specified in config, but Intel RDT/CMT is not enabled")
+		}
+		if config.IntelRdt.EnableMBM && !intelrdt.IsMBMEnabled() {
+			return errors.New("intelRdt.enableMBM is specified in config, but Intel RDT/MBM is not enabled")
+		}
 	}
 
 	return nil
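
The reordered conditions test the cheap config field first, so the host capability is only consulted when the config actually asks for the feature. A minimal, self-contained sketch of the two new checks follows. `monitoringConfig`, `hostSupport` and `checkMonitoring` are hypothetical names introduced here so the logic can run without the `intelrdt` package; plain booleans stand in for `intelrdt.IsCMTEnabled()` and `intelrdt.IsMBMEnabled()`.

```go
package main

import (
	"errors"
	"fmt"
)

// monitoringConfig is a hypothetical stand-in for the two new config flags.
type monitoringConfig struct {
	EnableCMT bool
	EnableMBM bool
}

// hostSupport is a hypothetical stand-in for what the host reports;
// the real code asks the intelrdt package instead.
type hostSupport struct {
	CMT bool
	MBM bool
}

// checkMonitoring rejects a config that requests a monitoring feature
// the host does not support, mirroring the checks added in the hunk.
func checkMonitoring(cfg monitoringConfig, host hostSupport) error {
	if cfg.EnableCMT && !host.CMT {
		return errors.New("intelRdt.enableCMT is specified in config, but Intel RDT/CMT is not enabled")
	}
	if cfg.EnableMBM && !host.MBM {
		return errors.New("intelRdt.enableMBM is specified in config, but Intel RDT/MBM is not enabled")
	}
	return nil
}

func main() {
	// The host supports MBM only, while the config asks for both features.
	err := checkMonitoring(
		monitoringConfig{EnableCMT: true, EnableMBM: true},
		hostSupport{CMT: false, MBM: true},
	)
	fmt.Println(err)
}
```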

libcontainer/container_linux.go

Lines changed: 1 addition & 0 deletions
@@ -2009,6 +2009,7 @@ func (c *Container) currentState() (*State, error) {
 	if c.intelRdtManager != nil {
 		intelRdtPath = c.intelRdtManager.GetPath()
 	}
+
 	state := &State{
 		BaseState: BaseState{
 			ID: c.ID(),
