
Commit 34a39b9

config-linux: add intelRdt.enableMonitoring (#1287)
Add a parameter for enabling per-container resctrl monitoring. This supersedes and replaces the previous "enableCMT" and "enableMBM" settings, whose functionality was only vaguely specified.

A separate parameter for every monitoring metric does not make much sense, in particular because the resctrl filesystem cannot selectively enable a subset of the monitoring features: you always get all the metrics that the system provides. Also, with separate settings (and a corresponding check that the specific metric is available) the user cannot specify "enable whatever is available": setting everything to "true" might fail because one of the metrics is not available on the platform.

In addition, separate parameters are not future-proof and make support for new monitoring metrics unnecessarily cumbersome to add. New metrics are certain to be added in new hardware generations, e.g. perf/energy monitoring in the near future (https://lkml.org/lkml/2025/5/21/1631), and requiring an update to the runtime-spec for each one of them is overkill without much benefit. It is easier to have one switch for "enable container-specific metrics" and let the user read whatever metrics the platform provides.

Moreover, it is not even possible to turn off monitoring from the resctrl filesystem. For example, you always get the metrics for all CTRL_MON groups (closIDs). However, that is not always very useful, as there are likely many applications packed into the same group.

The new intelRdt.enableMonitoring parameter enables creation of a MON group specific to a single container, allowing resctrl metrics to be monitored at per-container granularity.

Signed-off-by: Markus Lehtonen <[email protected]>
1 parent 82cca47 commit 34a39b9


3 files changed: 19 additions and 18 deletions


config-linux.md

Lines changed: 15 additions & 7 deletions
@@ -748,6 +748,7 @@ The following parameters can be specified for the container:
 * **`memBwSchema`** *(string, OPTIONAL)* - specifies the schema of memory bandwidth per L3 cache id.
 The value MUST start with `MB:` and MUST NOT contain newlines.
 * **`schemata`** *(array of strings, OPTIONAL)* - specifies the schemata to be written to the `schemata` file in resctrlfs. Each element represents one line in the `schemata` file. The value MUST NOT contain newlines.
+* **`enableMonitoring`** *(boolean, OPTIONAL)* - enables resctrl monitoring for the container.

 The following rules on parameters MUST be applied:

@@ -769,13 +770,20 @@ The following rules on parameters MUST be applied:

 * If `closID` is set, and none of `l3CacheSchema`, `memBwSchema` or `schemata` are set, runtime MUST check if corresponding pre-configured directory `closID` is present in mounted `resctrl`. If such pre-configured directory `closID` exists, runtime MUST assign container to this `closID` and [generate an error](runtime.md#errors) if directory does not exist.

-* **`enableCMT`** *(boolean, OPTIONAL)* - specifies if Intel RDT CMT should be enabled:
-* CMT (Cache Monitoring Technology) supports monitoring of the last-level cache (LLC) occupancy
-for the container.
-
-* **`enableMBM`** *(boolean, OPTIONAL)* - specifies if Intel RDT MBM should be enabled:
-* MBM (Memory Bandwidth Monitoring) supports monitoring of total and local memory bandwidth
-for the container.
+* If `enableMonitoring` is set, the runtime MUST create a dedicated MON group
+for the container. The runtime MUST use the container ID from
+[`start`](runtime.md#start) as the name of the MON group, i.e. create
+`mon_groups/<container-id>/` subdirectory under the top-level CTRL_MON group
+(named after `closID` or `<container-id>`, see above). The runtime MUST
+delete the MON group after the container is deleted. If creation of the MON
+group fails (e.g. the maximum number of MON groups is reached) the runtime MUST
+return an error.
+
+> **NOTE:** The `enableCMT` and `enableMBM` parameters, available in runtime-spec versions v1.1.0 through v1.2.1, were
+> replaced with a unified `enableMonitoring` parameter in v1.3.0. Their semantics were loosely defined and there were
+> no known implementations. More critically, these parameters were problematic as hardware does not support selective
+> enabling of individual monitoring features. This scheme also made it unnecessarily complex to add support for new
+> monitoring features, without providing any recognized benefits.

 ### Example
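The new `enableMonitoring` rule in the hunk above can be sketched in Go roughly as follows. This is a minimal illustration and not code from this commit or from any runtime: the helper names `monGroupPath`, `createMonGroup` and `removeMonGroup` are hypothetical, and `main` runs against a scratch directory rather than a real resctrl mount (conventionally `/sys/fs/resctrl`), where the kernel would provide the `mon_groups/` directory itself.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// monGroupPath returns the directory of the per-container MON group.
// The parent CTRL_MON group is named after closID when one is set and
// after the container ID otherwise. (Hypothetical helper.)
func monGroupPath(resctrlRoot, closID, containerID string) string {
	ctrlGroup := closID
	if ctrlGroup == "" {
		ctrlGroup = containerID
	}
	return filepath.Join(resctrlRoot, ctrlGroup, "mon_groups", containerID)
}

// createMonGroup creates the MON group directory. Any failure, for example
// because the maximum number of MON groups is reached, is returned so the
// caller can report an error as the rule requires.
func createMonGroup(resctrlRoot, closID, containerID string) error {
	p := monGroupPath(resctrlRoot, closID, containerID)
	if err := os.Mkdir(p, 0o755); err != nil {
		return fmt.Errorf("creating MON group %q: %w", p, err)
	}
	return nil
}

// removeMonGroup deletes the MON group once the container has been deleted.
func removeMonGroup(resctrlRoot, closID, containerID string) error {
	return os.Remove(monGroupPath(resctrlRoot, closID, containerID))
}

func main() {
	// Scratch directory standing in for the resctrl mount (assumption).
	root, err := os.MkdirTemp("", "resctrl-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	// Emulate the mon_groups/ directory of an existing CTRL_MON group.
	parent := filepath.Join(root, "guaranteed_group", "mon_groups")
	if err := os.MkdirAll(parent, 0o755); err != nil {
		panic(err)
	}

	if err := createMonGroup(root, "guaranteed_group", "ctr1"); err != nil {
		panic(err)
	}
	fmt.Println("created", monGroupPath(root, "guaranteed_group", "ctr1"))

	if err := removeMonGroup(root, "guaranteed_group", "ctr1"); err != nil {
		panic(err)
	}
}
```

Keeping the path computation separate from the filesystem calls makes the naming rule (`closID` or `<container-id>` for the CTRL_MON group, `<container-id>` for the MON group) easy to check without a resctrl mount.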

schema/config-linux.json

Lines changed: 1 addition & 4 deletions
@@ -278,10 +278,7 @@
 "type": "string",
 "pattern": "^MB:[^\\n]*$"
 },
-"enableCMT": {
-"type": "boolean"
-},
-"enableMBM": {
+"enableMonitoring": {
 "type": "boolean"
 }
 }

specs-go/config.go

Lines changed: 3 additions & 7 deletions
@@ -855,13 +855,9 @@ type LinuxIntelRdt struct {
 // NOTE: Should not be specified if Schemata is non-empty.
 MemBwSchema string `json:"memBwSchema,omitempty"`

-// EnableCMT is the flag to indicate if the Intel RDT CMT is enabled. CMT (Cache Monitoring Technology) supports monitoring of
-// the last-level cache (LLC) occupancy for the container.
-EnableCMT bool `json:"enableCMT,omitempty"`
-
-// EnableMBM is the flag to indicate if the Intel RDT MBM is enabled. MBM (Memory Bandwidth Monitoring) supports monitoring of
-// total and local memory bandwidth for the container.
-EnableMBM bool `json:"enableMBM,omitempty"`
+// EnableMonitoring enables resctrl monitoring for the container. This will
+// create a dedicated resctrl monitoring group for the container.
+EnableMonitoring bool `json:"enableMonitoring,omitempty"`
 }

 // ZOS contains platform-specific configuration for z/OS based containers.
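To illustrate how the new field serializes, the sketch below marshals a small local struct that mirrors only two fields of `LinuxIntelRdt`. The type `intelRdt` and the helper `render` are hypothetical stand-ins so the example stays self-contained; they are not part of `specs-go`. Because of `omitempty`, a `false` value is left out of the JSON entirely, which an implementation would presumably treat as "monitoring not requested".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// intelRdt is a hypothetical local mirror of two LinuxIntelRdt fields,
// using the same JSON tags as shown in the diff above.
type intelRdt struct {
	ClosID           string `json:"closID,omitempty"`
	EnableMonitoring bool   `json:"enableMonitoring,omitempty"`
}

// render marshals the configuration fragment to compact JSON.
func render(cfg intelRdt) string {
	b, err := json.Marshal(cfg)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// prints {"closID":"guaranteed_group","enableMonitoring":true}
	fmt.Println(render(intelRdt{ClosID: "guaranteed_group", EnableMonitoring: true}))

	// prints {"closID":"guaranteed_group"} because false is omitted
	fmt.Println(render(intelRdt{ClosID: "guaranteed_group"}))
}
```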
