- explicit values file for privileged direct method,
- hide (into docs directory) "unprivileged" direct method (and fixes),
- remove unnecessary mounts (mcfg, /dev/cpu, /dev/mem for privileged access),
- add instructions to collection methods,
- fixes (extra builder) for build local development image,
- silent mode
- move collection methods to the top
File: deployment/pcm/README.md (+42 -15: 42 additions & 15 deletions)
@@ -4,12 +4,21 @@ Helm chart instructions

 ### Features:

-- Configurable as non-privileged container (value: `privileged=false` / default) and privileged container,
-- Support for bare-metal and VM host configurations (files: [values-metal.yaml](values-metal.yaml), [values-vm.yaml](values-metal.yaml)),
+- Configurable as non-privileged container (value: `privileged=false`, default) and privileged container,
+- Support for bare-metal and VM host configurations (files: [values-metal.yaml](values-metal.yaml), [values-vm.yaml](values-vm.yaml)),
 - Ability to deploy multiple releases alongside configured differently to handle different kinds of machines (bare-metal, VM) at the [same time](#heterogeneous-mixed-vmmetal-instances-cluster),
-- Controllable set of metrics and method of collection (RDT, uncore), support direct (msr) and indirect (Linux abstractions perf/resctrl) counter accesses (file: [values-indirect.yaml](values-indirect.yaml)).
 - Linux Watchdog handling (controlled with `PCM_KEEP_NMI_WATCHDOG`, `PCM_NO_AWS_WORKAROUND`, `nmiWatchdogMount` values).
 - Deploy to own namespace with "helm install ... **-n pcm --create-namespace**"
+- Silent mode (value: `silent=false`, default)
+
+Here are available methods in this chart of metrics collection w.r.t interfaces and required access:
+| unprivileged "indirect" | perf, resctrl | v | recommended, missing metrics: energy metrics (TODO link to issues/PR or node_exporter/rapl_collector) |`helm install . pcm`|
+| privileged "indirect" | perf, resctrl || not recommended, insecure, no advantages over unprivileged, missing metrics: energy metrics |`helm install . pcm --set privileged=true`|
+| privileged "direct" | msr || not recommended, insecure and requires msr module preloaded on host |`helm install . pcm -f values-direct-privileged.yaml`|
+| unprivileged "direct" | msr || not recommended, requires msr module and access to /dev/cpu and /dev/mem (non-trivial, like using 3rd-party plugins) |[link for detailed documentation](docs/direct-unprivileged-deployment.md)|

 For more information about direct/indirect collection methods please see [here](#metric-collection-methods-capabilites-vs-requirements)
 - Full set of metrics (uncore/UPI, RDT, energy) requires bare-metal or .metal cloud instance.
-- /sys/fs/resctrl has to be mounted on host OS (for default indirect deployment method),
+- /sys/fs/resctrl has to be mounted on host OS (for default indirect deployment method)
 - pod is allowed to be run with privileged capabilities (SYS_ADMIN, SYS_RAWIO) on given namespace in other words: Pod Security Standards allow to run on privileged level,

 ```
@@ -78,12 +87,14 @@ More information here: https://kubernetes.io/docs/tutorials/security/ns-level-ps
 #### 1) (Optionally) mount resctrl filesystem (for RDT metrics) to unload "msr" kernel module for validation

 ```
+echo 0 > /proc/sys/kernel/perf_event_paranoid
 mount -t resctrl resctrl /sys/fs/resctrl
 ```

-For validation to verify that all metrics are available without msr, unload "msr" module from kernel:
+To validate that all metrics are available without msr, unload the "msr" module from the kernel and restore perf_event_paranoid to its default value:
 ```
 rmmod msr
+echo 2 > /proc/sys/kernel/perf_event_paranoid
 ```

 #### 2) Create kind based Kubernetes cluster
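The validation step in the hunk above depends on three pieces of host state: whether the msr module is loaded, whether resctrl is mounted, and the current perf_event_paranoid value. A minimal sketch of a check for these follows. The function name `report_validation_state` and its path parameters are hypothetical and not part of the chart; the defaults are the standard Linux locations.

```shell
#!/bin/sh
# Sketch only: report the host state that the validation step relies on.
# report_validation_state is a hypothetical helper, not part of the pcm chart.
# The three paths are parameters so the function can be run against fixtures.
report_validation_state() {
    modules_file="${1:-/proc/modules}"
    paranoid_file="${2:-/proc/sys/kernel/perf_event_paranoid}"
    resctrl_info="${3:-/sys/fs/resctrl/info}"

    # /proc/modules lists loaded modules, one per line, module name first.
    # An msr driver built into the kernel would not be listed here.
    if grep -q '^msr ' "$modules_file" 2>/dev/null; then
        echo "msr=loaded"
    else
        echo "msr=not-loaded"
    fi

    # A mounted resctrl filesystem exposes an info/ directory.
    if [ -d "$resctrl_info" ]; then
        echo "resctrl=mounted"
    else
        echo "resctrl=missing"
    fi

    # Print the current value, or "unknown" if the file cannot be read.
    echo "perf_event_paranoid=$(cat "$paranoid_file" 2>/dev/null || echo unknown)"
}
```

Calling `report_validation_state` with no arguments reads the live host paths; after `rmmod msr` and `echo 2 > /proc/sys/kernel/perf_event_paranoid` one would expect it to report `msr=not-loaded` and `perf_event_paranoid=2`.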
@@ -123,11 +134,24 @@ bash kind-with-registry.sh
 Check that resctrl is available inside kind node:
 ```
 docker exec kind-control-plane ls /sys/fs/resctrl/info
+# expected output:
+# L3_MON
+# MB
+# ...
+```
+
+
+and, optionally, that the local registry is running (to be used with locally built pcm images; more details [below](#development-with-local-images-and-testing))
+```
+docker ps | grep kind-registry
+# expected output:
+# e57529be23ea registry:2 "/entrypoint.sh /etc…" 3 weeks ago Up 3 weeks 127.0.0.1:5001->5000/tcp kind-registry
 ```

 Export kind kubeconfig as default for further kubectl commands:
# Warning: we're using a patched Dockerfile (TODO: to be removed; needed because the "build" directory conflicts with the existing root "build" directory, and for caching ability)