Commit 5a4e9e0

WIP
1 parent 5f8ce37 commit 5a4e9e0

1 file changed

+29
-27
lines changed

deployment/pcm/README.md

Lines changed: 29 additions & 27 deletions
@@ -7,13 +7,13 @@ Helm chart instructions
77
- Configurable as non-privileged container (value: `privileged=false` / default) and privileged container,
88
- Support for bare-metal and VM host configurations (files: [values-metal.yaml](values-metal.yaml), [values-vm.yaml](values-vm.yaml)),
99
- Ability to deploy multiple releases alongside configured differently to handle different kinds of machines (bare-metal, VM) at the [same time](#heterogeneous-mixed-vmmetal-instances-cluster),
10-
- Controllable set of metrics and method of collection (RDT, uncore), support direct (msr) and indirect (Linux abstractions perf/resctrl) counter accesses (file: `values-indirect.yaml`).
11-
- Linux Watchdog handling (controlled with PCM_KEEP_NMI_WATCHDOG, PCM_NO_AWS_WORKAROUND, nmiWatchdogMount values).
10+
- Controllable set of metrics and method of collection (RDT, uncore), support direct (msr) and indirect (Linux abstractions perf/resctrl) counter accesses (file: [values-indirect.yaml](values-indirect.yaml)).
11+
- Linux Watchdog handling (controlled with `PCM_KEEP_NMI_WATCHDOG`, `PCM_NO_AWS_WORKAROUND`, `nmiWatchdogMount` values).
1212
- Deploy to own namespace with "helm install ... **-n pcm --create-namespace**"
1313

1414
#### Integration features:
1515

16-
- node-feature-discovery based nodeSelector and nodeAffinity (values: nfd, nfdBaremetalAffinity, nfdRDTAffinity),
16+
- node-feature-discovery based nodeSelector and nodeAffinity (values: `nfd`, `nfdBaremetalAffinity`, `nfdRDTAffinity`),
1717
- Examples for non-privileged mode using device plugin ("smarter-devices-manager") or using NRI device-injector plugin (TODO) (file: [values-smarter-devices-cpu-mem.yaml](values-smarter-devices-cpu-mem.yaml) ),
1818
- Integration with NRI balloons policy plugin (value: `nriBalloonsPolicyIntegration`),
1919

@@ -41,21 +41,6 @@ helm upgrade --install pcm . --set privileged=true -f values-metal.yaml -f value
4141
helm install ... --set nfd=true --set podMonitor=true
4242
```
4343

44-
#### DEBUGGING & BUILDING
45-
46-
** NOTE: DEBUGGING: TODO to be remove before merging **
47-
48-
```
49-
# Build local image for tests/development + fix /pcm/resctrl mounting (assuming project was configured with cmake previously):
50-
(cd ../.. ; (cd build ; make -j pcm pcm-sensor-server) ; docker build . -t localhost:5001/pcm-local && docker push localhost:5001/pcm-local; docker run -ti --rm --name pcmtest --entrypoint bash localhost:5001/pcm-local -c "pcm 2>&1 | head -5" )
51-
52-
# Local image "indirect"
53-
helm upgrade --install pcm . --set debugPcm=true
54-
55-
# exec or check logs
56-
kubectl exec -ti ds/pcm -- bash
57-
kubectl logs ds/pcm
58-
```
5944

6045
### Requirements
6146

@@ -230,9 +215,12 @@ helm install pcm-metal . -f values-metal.yaml
230215

231216
#### Direct method as non-privileged container (not recommended)
232217

218+
**TODO**: TO BE MOVED TO EXTERNAL FILE/SECTION
219+
233220
**Note** PCM requires access to the /dev/cpu device in read-write mode (MSR access), but it is currently not possible to mount devices in Kubernetes pods/containers in vanilla Kubernetes. See this issue for more information: https://github.com/kubernetes/kubernetes/issues/5607.
234221

235222
##### a) Device injection using 3rd party device-plugin
223+
236224

237225
To run PCM as a non-privileged pod, we can use third-party device plugins, e.g.:
238226

@@ -259,9 +247,9 @@ kubectl get node kind-control-plane -o json | jq .status.capacity
259247
helm install pcm . --set privileged=false -f values-direct.yaml -f values-smarter-devices-cpu-mem.yaml
260248
```
261249

262-
##### b) Device injection using NRI plugin device-injection
250+
##### b) Device injection using NRI plugin device-injection
263251

264-
**TODO**: **Warning** This is work in progress, because it is needed to manually specific all /dev/cpu/XX/msr devices, which is unpractical in production.
252+
**TODO**: **Warning** This is work in progress, because all /dev/cpu/XX/msr devices must be specified manually, which is impractical in production (TO BE MOVED TO EXTERNAL FILE).
265253

266254
```
267255
git clone https://github.com/containerd/nri/
@@ -294,7 +282,7 @@ docker exec kind-control-plane systemctl status device-injector
294282
helm install pcm-device-injector . --set privileged=false --set hostPort= --set debugSleep=true -f values-opcm-local-image.yaml -f values-device-injector.yaml
295283
```
296284

297-
#### Development (local images) and testing
285+
#### Development (with local images) and testing
298286

299287
1) Set up kind with a registry following these instructions: https://kind.sigs.k8s.io/docs/user/local-registry/
300288
```
@@ -306,13 +294,33 @@ bash kind-with-registry.sh
306294
```
307295
docker build . -t localhost:5001/pcm-local
308296
docker push localhost:5001/pcm-local
297+
298+
# or as a single line
299+
# Build local image for tests/development + fix /pcm/resctrl mounting (assuming project was configured with cmake previously):
300+
(cd ../.. ; (cd build ; make -j pcm pcm-sensor-server) ; docker build . -t localhost:5001/pcm-local && docker push localhost:5001/pcm-local; docker run -ti --rm --name pcmtest --entrypoint bash localhost:5001/pcm-local -c "pcm 2>&1 | head -5" )
309301
```
310302

311303
3) When deploying pcm to the kind cluster, use values to switch to the local pcm-local image
312304
```
313305
helm install pcm . -f values-local-image.yaml
314306
```
315307

308+
4) Replace pcm-sensor-server with pcm or sleep
309+
```
310+
helm upgrade --install pcm . --set debugPcm=true
311+
helm upgrade --install pcm . --set debugSleep=true
312+
```
313+
314+
**TODO:** consider removing the debug options before release for security reasons
315+
316+
5) Check logs or interact with the container directly:
317+
```
318+
# exec into pcm container
319+
kubectl exec -ti ds/pcm -- bash
320+
# or check logs
321+
kubectl logs ds/pcm
322+
```
323+
316324
#### Troubleshooting
317325

318326
##### Metric availability and requirements (devices/mounts/permissions)
@@ -347,10 +355,4 @@ helm install pcm . -f values-local-image.yaml
347355
| | RO: /sys/firmware/acpi/tables/MCFG | PCM_USE_UNCORE_PERF | msr is disabled | pci.cpp:PciHandle::openMcfgTable() | mcfgMount |
348356
| | energy | | | cpucounters.cpp initEnergyMonitoring() | |
349357

350-
One can replace pcm-sensor-server command and run pcm or sleep to investigate issue add following arguments when install helm chart
351-
```
352-
--debugPcm=true # it will run pcm binary
353-
--debugSleep=true # will run "/usr/bin/sleep inf" to allow attach to container and debug interactivelly
354-
```
355358

356-
**TODO:** options to be removed before release
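
Review note: the debug workflow that this commit moves into the development section (steps 4 and 5) could be wrapped in a small helper. The following is only a minimal sketch, assuming the `debugPcm` and `debugSleep` chart values behave as the README describes. The function name `pcm_debug_cmd` is hypothetical and not part of the chart. It prints the helm command rather than running it, so it can be checked before use.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the chart): print the helm command
# that switches the pcm release into one of the two debug modes
# mentioned in the README diff.
#   pcm   -> runs the pcm binary instead of pcm-sensor-server (debugPcm=true)
#   sleep -> runs sleep so one can exec into the container (debugSleep=true)
pcm_debug_cmd() {
  local mode="$1"
  case "$mode" in
    pcm)   echo "helm upgrade --install pcm . --set debugPcm=true" ;;
    sleep) echo "helm upgrade --install pcm . --set debugSleep=true" ;;
    *)     echo "usage: pcm_debug_cmd pcm|sleep" >&2; return 1 ;;
  esac
}
```

After applying the printed command, `kubectl exec -ti ds/pcm -- bash` or `kubectl logs ds/pcm` (both taken from the diff) can be used to inspect the container.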
