deployment/pcm/README.md
Lines changed: 14 additions & 19 deletions
@@ -5,27 +5,22 @@ Helm chart instructions
 ### Features:
 
 - privilege and non-privileged container (value: `privileged`),
-- node-feature-discovery based nodeSelector and nodeAffinity (values: nfd, nfdBaremetalAffinity, nfdRDTAffinity)
 - bare-metal and VM host configurations (files: values-metal.yaml, values-vm.yaml),
 - Ability to deploy multiple releases alongside configured differently to handle different kinds of machines (bare-metal, VM) at the same time,
-- Examples for non-privileged mode using device plugin ("smarter-devices-manager") or using NRI device-injector plugin (TODO) (file: values-smarter-devices-cpu-mem.yaml),
-- Integration with NRI balloons policy plugin (value: `nriBalloonsPolicyIntegration`),
 - Controllable set of metrics and method of collection (RDT, uncore), support direct (msr) and indirect (Linux abstractions perf/resctrl) counter accesses (file: values-indirect.yaml)
 - Linux Watchdog handling (controlled with PCM_KEEP_NMI_WATCHDOG, PCM_NO_AWS_WORKAROUND, nmiWatchdogMount values)
 - Deploy to own namespace with "helm install ... **-n pcm --create-namespace**"
-- Local image registry for development (file: values-local-image.yaml),
 
-TODO/Ideas:
-- [ ] Refactor extra features: node-feature-discovery, NRI interegration only as extra values for generic fields (annotations, nodeSelector/nodeAffinity)
-- [ ] Check if energy metrics can be accessible through perf subsystem
-- [ ] GitHub actions for linter/security scanners,
-- [ ] Idea: Change metrics names (follow Prometheus best practices)
-- [ ] Idea: init container to check permission for all required components (devices/CPU)
-- [ ] Implement Helm chart test pods + NOTES
-- [ ] Test liveness/readiness probes
-- [ ] Testing in Cluster Manager Systems like (e.g. Ranger,Gardener) different node types VM(1socket,all sockets), bare-metal
-- [ ] Test in different cloud GCP/Azure/AWS
+#### Integration features:
+
+- node-feature-discovery based nodeSelector and nodeAffinity (values: nfd, nfdBaremetalAffinity, nfdRDTAffinity),
+- Examples for non-privileged mode using device plugin ("smarter-devices-manager") or using NRI device-injector plugin (TODO) (file: values-smarter-devices-cpu-mem.yaml),
+- Integration with NRI balloons policy plugin (value: `nriBalloonsPolicyIntegration`),
+
+#### Debugging features:
+
+- Local image registry for development (file: values-local-image.yaml),

-- Full set of metrics requires metal instance (uncore metrics, RDT, energy, UPI),
+- Full set of metrics requires bare-metal or .metal instance (uncore metrics, RDT, energy, UPI),
 - Core metrics (instructions, cycles are also available) on VM instances,
-- In both case "msr" kernel module has to be loaded in host OS,
-- pod is allowed to be run with privileged capabilities (SYS_ADMIN, SYS_RAWIO) on given namespace,
-- Pod Security Standards allow to run on privileged level,
+- /sys/fs/resctrl has to be mounted on host OS,
+- pod is allowed to be run with privileged capabilities (SYS_ADMIN, SYS_RAWIO) on given namespace in other words: Pod Security Standards allow to run on privileged level,
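
The host-side requirements named in the last lines of this diff (the "msr" kernel module in the removed line, a mounted /sys/fs/resctrl in the added one) can be checked on a node before installing the chart. The following is a minimal sketch, assuming a Linux host; the function names `msr_loaded` and `resctrl_mounted` are hypothetical helpers and are not part of the chart or of PCM.

```shell
#!/bin/sh
# Hypothetical pre-install checks; not part of the Helm chart.
# Each function takes an optional file argument so it can be run
# against a fixture instead of the live /proc files.

# Succeeds if the module list (default: /proc/modules) contains "msr".
# Note: a kernel with the msr driver built in will not list it here,
# so a failure is a hint to investigate, not proof that MSR access is missing.
msr_loaded() {
    grep -q '^msr ' "${1:-/proc/modules}"
}

# Succeeds if the mount table (default: /proc/mounts) has an entry
# whose mount point is exactly /sys/fs/resctrl.
resctrl_mounted() {
    awk '$2 == "/sys/fs/resctrl" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}
```

On a node, a call such as `resctrl_mounted || echo "resctrl is not mounted"` gives a quick indication of whether that requirement is met. The checks only read files under /proc and do not change the host.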