# v0.9.3
This is a new minor release of NRI Reference Plugins. It brings several new features, a number of bug fixes, new end-to-end tests, and improved test coverage.
## What's New

### Balloons Policy
- Cluster-level visibility into CPU affinity. The configuration option `agent.nodeResourceTopology: true` enables observing balloons as zones in NodeResourceTopology custom resources. Furthermore, if `showContainersInNrt: true` is set, information on each container, including its CPU affinity, is shown as a subzone of its balloon. Example configuration:

  ```yaml
  showContainersInNrt: true
  agent:
    nodeResourceTopology: true
  ```

  This enables listing balloons and their cpusets on node K8SNODE with

  ```console
  kubectl get noderesourcetopology K8SNODE -o json | jq '.zones[] | select(.type=="balloon") | {"balloon":.name, "cpuset":(.attributes[]|select(.name=="cpuset").value)}'
  ```

  and containers with their cpusets on the same node with

  ```console
  kubectl get noderesourcetopology K8SNODE -o json | jq '.zones[] | select(.type=="allocation for container") | {"container":.name, "cpuset":(.attributes[]|select(.name=="cpuset").value)}'
  ```
- System load balancing. Even if two containers run on disjoint sets of logical CPUs, they may nevertheless affect each other's performance. This happens, for instance, if two memory-intensive containers share the same level 2 cache, or if two compute-intensive containers run on two hyperthreads of the same physical CPU core and compete for its compute resources.

  The new system load balancing in the balloons policy is based on classifying the loads generated by containers using the new `loadClasses` configuration option. Based on the load classes associated with `balloonTypes` via `loads`, the policy allocates CPUs to new and existing balloons so that it avoids overloading level 2 caches or physical CPU cores.

  Example: the policy prefers selecting CPUs for all "inference-engine" and "computational-fluid-dynamics" balloons within separate level 2 cache blocks, to prevent cache thrashing by any two containers in these balloons.

  ```yaml
  balloonTypes:
    - name: inference-engine
      loads:
        - memory-intensive
      ...
    - name: computational-fluid-dynamics
      loads:
        - memory-intensive
      ...
  loadClasses:
    - name: memory-intensive
      level: l2cache
  ```
### Topology Aware Policy
- Improved topology hint control: the `topologyhints.resource-policy.nri.io` annotation key can be used to enable or disable topology hint generation for one or more containers altogether, or selectively for mount, device, and pod resource hint types. For example:

  ```yaml
  metadata:
    annotations:
      # disable topology hint generation for all containers by default
      topologyhints.resource-policy.nri.io/pod: none
      # disable other than mount-based hints for the 'diskwriter' container
      topologyhints.resource-policy.nri.io/container.diskwriter: mounts
      # disable other than device-based hints for the 'videoencoder' container
      topologyhints.resource-policy.nri.io/container.videoencoder: devices
      # disable other than pod resource-based hints for the 'dpdk' container
      topologyhints.resource-policy.nri.io/container.dpdk: pod-resources
      # enable device and pod resource-based hints for the 'networkpump' container
      topologyhints.resource-policy.nri.io/container.networkpump: devices,pod-resources
  ```

  It is also possible to enable and disable topology hint generation based on mount or device path, using allow and deny lists. See the updated documentation for more details.
- Relaxed system topology restrictions: the policy no longer refuses to start up if a NUMA node is shared by more than one pool at the same topology hierarchy level. In particular, a single NUMA node shared by all sockets no longer prevents startup.
- Improved Burstable QoS class container handling: the policy now allocates memory to Burstable QoS class containers based on memory request estimates. This should lower the probability of unexpected allocation failures when Burstable containers are used on a node allocated close to its full capacity.
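For reference, a container falls into the Burstable QoS class when it has resource requests that are lower than its limits. A minimal sketch of such a spec (the pod and container names and the image are made up for illustration); under the new behavior the policy sizes this container's memory allocation from the 256Mi request rather than the 1Gi limit:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: burstable-example        # hypothetical pod name
spec:
  containers:
    - name: app                  # hypothetical container name
      image: example.registry/app:latest   # placeholder image
      resources:
        requests:
          cpu: 500m
          memory: 256Mi          # memory allocation is now based on this request
        limits:
          cpu: "1"
          memory: 1Gi            # ...instead of this limit
```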
- Better global shared allocation preference: the `preferSharedCPUs: true` global configuration option now applies to all containers, unless they are annotated to opt out using the `prefer-shared-cpus.resource-policy.nri.io` annotation.
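As a sketch of how the global option and the opt-out annotation compose (the container name is made up, and the per-container `/container.NAME` suffix and boolean value are assumed to follow the same annotation conventions as the other annotations in these notes; see the policy documentation for the exact syntax):

```yaml
# global policy configuration fragment: prefer shared CPUs for all containers
preferSharedCPUs: true
```

```yaml
# pod spec fragment: opt one container out of the global preference
metadata:
  annotations:
    prefer-shared-cpus.resource-policy.nri.io/container.latency-critical: "false"
```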
### Common Policy Improvements
- Container cache and memory bandwidth allocation enables class-based management of system L2 and L3 cache and memory bandwidth. These are modeled as class-based, uncountable, shareable resources. Containers can be assigned to predefined classes of service (CLOSes), or RDT classes for short. Each class defines a specific cache and memory bandwidth allocation configuration, which is applied to all containers within that class. The assigned container class is resolved and mapped to a CLOS in the runtime using the goresctrl library.

  RDT control must be enabled in the runtime, and the assigned classes must be defined in the runtime configuration. Otherwise the runtime might fail to create containers that are assigned to an RDT class. Refer to the containerd, CRI-O, and goresctrl documentation for more details about configuration.
  A container can be assigned either to an RDT class matching its pod's QoS class (BestEffort, Burstable, or Guaranteed), or to an arbitrary class using the `rdtclass.resource-policy.nri.io` annotation. To enable QoS-class based default assignment, you can use a configuration fragment similar to this:
  ```yaml
  apiVersion: config.nri/v1alpha1
  kind: TopologyAwarePolicy # or 'BalloonsPolicy' for the 'balloons' policy
  metadata:
    name: default
  spec:
    ...
    control:
      rdt:
        enable: true
        usePodQoSAsDefaultClass: true
  ```

  RDT class assignment is also possible using annotations, for instance to assign the packetpump container to the highprio class and the scheduler container to the midprio class. Any other container in the pod will be assigned to the class matching its pod's QoS class:
  ```yaml
  metadata:
    annotations:
      rdtclass.resource-policy.nri.io/container.packetpump: highprio
      rdtclass.resource-policy.nri.io/container.scheduler: midprio
  ```

- Container block I/O prioritization allows class-based control of block I/O prioritization and throttling. Containers can be assigned to predefined block I/O classes. Each class defines a specific configuration of prioritization and throttling parameters, which is applied to all containers assigned to the class. The assigned container class is resolved and mapped to actual parameters in the runtime using the goresctrl library.

  Block I/O control must be enabled in the runtime, and the classes must be defined in the runtime configuration. Otherwise the runtime fails to create containers that are assigned to a block I/O class. Refer to the containerd, CRI-O, and goresctrl documentation for more details about configuration.
  A container can be assigned either to a block I/O class matching its pod's QoS class (BestEffort, Burstable, or Guaranteed), or to an arbitrary class using the `blockioclass.resource-policy.nri.io` annotation. To enable QoS-class based default assignment, you can use a configuration fragment similar to this:
  ```yaml
  apiVersion: config.nri/v1alpha1
  kind: TopologyAwarePolicy
  metadata:
    name: default
  spec:
    ...
    control:
      blockio:
        enable: true
        usePodQoSAsDefaultClass: true
  ```

  Class assignment is also possible using annotations, for instance to assign the database container to the highprio class and the logger container to the lowprio class. Any other container in the pod will be assigned to the class matching its pod's QoS class:
  ```yaml
  metadata:
    annotations:
      blockioclass.resource-policy.nri.io/container.database: highprio
      blockioclass.resource-policy.nri.io/container.logger: lowprio
  ```

## What's Changed
- balloons: do not require minFreq and maxFreq in CPU classes by @askervin in #455
- balloons: expose balloons and optionally containers with affinity in NRT by @askervin in #469
- balloons: introduce loadClasses for avoiding unwanted overloading in critical locations by @askervin in #493
- topology-aware: exclude isolated CPUs from policy-picked reserved cpusets. by @klihub in #474
- topology-aware: rework building the topology pool tree. by @klihub in #477
- topology-aware: allocate burstable container memory by requests. by @klihub in #491
- topology-aware: better semantics for globally configured shared CPU preference. by @klihub in #498
- topology-aware: more consistent setup error handling. by @klihub in #502
- memtierd: allow overriding go version for image build. by @klihub in #456
- resmgr: improve annotated topology hint control. by @klihub in #499
- resmgr: eliminate extra container state 'overlay'. by @klihub in #480
- resmgr: eliminate extra RDT class 'overlay'. by @klihub in #481
- resmgr: eliminate extra block I/O class 'overlay'. by @klihub in #482
- resmgr: configurable RDT and block I/O class control. by @klihub in #483
- system: add a helper for finding CPUs sharing caches by @askervin in #492
- sysfs: only discover topology of online cpus by @marquiz in #494
- pkg/udev: implement udev event reading/monitoring. by @klihub in #449
- workflow: sign Helm packages and upload provenance files by @fmuyassarov in #468
- [1/2] OLM workflow: add automatic OLM bundle submission. by @fmuyassarov in #460
- [2/2] OLM workflow: allow both test and real submissions. by @klihub in #464
- e2e: test topology-aware policy nodeResourceTopology exporting by @askervin in #465
- e2e: add pure go stateful fuzz test generator by @askervin in #463
Full Changelog: v0.8.0...v0.9.3