v0.9.3

@github-actions released this 02 Apr 11:28
· 212 commits to main since this release
3968d09

This is a new minor release of NRI Reference Plugins. It brings several new features, a number of bug fixes, and improvements to end-to-end tests and test coverage.

What's New

Balloons Policy

  • Cluster-level visibility into CPU affinity. The configuration option agent.nodeResourceTopology: true enables observing balloons as zones in NodeResourceTopology custom resources. Furthermore, if showContainersInNrt: true is set, information on each container, including its CPU affinity, is shown as a subzone of its balloon.

    Example configuration:

    showContainersInNrt: true
    agent:
      nodeResourceTopology: true

    This enables listing balloons and their cpusets on node K8SNODE with

    kubectl get noderesourcetopology K8SNODE -o json | jq '.zones[] | select(.type=="balloon") | {"balloon":.name, "cpuset":(.attributes[]|select(.name=="cpuset").value)}'

    and containers with their cpusets on the same node:

    kubectl get noderesourcetopology K8SNODE -o json | jq '.zones[] | select(.type=="allocation for container") | {"container":.name, "cpuset":(.attributes[]|select(.name=="cpuset").value)}'
  • System load balancing. Even if two containers run on disjoint sets of logical CPUs, they may nevertheless affect each other's performance. This happens, for instance, if two memory-intensive containers share the same level 2 cache, or if they are compute-intensive, use the same compute resources of a physical CPU core, and run on two hyperthreads of the same core.

    The new system load balancing in the balloons policy is based on classifying the loads generated by containers using the new loadClasses configuration option. Based on the load classes associated with balloonTypes via loads, the policy allocates CPUs to new and existing balloons so as to avoid overloading level 2 caches or physical CPU cores.

    Example: the policy prefers selecting CPUs for all "inference-engine" and "computational-fluid-dynamics" balloons within separate level 2 cache blocks, preventing cache thrashing between any two containers in these balloons.

    balloonTypes:
    - name: inference-engine
      loads:
      - memory-intensive
      ...
    - name: computational-fluid-dynamics
      loads:
      - memory-intensive
    ...
    loadClasses:
    - name: memory-intensive
      level: l2cache

Topology Aware Policy

  • Improved topology hint control: the topologyhints.resource-policy.nri.io annotation key can be used to enable or disable topology hint generation for one or more containers altogether, or selectively for the mount, device, and pod resource hint types.
    For example:
metadata:
  annotations:
    # disable topology hint generation for all containers by default
    topologyhints.resource-policy.nri.io/pod: none
    # disable other than mount-based hints for the 'diskwriter' container
    topologyhints.resource-policy.nri.io/container.diskwriter: mounts
    # disable other than device-based hints for the 'videoencoder' container
    topologyhints.resource-policy.nri.io/container.videoencoder: devices
    # disable other than pod resource-based hints for the 'dpdk' container
    topologyhints.resource-policy.nri.io/container.dpdk: pod-resources
    # enable device and pod resource-based hints for 'networkpump' container
    topologyhints.resource-policy.nri.io/container.networkpump: devices,pod-resources

It is also possible to enable and disable topology hint generation based on mount or device path, using allow and deny lists. See the updated documentation for more details.
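As a sketch only, a path-based allow/deny filter might be expressed in the policy configuration along these lines. The key names below (allowed, denied, mounts, devices) are illustrative assumptions, not the verified schema; consult the updated documentation for the actual configuration format:

```yaml
# HYPOTHETICAL fragment: key names are assumptions for illustration.
spec:
  topologyHints:
    mounts:
      allowed:
      - /mnt/fast-storage/*     # generate hints for mounts under this path
      denied:
      - /var/log/*              # never generate hints for these mounts
    devices:
      denied:
      - /dev/null               # exclude pseudo-devices from hint generation
```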

  • Relaxed system topology restrictions: the policy no longer refuses to start up if a NUMA node is shared by more than one pool at the same topology hierarchy level. In particular, a single NUMA node shared by all sockets no longer prevents startup.

  • Improved Burstable QoS class container handling: the policy now allocates memory to burstable QoS class containers based on memory request estimates. This should lower the probability of unexpected allocation failures when burstable containers are used to fill a node close to its full capacity.
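To illustrate what this change affects, here is a minimal Burstable QoS pod (the pod name and image are placeholders). Because requests are below limits, the pod falls into the Burstable QoS class, and the policy now bases its memory allocation estimate on the 512Mi request rather than treating the allocation as unbounded up to the 1Gi limit:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: burstable-example            # placeholder name
spec:
  containers:
  - name: app
    image: example.registry/app:latest   # placeholder image
    resources:
      requests:
        memory: "512Mi"   # memory allocation is now estimated from this
        cpu: "500m"
      limits:
        memory: "1Gi"
        cpu: "1"
```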

  • Better global shared allocation preference: the preferSharedCPUs: true global configuration option now applies to all containers, unless they are annotated to opt out using the prefer-shared-cpus.resource-policy.nri.io annotation.
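A sketch combining the global option with a per-pod opt-out. The placement of preferSharedCPUs directly under spec, and the /pod key suffix with a "false" value, are assumptions here, following the pattern of the other configuration fragments and annotations in these notes:

```yaml
apiVersion: config.nri/v1alpha1
kind: TopologyAwarePolicy
metadata:
  name: default
spec:
  preferSharedCPUs: true   # prefer shared CPU allocation for all containers
```

A pod can then opt out with an annotation:

```yaml
metadata:
  annotations:
    # opt this pod out of the global shared-CPU preference
    prefer-shared-cpus.resource-policy.nri.io/pod: "false"
```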

Common Policy Improvements

  • Container cache and memory bandwidth allocation enables class-based management of system L2 and L3 cache and memory bandwidth. These are modeled as class-based, uncountable, and shareable resources. Containers can be assigned to predefined classes-of-service (CLOS), or RDT classes for short. Each class defines a specific configuration for cache and memory bandwidth allocation, which is applied to all containers within that class. The assigned container class is resolved and mapped to a CLOS in the runtime using the goresctrl library. RDT control must be enabled in the runtime and the assigned classes must be defined in the runtime configuration. Otherwise, the runtime might fail to create containers that are assigned to an RDT class. Refer to the containerd, cri-o, and goresctrl documentation for more details about configuration.

A container can be assigned either to an RDT class matching its pod's QoS class (BestEffort, Burstable or Guaranteed), or it can be assigned to an arbitrary class using the rdtclass.resource-policy.nri.io annotation. To enable QoS-class based default assignment you can use a configuration fragment similar to this:

apiVersion: config.nri/v1alpha1
kind: TopologyAwarePolicy # or 'BalloonsPolicy' for the 'balloons' policy
metadata:
  name: default
spec:
  ...
  control:
    rdt:
      enable: true
      usePodQoSAsDefaultClass: true

RDT class assignment is also possible using annotations. For instance, the following assigns the packetpump container to the highprio class and the scheduler container to the midprio class. Any other container in the pod is assigned to the class matching its pod's QoS class:

metadata:
  annotations:
    rdtclass.resource-policy.nri.io/container.packetpump: highprio
    rdtclass.resource-policy.nri.io/container.scheduler: midprio
  • Container block I/O prioritization allows class-based control of block I/O prioritization and throttling. Containers can be assigned to predefined block I/O classes. Each class defines a specific configuration of prioritization and throttling parameters which are applied to all containers assigned to the class. The assigned container class is resolved and mapped to actual parameters in the runtime using the goresctrl library. Block I/O control must be enabled in the runtime and the classes must be defined in the runtime configuration. Otherwise, the runtime fails to create containers that are assigned to a block I/O class. Refer to the containerd, cri-o, and goresctrl documentation for more details about configuration.

A container can be assigned either to a Block I/O class matching its pod's QoS class (BestEffort, Burstable or Guaranteed), or it can be assigned to an arbitrary class using the blockioclass.resource-policy.nri.io annotation. To enable QoS-class based default assignment you can use a configuration fragment similar to this:

apiVersion: config.nri/v1alpha1
kind: TopologyAwarePolicy
metadata:
  name: default
spec:
  ...
  control:
    blockio:
      enable: true
      usePodQoSAsDefaultClass: true

Class assignment is also possible using annotations. For instance, the following assigns the database container to the highprio class and the logger container to the lowprio class. Any other container in the pod is assigned to the class matching its pod's QoS class:

metadata:
  annotations:
    blockioclass.resource-policy.nri.io/container.database: highprio
    blockioclass.resource-policy.nri.io/container.logger: lowprio

What's Changed

  • balloons: do not require minFreq and maxFreq in CPU classes by @askervin in #455
  • balloons: expose balloons and optionally containers with affinity in NRT by @askervin in #469
  • balloons: introduce loadClasses for avoiding unwanted overloading in critical locations by @askervin in #493
  • topology-aware: exclude isolated CPUs from policy-picked reserved cpusets. by @klihub in #474
  • topology-aware: rework building the topology pool tree. by @klihub in #477
  • topology-aware: allocate burstable container memory by requests. by @klihub in #491
  • topology-aware: better semantics for globally configured shared CPU preference. by @klihub in #498
  • topology-aware: more consistent setup error handling. by @klihub in #502
  • memtierd: allow overriding go version for image build. by @klihub in #456
  • resmgr: improve annotated topology hint control. by @klihub in #499
  • resmgr: eliminate extra container state 'overlay'. by @klihub in #480
  • resmgr: eliminate extra RDT class 'overlay'. by @klihub in #481
  • resmgr: eliminate extra block I/O class 'overlay'. by @klihub in #482
  • resmgr: configurable RDT and block I/O class control. by @klihub in #483
  • system: add a helper for finding CPUs sharing caches by @askervin in #492
  • sysfs: only discover topology of online cpus by @marquiz in #494
  • pkg/udev: implement udev event reading/monitoring. by @klihub in #449
  • workflow: sign Helm packages and upload provenance files by @fmuyassarov in #468
  • [1/2] OLM workflow: add automatic OLM bundle submission. by @fmuyassarov in #460
  • [2/2] OLM workflow: allow both test and real submissions. by @klihub in #464
  • e2e: test topology-aware policy nodeResourceTopology exporting by @askervin in #465
  • e2e: add pure go stateful fuzz test generator by @askervin in #463

Full Changelog: v0.8.0...v0.9.3