Releases: containers/nri-plugins
v0.12.0
This new release of NRI Reference Plugins brings a few new features to resource policy plugins.
What's New
Balloons Policy
- Load Balancing for Composite Balloons
This feature enables creating equally many balloons from two or more balloon types for the same set of containers. For example, if balloon types B0, ..., B3 are local to similar but separate hardware resources that accelerate HPC applications, this feature enables balancing the number of balloons created from each of these types for HPC containers. Containers are assigned to a composite balloon type BC, consisting of components B0, ..., B3, and the componentCreation strategy of the BC balloon type is set to balance-balloons.
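A minimal configuration sketch of the above, assuming the component balloon types B0, ..., B3 are defined elsewhere in the same balloonTypes list:
balloonTypes:
  - name: BC
    componentCreation: balance-balloons
    components:
      - balloonType: B0
      - balloonType: B1
      - balloonType: B2
      - balloonType: B3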
Common Policy Improvements
- Resource Annotation
Exact resource requirements for a container can be annotated on the container's pod. This can be useful in clusters with high CPU count nodes, when one wants to allocate more than 256 CPUs to a single container, which is otherwise not properly detected. The newly added nri-resource-annotator webhook can be used to automate this process. You can install the webhook using the provided Helm chart with the following commands:
$ NS=kube-system; SVC=resource-annotator; CERT=~/webhook-cert
$ mkdir -p $CERT
$ openssl req -x509 -newkey rsa:2048 -sha256 -days 365 -nodes \
-keyout $CERT/server-key.pem -out $CERT/server-crt.pem \
-subj "/CN=$SVC.$NS.svc" -addext "subjectAltName=DNS:$SVC,DNS:$SVC.$NS,DNS:$SVC.$NS.svc"
$ helm repo add nri-plugins https://containers.github.io/nri-plugins
$ helm repo update
$ helm -n $NS install webhook nri-plugins/nri-resource-annotator \
--set image.name=ghcr.io/containers/nri-plugins/nri-resource-annotator \
--set service.base64Crt=$(base64 -w0 < $CERT/server-crt.pem) \
--set service.base64Key=$(base64 -w0 < $CERT/server-key.pem)
The policies should automatically take resource annotations into account when they are present.
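To verify the deployment, one can check that the webhook got registered and that its pod is running. These are generic checks, not specific to this chart:
$ kubectl get mutatingwebhookconfigurations
$ kubectl -n $NS get pods | grep resource-annotator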
Other Changes
- OpenTelemetry metrics collection
Metrics collection has been updated to use OpenTelemetry for metrics instrumentation. In connection
with this change, the names of some available metrics have changed when collected using Prometheus.
In particular, metric names from the Balloons and Topology-Aware policies have changed.
What's Changed
- balloons: extend composite balloons with a kind of load balancing by @askervin in #617
- metrics: switch to OpenTelemetry based metrics collection. by @klihub in #600
- resource-annotator: add resource annotator mutating webhook. by @klihub in #619
- e2e: add global burstability limit test case. by @klihub in #607
- scripts: fix Helm release artifact checker. by @klihub in #609
- .github: fix unstable chart publishing with Helm v4.x. by @klihub in #610
- e2e: run_tests.sh accepts user overrides for vars read from files by @askervin in #613
- e2e: report skipped tests with SKIP instead of PASS in the summary by @askervin in #615
- e2e: bump default test distro to fedora/43. by @klihub in #616
- e2e: enable creating VMs up to 4096 CPUs and CPU hot-plugging/removing from test scripts by @askervin in #587
- e2e: add 8-socket 4k-CPU e2e vm and related test for balloons policy by @askervin in #591
- e2e: add 8-socket 4k-CPU test for topology-aware by @askervin in #599
- e2e: support "cxl" in hardware topology (CXL-1) by @askervin in #611
- e2e: enable custom kernel building, caching and installing (CXL-2) by @askervin in #612
- e2e: add a test that hotplugs and hotremoves CXL memory (CXL-3) by @askervin in #614
- build: ignore operator build failures for non releases by @klihub in #620
- e2e/balloons: update expected metrics pattern. by @klihub in #621
Full Changelog: v0.11.0...v0.12.0
v0.11.0
This new release of NRI Reference Plugins brings both new features and bug fixes to resource policy plugins.
What's New
Balloons Policy
- CPU C-state control allows selectively disabling some C-states of CPUs assigned to a balloon. A typical use case for this is to disable the deepest power saving C-states in balloons that host latency critical applications. This can improve hardware wakeup and consequently kernel scheduling latency for processes in the balloon.
config:
  reservedResources:
    cpu: 1
  ...
  idleCPUClass: default-class
  ...
  balloonTypes:
    - name: low-latency
      cpuClass: lowlatency-class
      ...
  control:
    cpu:
      classes:
        lowlatency-class:
          disabledCstates: [C4, C6, C8, C10]
        default-class:
          disabledCstates: []
  ...
Topology Aware Policy
- Improved burstable CPU allocation allows the policy to prefer topology pools which can provide enough CPU capacity for a burstable QoS class container to reach its burstable limit, whenever there is enough free or idle shared capacity in the pool.
- Configurable unlimited burstability puts a topological cap on the limit used when picking CPUs for a burstable QoS class container without a CPU limit, to prevent such containers from unconditionally getting assigned to the system pool with all CPUs and memory nodes. The supported caps are system (all sockets), package (a single socket), die (CPUs in a single die), and numa (CPUs close to a single NUMA node). The cap defaults to package. Additionally, containers can be annotated with a pod- or container-specific burstability cap.
config:
  reservedResources:
    cpu: 1
  ...
  # Constrain unlimited burstable containers close to a single NUMA node.
  unlimitedBurstable: numa
apiVersion: v1
kind: Pod
metadata:
  name: burstable
  annotations:
    # constrain unlimited burstable containers to a single die by default
    unlimited-burstable.resource-policy.nri.io/pod: die
    # but loosen this for ctr0 to a full socket
    unlimited-burstable.resource-policy.nri.io/container.ctr0: package
spec:
  containers:
    - name: ctr0
      image: myimage-ctr0
      imagePullPolicy: Always
      resources:
        requests:
          cpu: 2
          memory: 500M
        limits:
          memory: 500M
    - name: ctr1
      image: myimage-ctr1
      imagePullPolicy: Always
      resources:
        requests:
          cpu: 1
          memory: 100M
        limits:
          memory: 100M
...
- Exclusive allocation of all CPUs in a NUMA node, die, or socket. The policy now allows all CPUs in a shared pool to be allocated for exclusive use, as long as this does not cause any of the shared pools in active use to become completely devoid of CPU resources. This allows tighter allocation and alignment of resources: the policy can now allocate a full NUMA node, die, or socket worth of CPUs exclusively to a single container from a single NUMA node, die, or socket, whereas formerly this was refused, forcing the container at least one topology level above the tightest possible allocation. See the sketch below.
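As an illustration of the tighter alignment, a Guaranteed QoS pod whose integer CPU request matches a full NUMA node can now be granted all CPUs of that node exclusively. A sketch, assuming a machine with 16 CPUs per NUMA node:
apiVersion: v1
kind: Pod
metadata:
  name: full-numa-node
spec:
  containers:
    - name: ctr0
      image: myimage
      resources:
        requests:
          cpu: 16      # assumed to equal the CPU count of one NUMA node
          memory: 2Gi
        limits:
          cpu: 16      # requests == limits: Guaranteed QoS, exclusive CPUs
          memory: 2Gi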
What's Changed
- scripts/build/update-gh-pages: fix parsing of latest release tag by @marquiz in #556
- Bump golangci-lint and fix linter errors by @marquiz in #559
- e2e: fix VM disk sizing, provision rootfs with max possible size. by @klihub in #561
- build: bump k8s client and controller generator toolchains to latest. by @klihub in #562
- Bump golang to v1.25 by @marquiz in #558
- image-build: switch to bookworm for image building. by @klihub in #564
- build: speed up image building significantly. by @klihub in #565
- e2e: support custom image URLs, use fedora 42 by default. by @klihub in #568
- e2e: don't fail if partition resize fails. by @klihub in #566
- e2e: pre-download and cache custom vagrant images. by @klihub in #569
- e2e: use persistent connections for provisioning. by @klihub in #572
- e2e: adjust cloud image URLs, allow image override, fix ssh connectivity problems. by @klihub in #571
- docs: fix and improve reset (allow all) affinity by @askervin in #573
- memory-policy: fix crashes without configuration by @askervin in #574
- e2e: fix false failure in balloons test02-prometheus-metrics by @askervin in #575
- e2e: abort on vagrant bootstrap errors. by @klihub in #577
- e2e: always configure systemd cgroup driver for containerd. by @klihub in #576
- e2e: alternate test runtimes in tests. by @klihub in #578
- resmgr,helm: nuke obsolete metrics-interval command line flag. by @klihub in #582
- e2e: EFI BIOS support for e2e test VMs. by @klihub in #581
- e2e: cache and reuse release tarballs. by @klihub in #583
- Balloons: enabling disabling c-states in latency-critical balloons by @askervin in #579
- balloons: fix CpuLocations and its unit test by @askervin in #586
- topology-aware: consider burstable CPU limit when picking a pool by @klihub in #570
- policy: fix missing mem_node_capacity metrics data. by @klihub in #588
- e2e: fix helm install after helm v4.0 cli flag rename by @askervin in #589
- pkg/cgroups: remove obsoleted/unused cgroupblkio. by @klihub in #592
- config/blockio: fix typo, clarify block I/O setup. by @klihub in #593
- pkg/cgroupstats: nuke unused v1 cgroup collector. by @klihub in #594
- e2e: fix helm-launch when error is expected by @askervin in #595
- balloons: allow allocation of online CPUs only by @askervin in #590
- e2e: speed up test execution by @askervin in #596
- topology-aware: avoid slicing busy shared pools empty by @klihub in #601
- topology-aware: allow slicing idle shared pools empty. by @klihub in #602
- topology-aware: don't (ac)count 0 CPU req. containers in the reserved pool to shared pools by @klihub in #604
- e2e: add gentler and rougher vm-reboot methods by @askervin in #606
Full Changelog: v0.10.1...v0.11.0
v0.10.1
This is a new minor release of NRI Reference Plugins. It updates dependencies, enables RDT 'discovery mode', brings a few documentation improvements, and fixes a startup failure on machines with an asymmetric NUMA distance matrix.
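To check whether a machine has an asymmetric NUMA distance matrix, the condition addressed by the startup fix, one can inspect the kernel-reported distances, for example:
$ cat /sys/devices/system/node/node*/distance
$ numactl --hardware   # alternative, if numactl is installed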
What's Changed
- build(deps): bump golang.org/x/oauth2 from 0.21.0 to 0.27.0 by @dependabot[bot] in #548
- docs: update documentation for RDT monitoring/metrics. by @klihub in #549
- rdt,resmgr: allow running RDT in discovery mode. by @klihub in #551
- docs: update balloons debugging guidance and add examples by @askervin in #552
- go.mod,Makefile: bump golang to latest 1.24.x. by @klihub in #555
- sysfs: patch up asymmetric NUMA distances. by @klihub in #554
Full Changelog: v0.10.0...v0.10.1
v0.10.0
This new release of NRI Reference Plugins brings a new NRI plugin, new features in resource policy plugins, a number of bug fixes, end-to-end tests, and a few new use cases in the documentation.
What's New
Balloons Policy
- Composite balloons enable allocating a diverse set of CPUs for containers with complex CPU requirements. For example, "allocate an equal number of CPUs from both NUMA nodes on CPU socket 0". This enables efficient parallelism inside an AI inference engine container that runs inference on CPUs, while still isolating inference engines from each other.
balloonTypes:
  - name: balance-pkg0-nodes
    components:
      - balloonType: node0
      - balloonType: node1
  - name: node0
    preferCloseToDevices:
      - /sys/devices/system/node/node0
  - name: node1
    preferCloseToDevices:
      - /sys/devices/system/node/node1
- Documentation includes recipes for preventing creation of certain containers on a worker node, and for resetting CPU and memory pinning of all containers in a cluster.
Topology Aware Policy
- Pick CPU and Memory by Topology Hints. Normally topology hints are only used to pick the assigned pool for a workload. Once a pool is selected, the available resources within the pool are considered equally good for satisfying the topology hints: when the policy allocates exclusive CPUs and picks pinned memory for the workload, only other potential criteria and attributes are considered for picking the individual resources.
When multiple devices are allocated to a single container, this default assumption of all resources within the pool being topologically equal may not hold, for instance if a container is allocated misaligned devices, that is, devices with different memory or CPU locality. To overcome this, containers can now be annotated to prefer hint-based selection and pinning of CPU and memory resources using the pick-resources-by-hints.resource-policy.nri.io annotation. For example:
apiVersion: v1
kind: Pod
metadata:
  name: data-pump
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-net1
    prefer-isolated-cpus.resource-policy.nri.io/container.ctr0: "true"
    pick-resources-by-hints.resource-policy.nri.io/container.ctr0: "true"
spec:
  containers:
    - name: ctr0
      image: dpdk-pump
      imagePullPolicy: Always
      resources:
        requests:
          cpu: 2
          memory: 100M
          vendor.com/sriov_netdevice_A: '1'
          vendor.com/sriov_netdevice_B: '1'
        limits:
          vendor.com/sriov_netdevice_A: '1'
          vendor.com/sriov_netdevice_B: '1'
          cpu: 2
          memory: 100M
When annotated like this, the policy will try to pick one exclusive isolated CPU with locality to one device and another with locality to the other. It will also try to pick and pin to memory aligned with these devices.
Common Policy Improvements
These are improvements to common infrastructure and as such are available for the balloons and topology-aware policy plugins, as well as for the wireframe template policy plugin.
- Cache Allocation
Plugins can be configured to exercise class-based control over the L2 and L3 cache allocated to containers' processes. In practice, containers are assigned to classes, and each class has a corresponding cache allocation configuration which is applied to all containers in the class and subsequently to all processes started in those containers. To enable cache control, use the control.rdt.enable option, which defaults to false.
Plugins can be configured to assign containers by default to a cache class named after the Pod QoS class of the container: one of BestEffort, Burstable, and Guaranteed. The configuration setting controlling this behavior is control.rdt.usePodQoSAsDefaultClass and it defaults to false.
Additionally, containers can be explicitly annotated to be assigned to a class. Use the rdtclass.resource-policy.nri.io annotation key for this. For instance:
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  annotations:
    rdtclass.resource-policy.nri.io/pod: poddefaultclass
    rdtclass.resource-policy.nri.io/container.special-container: specialclass
...
This will assign the container named special-container within the pod to the specialclass RDT class and any other container within the pod to the poddefaultclass RDT class. Effectively, these containers' processes will be assigned to the RDT CLOSes corresponding to those classes.
Cache Class/Partitioning Configuration
RDT configuration is supplied as part of the control.rdt configuration block. Here is a sample snippet, as a Helm chart value, which assigns 33%, 66%, and 100% of cache lines to BestEffort, Burstable, and Guaranteed Pod QoS class containers, respectively:
config:
  control:
    rdt:
      enable: true
      usePodQoSAsDefaultClass: true
      options:
        l2:
          optional: true
        l3:
          optional: true
        mb:
          optional: true
      partitions:
        fullCache:
          l2Allocation:
            all:
              unified: 100%
          l3Allocation:
            all:
              unified: 100%
          classes:
            BestEffort:
              l2Allocation:
                all:
                  unified: 33%
              l3Allocation:
                all:
                  unified: 33%
            Burstable:
              l2Allocation:
                all:
                  unified: 66%
              l3Allocation:
                all:
                  unified: 66%
            Guaranteed:
              l2Allocation:
                all:
                  unified: 100%
              l3Allocation:
                all:
                  unified: 100%
Cache Allocation Prerequisites
Note that for cache allocation control to work, you must have
- a hardware platform which supports cache allocation
- resctrlfs pseudofilesystem enabled in your kernel, and loaded if it is a module
- the resctrlfs filesystem mounted (possibly with extra options for your platform)
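A sketch of checking and satisfying the last two prerequisites; the CPU flags and mount options vary by platform, and the flags below are Intel-specific:
$ grep -o 'cat_l[23]' /proc/cpuinfo | sort -u
$ sudo mount -t resctrl resctrl /sys/fs/resctrl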
New plugin: nri-memory-policy
- The NRI memory policy plugin sets the Linux memory policy for new containers.
- The memory policy plugin can, for instance, advise the kernel to interleave memory pages of a container on all NUMA nodes in the system, or on all NUMA nodes near the same socket where the container's allowed CPUs are located.
- The plugin works stand-alone, and also together with NRI resource policy plugins and Kubernetes resource managers. It recognizes CPU and memory pinning set by resource management components. The memory policy plugin should be placed after the resource policy plugins in the NRI plugin chain.
- Memory policy for a container is defined in pod annotations; see the sketch after this list.
- At the time of this NRI plugins release, the latest released containerd and CRI-O do not support NRI Linux memory policy adjustments, or NRI container command line adjustments as a workaround. Using this plugin requires a container runtime built with an NRI version that includes command line adjustments (NRI version > 0.9.0).
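A hypothetical annotation sketch for the pod-annotation item above. Both the annotation key and the class value here are assumptions for illustration only; the plugin's documentation defines the actual annotation scheme and supported values:
apiVersion: v1
kind: Pod
metadata:
  name: interleaved
  annotations:
    # Hypothetical key/value: request an interleaving memory policy for ctr0.
    class.memory-policy.nri.io/container.ctr0: interleave-all
...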
What's Changed
- resmgr,config: allow configuring cache allocation via goresctrl. by @klihub in #541
- resmgr: expose RDT metrics. by @klihub in #543
- Balloons with components by @askervin in #526
- topology-aware: try picking resources by hints first by @klihub in #545
- memory-policy: NRI plugin for setting memory policy by @askervin in #517
- mempolicy: go interface for set_mempolicy and get_mempolicy syscalls by @askervin in #514
- mpolset: get/set memory policy and exec a command by @askervin in #515
- topology-aware: fix format of container-exported memsets. by @klihub in #532
- resmgr: update container-exported resource data. by @klihub in #537
- sysfs: add a helper for gathering whatever IDs related to CPUs by @askervin in #513
- sysfs: fix CPU.GetCaches() to not return empty slice. by @klihub in #533
- sysfs: export CPUFreq.{Min,Max}. by @klihub in #534
- helm: add Chart for memory-policy deployment by @askervin in #519
- go.{mod,sum}: use new goresctrl tag v0.9.0. by @klihub in #544
- Drop tools.go in favor of native tool directive support in go 1.24 by @fmuyassarov in #535
- golang: bump go version to 1.24[.3]. by @klihub in #528
Full Changelog: v0.9.4...v0.10.0
v0.9.4
This is a new minor release of NRI Reference Plugins. It fixes incorrect caching of Pod Resource API query results which in some cases could result in incorrect generated topology hints.
What's Changed
- resmgr: purge cached pod resource list upon pod stop/removal. by @klihub in #507
- github: explicitly ensure contents-only copying by @fmuyassarov in #508
Full Changelog: v0.9.3...v0.9.4
v0.9.3
This is a new minor release of NRI Reference Plugins. It brings several new features, a number of bug fixes, end-to-end tests, and test coverage.
What's New
Balloons Policy
- Cluster-level visibility into CPU affinity. The configuration option agent.nodeResourceTopology: true enables observing balloons as zones in NodeResourceTopology custom resources. Furthermore, if showContainersInNrt: true is defined, information on each container, including CPU affinity, is shown as a subzone of its balloon. Example configuration:
showContainersInNrt: true
agent:
  nodeResourceTopology: true
This enables listing balloons and their cpusets on node K8SNODE with:
kubectl get noderesourcetopology K8SNODE -o json | jq '.zones[] | select(.type=="balloon") | {"balloon":.name, "cpuset":(.attributes[]|select(.name=="cpuset").value)}'
and containers with their cpusets on the same node:
kubectl get noderesourcetopology K8SNODE -o json | jq '.zones[] | select(.type=="allocation for container") | {"container":.name, "cpuset":(.attributes[]|select(.name=="cpuset").value)}'
- System load balancing. Even if two containers run on disjoint sets of logical CPUs, they may nevertheless affect each other's performance. This happens, for instance, if two memory-intensive containers share the same level 2 cache, or if they are compute-intensive, use the same compute resources of a physical CPU core, and run on two hyperthreads of the same core.
The new system load balancing in the balloons policy is based on classifying loads generated by containers using the new loadClasses configuration option. Based on the load classes associated with balloonTypes using loads, the policy allocates CPUs to new and existing balloons so that it avoids overloading level 2 caches or physical CPU cores.
Example: the policy prefers selecting CPUs for all "inference-engine" and "computational-fluid-dynamics" balloons within separate level 2 cache blocks, to prevent cache thrashing by any two containers in these balloons.
balloonTypes:
  - name: inference-engine
    loads:
      - memory-intensive
    ...
  - name: computational-fluid-dynamics
    loads:
      - memory-intensive
    ...
loadClasses:
  - name: memory-intensive
    level: l2cache
Topology Aware Policy
- Improved topology hint control: the topologyhints.resource-policy.nri.io annotation key can be used to enable or disable topology hint generation for one or more containers altogether, or selectively for mount, device, and pod resource hint types.
For example:
metadata:
  annotations:
    # disable topology hint generation for all containers by default
    topologyhints.resource-policy.nri.io/pod: none
    # disable other than mount-based hints for the 'diskwriter' container
    topologyhints.resource-policy.nri.io/container.diskwriter: mounts
    # disable other than device-based hints for the 'videoencoder' container
    topologyhints.resource-policy.nri.io/container.videoencoder: devices
    # disable other than pod resource-based hints for the 'dpdk' container
    topologyhints.resource-policy.nri.io/container.dpdk: pod-resources
    # enable device and pod resource-based hints for 'networkpump' container
    topologyhints.resource-policy.nri.io/container.networkpump: devices,pod-resources
It is also possible to enable and disable topology hint generation based on mount or device path, using allow and deny lists. See the updated documentation for more details.
- Relaxed system topology restrictions: the policy no longer refuses to start up if a NUMA node is shared by more than one pool at the same topology hierarchy level. In particular, a single NUMA node shared by all sockets should not prevent startup any more.
- Improved Burstable QoS class container handling: the policy now allocates memory to burstable QoS class containers based on memory request estimates. This should lower the probability of unexpected allocation failures when burstable containers are used to fill a node close to its full capacity.
- Better global shared allocation preference: the preferSharedCPUs: true global configuration option now applies to all containers, unless they are annotated to opt out using the prefer-shared-cpus.resource-policy.nri.io annotation (see the sketch below).
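A minimal sketch of the global option and the per-container opt-out, using the option and annotation key named above; the surrounding structure and the container name ctr0 are illustrative:
config:
  preferSharedCPUs: true
---
metadata:
  annotations:
    # opt ctr0 out of the global shared-CPU preference
    prefer-shared-cpus.resource-policy.nri.io/container.ctr0: "false"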
Common Policy Improvements
- Container cache and memory bandwidth allocation enables class-based management of system L2 and L3 cache and memory bandwidth. These are modeled as class-based, uncountable, shareable resources. Containers can be assigned to predefined classes of service (CLOSes), or RDT classes for short. Each class defines a specific configuration for cache and memory bandwidth allocation, which is applied to all containers within that class. The assigned container class is resolved and mapped to a CLOS in the runtime using the goresctrl library. RDT control must be enabled in the runtime and the assigned classes must be defined in the runtime configuration; otherwise the runtime might fail to create containers that are assigned to an RDT class. Refer to the containerd, CRI-O, and goresctrl documentation for more details about configuration.
A container can be assigned either to an RDT class matching its pod's QoS class (BestEffort, Burstable or Guaranteed), or it can be assigned to an arbitrary class using the rdtclass.resource-policy.nri.io annotation. To enable QoS-class based default assignment you can use a configuration fragment similar to this:
apiVersion: config.nri/v1alpha1
kind: TopologyAwarePolicy # or 'BalloonsPolicy' for the 'balloons' policy
metadata:
  name: default
spec:
  ...
  control:
    rdt:
      enable: true
      usePodQoSAsDefaultClass: true
RDT class assignment is also possible using annotations. For instance, the following assigns the packetpump container to the highprio class and the scheduler container to the midprio class. Any other container in the pod will be assigned to the class matching the pod's QoS class:
metadata:
  annotations:
    rdtclass.resource-policy.nri.io/container.packetpump: highprio
    rdtclass.resource-policy.nri.io/container.scheduler: midprio
- Container block I/O prioritization allows class-based control of block I/O prioritization and throttling. Containers can be assigned to predefined block I/O classes. Each class defines a specific configuration of prioritization and throttling parameters which is applied to all containers assigned to the class. The assigned container class is resolved and mapped to actual parameters in the runtime using the goresctrl library. Block I/O control must be enabled in the runtime and the classes must be defined in the runtime configuration; otherwise the runtime fails to create containers that are assigned to a block I/O class. Refer to the containerd, CRI-O, and goresctrl documentation for more details about configuration.
A container can be assigned either to a Block I/O class matching its pod's QoS class (BestEffort, Burstable or Guaranteed), or it can be assigned to an arbitrary class using the blockioclass.resource-policy.nri.io annotation. To enable QoS-class based default assignment you can use a configuration fragment similar to this:
apiVersion: config.nri/v1alpha1
kind: TopologyAwarePolicy
metadata:
  name: default
spec:
  ...
  control:
    blockio:
      enable: true
      usePodQoSAsDefaultClass: true
Class assignment is also possible using annotations. For instance, the following assigns the database container to the highprio class and the logger container to the lowprio class. Any other container in the pod will be assigned to the class matching the pod's QoS class:
metadata:
  annotations:
    blockioclass.resource-policy.nri.io/container.database: highprio
    blockioclass.resource-policy.nri.io/container.logger: lowprio
What's Changed
- balloons: do not require minFreq and maxFreq in CPU classes by @askervin in #455
- balloons: expose balloons and optionally containers with affinity in NRT by @askervin in #469
- balloons: introduce loadClasses for avoiding unwanted overloading in critical locations by @askervin in #493
- topology-aware: exclude isolated CPUs from policy-picked reserved cpusets. by @klihub in #474
- topology-aware: rework building the topology pool tree. by @klihub in #477
- topology-aware: allocate burstable container memory by requests. by @klihub in #491
- topology-aware: better semantics for globally configured shared CPU preference. by @klihub in #498
- topology-aware: more consistent setup error handling. by @klihub in #502
- memtierd: allow overriding go version for image build. by @klihub in #456
- resmgr: improve annotated topology hint control. by @klihub in #499
- resmgr: eliminate extra container state 'overlay'. by @klihub in https://github.com/containers/nr...
v0.8.0
This is a new major release of NRI Reference Plugins. It brings several new features, a number of bug fixes, improvements to the build system, to CI, end-to-end tests, and test coverage.
What's New
Balloons Policy
- New preserve policy option enables matching containers whose CPU and memory affinity must not be modified by the resource policy. This enables allowing selected containers to access all CPUs and memories. For example, to allow pcm-sensor-server to access MSRs on every CPU for low-level metrics:
preserve:
  matchExpressions:
    - key: pod/labels/app.kubernetes.io/name
      operator: In
      values:
        - pcm-sensor-server
Earlier this required the cpu.preserve.resource-policy.nri.io and memory.preserve.resource-policy.nri.io pod annotations.
- New freqGovernor CPU class option enables setting the CPU frequency governor based on the CPU class of a balloon. Example:
balloonTypes:
  - name: powersaving
    cpuClass: mypowersave
control:
  cpu:
    classes:
      mypowersave:
        freqGovernor: powersave
- New memoryTypes balloon type option specifies required memory types when setting memory affinity. For example, containers in high-memory-bandwidth balloons will use only HBM when configured as:
balloonTypes:
  - name: high-memory-bandwidth
    memoryTypes:
      - HBM
- Support for the memory-type.resource-policy.nri.io pod annotation for setting memory affinity to the closest HBM, DRAM, PMEM, or any combination. This annotation is a pod-level override of the memoryTypes balloon type option.
- L2-cache group aware CPU allocation and sharing. For example, containers in a balloon can be allowed to burst on idle (unallocated) CPUs that share the same L2 cache as the CPUs allocated to the balloon:
balloonTypes:
  - name: l2burst
    shareIdleCPUsInSame: l2cache
- Override of the pinMemory policy option at the balloon type level. This enables setting memory affinity of containers only in certain balloons while leaving others unset, and vice versa. Example:
pinMemory: false
balloonTypes:
  - name: latency-sensitive
    pinMemory: true
    preferIsolCpus: true
    preferNewBalloons: true
- New default configuration runs Guaranteed containers on dedicated CPUs, while BestEffort and Burstable containers are allowed to share the remaining CPUs on the same socket, but not to cross socket boundaries.
- Balance BestEffort containers between balloons with an equal amount of available resources.
- Smaller risk of OOMs with pinMemory: true, as memory affinity handling was refactored to use the smart libmem library.
Topology Aware Policy
The Topology Aware policy can now export Prometheus metrics per topology zone. Exported metrics include the pool CPU set and memory set; shared CPU subpool total capacity, allocations, and available capacity; memory total capacity, allocations, and available amount; and the number of assigned containers and containers in the shared subpool.
To enable exporting these metrics, make sure that you are running with the latest policy configuration custom resource definition and you have policy included in the spec/instrumentation/metrics/enabled slice, like this:
...
spec:
  ...
  instrumentation:
    ...
    metrics:
      enabled:
        - policy
  ...
The Topology Aware policy can now use data from the kubelet's Pod Resource API to generate extra topology hints for resource allocation and alignment. These hints are disabled in the default configuration installed by Helm charts. To enable them, make sure that you are running with the latest policy configuration custom resource definition and you have spec/agent/podResourceAPI set to true in the configuration, like this:
spec:
  agent:
    ...
    podResourceAPI: true
  ...
- Support for the memory-type.resource-policy.nri.io pod annotation for setting memory affinity to the closest HBM, DRAM, or PMEM, or any combination (see the sketch below).
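A minimal pod-annotation sketch for the item above, using the annotation key named in the text; the container name ctr0 is illustrative:
metadata:
  annotations:
    # pin ctr0's memory to the closest HBM and DRAM
    memory-type.resource-policy.nri.io/container.ctr0: hbm,dram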
What's Changed
Balloons Policy Fixes and Improvements
- balloons: add "preserve" option to match containers whose pinning must not be modified by @askervin in #368
- balloons: add support for cpu frequency governor tuning by @fmuyassarov in #374
- balloons: set frequency scaling governor only when requested by @fmuyassarov in #379
- balloons: improve handling of containers with no CPU requests by @askervin in #386
- balloons: add debug logging to selecting a balloon type by @askervin in #396
- balloons: support for L2 cache cluster allocation by @askervin in #384
- balloons: add memoryTypes to balloon types by @askervin in #395
- Add balloon type specific pinMemory option by @askervin in #451
Topology Aware Policy Fixes and Improvements
- metrics: add topology-aware policy metrics collection. by @klihub in #406
- topology-aware: correctly reconfigure implicit affinities for configuration changes. by @klihub in #394
- fixes: copy assigned memory zone in grant clone. by @klihub in #413
New Policy Agnostic Metrics, Common De Facto Exporters
- metrics: cleanup metrics registration, collection and gathering. by @klihub in #403
- metrics: add de-facto standard collectors. by @klihub in #404
- metrics: simplify policy/backend metrics collection interface. by @klihub in #408
- metrics: add policy system collector. by @klihub in #405
Topology Hints Based on Pod Resource API
- podresapi: agent,config,helm: make agent runtime configurable. by @klihub in #418
- podresapi: resmgr,agent: generate topology hints from Pod Resource API. by @klihub in #419
- podresapi: topology-aware: use Pod Resource API hints if present. by @klihub in #420
- agent,resmgr: merge PodResources{List,Map}, cache last List() result. by @klihub in #423
Common Resource Management Fixes and Improvements
- resmgr: fix "qosclass" in policy expressions by @askervin in #387
- resmgr,agent: propagate startup config error back to CR. by @klihub in #416
- libmem: implement policy-agnostic memory allocation/accounting. by @klihub in #332
- libmem: typo and thinko fixes. by @klihub in #381
- sysfs: enable faking CPU cache configurations using OVERRIDE_SYS_CACHES by @askervin in #383
- cpuallocator, plugins: handle priority as an option. by @klihub in #414
- Fix typos in expression code doc and matchExpression yamls by @askervin in #370
Helm Chart and Configuration Fixes and Improvements
- helm: enable prometheus autodiscovery by @klihub in #393
- helm: new balloons default configuration by @askervin in #391
- apis/config: use consistent assignment in +kubebuilder:validation tags. by @klihub in #397
- sample-configs: fix a copy-pasted comment thinko. by @klihub in #402
End-to-end Testing Fixes and Improvements
- e2e: pull and save runtime logs after each test. by @klihub in #367
- e2e: adjust metrics test for updated PrettyName(). by @klihub in #366
- e2e: switch default test distro to fedora/40-cloud-base. by @klihub in #375
- e2e: fix provisioning for Ubuntu cloud image. by @klihub in #377
- e2e: enable vagrant debugging. by @klihub in #376
- e2e: adjust $VM_HOSTNAME for policy node config usage. by @klihub in #378
- e2e: skip long running tests by default. by @klihub in #373
- e2e: fix command filenames in test output directories by @askervin in #390
- e2e: containerd 2.0.0. provisioning fixup. by @klihub in #400
- e2e/balloons: remove unknown/unused helm-launch argument. by @klihub in #407
Build Environment Fixes and Improvements
v0.7.1
This release of NRI Reference Plugins brings new features, a few bug fixes, and updates to the documentation.
Highlights
- balloons policy now supports assigning kernel-isolated CPU cores to balloons when available. To prefer isolated CPU cores for a balloon, use the new preferIsolCpus boolean configuration option. For instance:
balloonTypes:
  - name: high-prio-physical-core
    minCPUs: 2
    maxCPUs: 2
    preferNewBalloons: true
    preferIsolCpus: true
    hideHyperthreads: true
    ...
- balloons policy now supports assigning performance-optimized or energy-efficient CPU cores to balloons when available. For instance, to define a balloon with energy-efficient core preference and another with performance core preference, use the new preferCoreType configuration option like this:
balloonTypes:
  - name: low-prio
    namespaces:
      - logging
      - monitoring
    preferCoreType: efficient
    ...
  - name: high-prio
    preferCoreType: performance
    ...
- Topology-aware policy now allocates CPU cores in clusters that share a last-level cache. Whenever this provides a different grouping than the rest of the topology (for instance, than hyperthreads), the CPU allocator divides cores into groups defined by shared last-level cache. The topology-aware policy tries to allocate as few LLC groups to a container as possible, and tries to avoid sharing an LLC group between multiple containers. A quick way to inspect the LLC grouping is shown below.
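The kernel exposes this grouping in sysfs. For example, to see which CPUs share CPU 0's last-level cache (index3 is typically the LLC; use the highest cache index present on your machine):
$ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list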
What's New
- balloons: add support for isolated cpus. by @fmuyassarov in #344
- balloons: add support for power efficient & high performance cores by @fmuyassarov in #354
- cpuallocator: implement clustered allocation based on cache groups. by @klihub in #343
What Changed
Resource assignment policies should now try harder to detect when a new container is a restarted instance of an existing container which has just exited or crashed. This should fix problems where a crashing container could not be restarted on a nearly fully allocated node.
- deps: bump NRT dependencies to v0.1.2. by @fmuyassarov in #348
- topology-aware: add missing SingleThreadForCPUs() to mockSysfs. by @klihub in #349
- balloons: add support for isolated cpus. by @fmuyassarov in #344
- cpuallocator: implement clustered allocation based on cache groups. by @klihub in #343
- fixes: fix host-wait-vm-ssh-server, improve vm-reboot. by @klihub in #350
- fix: clean up plugin at the beginning/end of tests. by @klihub in #351
- doc: add availableResources in the balloons policy documentation by @askervin in #355
- build: allow building a single plugin image. by @klihub in #357
- balloons: add support for power efficient & high performance cores by @fmuyassarov in #354
- e2e: fix cni_plugin=bridge in provisioning a vm by @askervin in #359
- e2e: bridge CNI setup fixes for Fedora/containerd. by @klihub in #361
- e2e: use bridge CNI plugin by default. by @klihub in #362
- CI: verify in smaller steps, verify binary builds. by @klihub in #364
- resmgr: lifecycle overlap detection and workaround. by @klihub in #358
Full Changelog: v0.7.0...v0.7.1
v0.7.0
This release of NRI Reference Plugins brings in new features and important bug fixes.
Highlights
- Topology-aware and balloons resource policies now support soft-disabling of hyperthreads per container. This improves the performance of some classes of workloads. Both policies support the new pod annotation
hide-hyperthreads.resource-policy.nri.io/container.<CONTAINER-NAME>: "true"
and the balloons policy has a new balloon-type option hideHyperthreads that soft-disables hyperthreads for all containers assigned to a balloon of this type (see the sketch after this list).
- The topology-aware policy supports pinning containers to high-bandwidth memory (HBM), or to both HBM and DRAM, when pods are annotated with
memory-type.resource-policy.nri.io/container.<CONTAINER-NAME>: hbm
memory-type.resource-policy.nri.io/container.<CONTAINER-NAME>: hbm,dram
- Automatic hardware topology hint generation has been fixed in the topology-aware policy. For instance, if a container uses a PCI device, the policy prefers pinning the container to CPUs and memory that are close to the device.
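A minimal balloon-type sketch for the hideHyperthreads option mentioned in the first item above; the balloon type name is illustrative:
balloonTypes:
  - name: no-smt
    hideHyperthreads: true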
What's New
- balloons: hideHyperthreads balloon type option and annotation by @askervin in #338
- topology-aware: add support for hide-hyperthreads annotation. by @askervin in #331
What Changed
- topology-aware: don't ignore HBM memory nodes without close CPUs. by @klihub in #329
- topology-aware: relax NUMA node topology checks. by @klihub in #336
- resmgr: exit when ttrpc connection goes down. by @klihub in #319
- cpuallocator: don't filter based on single CoreKind. by @klihub in #345
- sysfs,cpuallocator: fix CPU cluster discovery. by @klihub in #337
- sysfs: survive NUMA nodes without memory. by @klihub in #339
- sysfs: allow non-uniform thread count. by @klihub in #340
- helm: flip podPriorityClassNodeCritical to true. by @klihub in #312
- config-manager: allow configuring NRI timeouts. by @klihub in #318
New Contributors
Full Changelog: v0.5.0...v0.7.0
v0.5.1
This release of the NRI Reference Plugins brings a few improvements to hardware topology detection and resource assignment.
What's New
- cpuallocator: topology discovery fixes and improvements. by @klihub in #206
- cpuallocator: add support for hybrid core discovery, preferred allocation. by @klihub in #295
- topology-aware: configurable allocation priority by @klihub in #282
- resmgr: enable opentelemetry tracing (span propagation) over the NRI ttrpc connection. by @klihub in #293
Updates, Fixes, and Other Improvements
- sysfs: dump system discovery results in a more predictable order. by @klihub in #294
- github: package and publish interim unstable Helm charts from the main and release branches by @marquiz, @klihub in #303
Full Changelog: v0.4.1...v0.5.1