
Commit 9568e1c

Merge pull request #55855 from rohennes/TELCODOCS-649-numa-ga
TELCODOCS-649: NUMA GA readiness, removing dev previews and docs update
2 parents 2d65f63 + 456067c commit 9568e1c

14 files changed: +608 -85 lines

modules/cnf-about-numa-aware-scheduling.adoc

Lines changed: 9 additions & 0 deletions
@@ -10,6 +10,8 @@ Non-Uniform Memory Access (NUMA) is a compute platform architecture that allows

NUMA architecture allows a CPU with multiple memory controllers to use any available memory across CPU complexes, regardless of where the memory is located. This allows for increased flexibility at the expense of performance. A CPU processing a workload using memory that is outside its NUMA zone is slower than a workload processed in a single NUMA zone. Also, for I/O-constrained workloads, the network interface on a distant NUMA zone slows down how quickly information can reach the application. High-performance workloads, such as telecommunications workloads, cannot operate to specification under these conditions. NUMA-aware scheduling aligns the requested cluster compute resources (CPUs, memory, devices) in the same NUMA zone to process latency-sensitive or high-performance workloads efficiently. NUMA-aware scheduling also improves pod density per compute node for greater resource efficiency.
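
For example, you can inspect a compute node's NUMA layout from a debug shell; a quick check with `lscpu` (a sketch — `<node_name>` is a placeholder, and the zone count and CPU ranges vary by hardware):

[source,terminal]
----
$ oc debug node/<node_name> -- chroot /host lscpu | grep NUMA
----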

+By integrating the Node Tuning Operator's performance profile with NUMA-aware scheduling, you can further configure CPU affinity to optimize performance for latency-sensitive workloads.
+
The scheduling logic of the default {product-title} pod scheduler considers the available resources of the entire compute node, not individual NUMA zones. If the most restrictive resource alignment is requested in the kubelet topology manager, error conditions can occur when admitting the pod to a node. Conversely, if the most restrictive resource alignment is not requested, the pod can be admitted to the node without proper resource alignment, leading to degraded or unpredictable performance. For example, runaway pod creation with `Topology Affinity Error` statuses can occur when the pod scheduler makes suboptimal scheduling decisions for guaranteed pod workloads because it does not know whether the pod's requested resources are available. Scheduling mismatch decisions can cause indefinite pod startup delays. Also, depending on the cluster state and resource allocation, poor pod scheduling decisions can cause extra load on the cluster because of failed startup attempts.
The NUMA Resources Operator deploys a custom NUMA resources secondary scheduler and other resources to mitigate the shortcomings of the default {product-title} pod scheduler. The following diagram provides a high-level overview of NUMA-aware pod scheduling.
@@ -21,3 +23,10 @@ NodeResourceTopology API:: The `NodeResourceTopology` API describes the availabl
NUMA-aware scheduler:: The NUMA-aware secondary scheduler receives information about the available NUMA zones from the `NodeResourceTopology` API and schedules high-performance workloads on a node where they can be optimally processed.
Node topology exporter:: The node topology exporter exposes the available NUMA zone resources for each compute node to the `NodeResourceTopology` API. The node topology exporter daemon tracks the resource allocation from the kubelet by using the `PodResources` API.
PodResources API:: The `PodResources` API is local to each node and exposes the resource topology and available resources to the kubelet.
++
+[NOTE]
+====
+The `List` endpoint of the `PodResources` API exposes exclusive CPUs allocated to a particular container. The API does not expose CPUs that belong to a shared pool.
+
+The `GetAllocatableResources` endpoint exposes allocatable resources available on a node.
+====
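
To inspect what the node topology exporter publishes for each node, you can query the `NodeResourceTopology` CRs directly; a brief sketch (the CR names match the node names in your cluster):

[source,terminal]
----
$ oc get noderesourcetopologies.topology.node.k8s.io
----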

modules/cnf-checking-numa-aware-scheduler-logs.adoc

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ numaresourcesscheduler.nodetopology.openshift.io "numaresourcesscheduler" delete
+
[source,yaml,subs="attributes+"]
----
-apiVersion: nodetopology.openshift.io/v1alpha1
+apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesScheduler
metadata:
  name: numaresourcesscheduler
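
Once the scheduler redeploys, its logs can be followed from the secondary scheduler deployment; a hedged example, using the deployment name shown in this commit's verification output:

[source,terminal]
----
$ oc logs -n openshift-numaresources deployment/secondary-scheduler
----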
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/cnf-numa-aware-scheduling.adoc

:_content-type: PROCEDURE

[id="cnf-configuring-node-groups-for-the-numaresourcesoperator_{context}"]
= Optional: Configuring polling operations for NUMA resources updates

The daemons that the NUMA Resources Operator controls in each `nodeGroup` poll resources to retrieve updates about available NUMA resources. You can fine-tune polling operations for these daemons by configuring the `spec.nodeGroups` specification in the `NUMAResourcesOperator` custom resource (CR), which gives you advanced control of polling. Configure these specifications to improve scheduling behavior and to troubleshoot suboptimal scheduling decisions.

The configuration options are the following:

* `infoRefreshMode`: Determines the trigger condition for polling the kubelet. The NUMA Resources Operator reports the resulting information to the API server.
* `infoRefreshPeriod`: Determines the duration between polling updates.
* `podsFingerprinting`: Determines whether point-in-time information for the current set of pods running on a node is exposed in polling updates.
+
[NOTE]
====
`podsFingerprinting` is enabled by default. `podsFingerprinting` is a requirement for the `cacheResyncPeriod` specification in the `NUMAResourcesScheduler` CR. The `cacheResyncPeriod` specification helps to report more exact resource availability by monitoring pending resources on nodes.
====

.Prerequisites

* Install the OpenShift CLI (`oc`).
* Log in as a user with `cluster-admin` privileges.
* Install the NUMA Resources Operator.

.Procedure

* Configure the `spec.nodeGroups` specification in your `NUMAResourcesOperator` CR:
+
[source,yaml]
----
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesOperator
metadata:
  name: numaresourcesoperator
spec:
  nodeGroups:
  - config:
      infoRefreshMode: Periodic <1>
      infoRefreshPeriod: 10s <2>
      podsFingerprinting: Enabled <3>
    name: worker
----
<1> Valid values are `Periodic`, `Events`, and `PeriodicAndEvents`. Use `Periodic` to poll the kubelet at intervals that you define in `infoRefreshPeriod`. Use `Events` to poll the kubelet at every pod lifecycle event. Use `PeriodicAndEvents` to enable both methods.
<2> Define the polling interval for `Periodic` or `PeriodicAndEvents` refresh modes. The field is ignored if the refresh mode is `Events`.
<3> Valid values are `Enabled` or `Disabled`. Setting to `Enabled` is a requirement for the `cacheResyncPeriod` specification in the `NUMAResourcesScheduler` CR.
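
Before checking the status, you can confirm the settings you applied to the CR spec; a quick sketch using `jsonpath` against the same CR:

[source,terminal]
----
$ oc get numaresourcesoperators.nodetopology.openshift.io numaresourcesoperator -o jsonpath='{.spec.nodeGroups}'
----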

.Verification

. After you deploy the NUMA Resources Operator, verify that the node group configurations were applied by running the following command:
+
[source,terminal]
----
$ oc get numaresop numaresourcesoperator -o json | jq '.status'
----
+
.Example output
[source,terminal]
----
...

"config": {
  "infoRefreshMode": "Periodic",
  "infoRefreshPeriod": "10s",
  "podsFingerprinting": "Enabled"
},
"name": "worker"

...
----
Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/cnf-numa-aware-scheduling.adoc

:_module-type: PROCEDURE
[id="cnf-creating-nrop-cr-with-manual-performance-settings_{context}"]
= Creating the NUMAResourcesOperator custom resource with manual performance settings

After you install the NUMA Resources Operator, create the `NUMAResourcesOperator` custom resource (CR) that instructs the NUMA Resources Operator to install all the cluster infrastructure needed to support the NUMA-aware scheduler, including daemon sets and APIs.

.Prerequisites

* Install the OpenShift CLI (`oc`).
* Log in as a user with `cluster-admin` privileges.
* Install the NUMA Resources Operator.

.Procedure

. Optional: Create the `MachineConfigPool` custom resource that enables custom kubelet configurations for worker nodes:
+
[NOTE]
====
By default, {product-title} creates a `MachineConfigPool` resource for worker nodes in the cluster. You can create a custom `MachineConfigPool` resource if required.
====

.. Save the following YAML in the `nro-machineconfig.yaml` file:
+
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  labels:
    cnf-worker-tuning: enabled
    machineconfiguration.openshift.io/mco-built-in: ""
    pools.operator.machineconfiguration.openshift.io/worker: ""
  name: worker
spec:
  machineConfigSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: worker
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""
----

.. Create the `MachineConfigPool` CR by running the following command:
+
[source,terminal]
----
$ oc create -f nro-machineconfig.yaml
----

. Create the `NUMAResourcesOperator` custom resource:

.. Save the following YAML in the `nrop.yaml` file:
+
[source,yaml]
----
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesOperator
metadata:
  name: numaresourcesoperator
spec:
  nodeGroups:
  - machineConfigPoolSelector:
      matchLabels:
        pools.operator.machineconfiguration.openshift.io/worker: "" <1>
----
<1> Must match the label applied to worker nodes in the related `MachineConfigPool` CR.
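
To confirm which labels the target pool actually carries before you set the selector, you can list them; a quick check, assuming the default `worker` pool:

[source,terminal]
----
$ oc get machineconfigpool worker --show-labels
----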

.. Create the `NUMAResourcesOperator` CR by running the following command:
+
[source,terminal]
----
$ oc create -f nrop.yaml
----

.Verification

* Verify that the NUMA Resources Operator deployed successfully by running the following command:
+
[source,terminal]
----
$ oc get numaresourcesoperators.nodetopology.openshift.io
----
+
.Example output
[source,terminal]
----
NAME                    AGE
numaresourcesoperator   10m
----
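
The CR also rolls out the node topology exporter daemon set on the selected nodes; you can confirm it is running, using the namespace shown throughout this commit's examples:

[source,terminal]
----
$ oc get daemonsets -n openshift-numaresources
----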

modules/cnf-creating-nrop-cr.adoc

Lines changed: 5 additions & 36 deletions
@@ -16,53 +16,22 @@ When you have installed the NUMA Resources Operator, then create the `NUMAResour

.Procedure

-. Create the `MachineConfigPool` custom resource that enables custom kubelet configurations for worker nodes:
-
-.. Save the following YAML in the `nro-machineconfig.yaml` file:
-+
-[source,yaml]
-----
-apiVersion: machineconfiguration.openshift.io/v1
-kind: MachineConfigPool
-metadata:
-  labels:
-    cnf-worker-tuning: enabled
-    machineconfiguration.openshift.io/mco-built-in: ""
-    pools.operator.machineconfiguration.openshift.io/worker: ""
-  name: worker
-spec:
-  machineConfigSelector:
-    matchLabels:
-      machineconfiguration.openshift.io/role: worker
-  nodeSelector:
-    matchLabels:
-      node-role.kubernetes.io/worker: ""
-----
-
-.. Create the `MachineConfigPool` CR by running the following command:
-+
-[source,terminal]
-----
-$ oc create -f nro-machineconfig.yaml
-----
-
. Create the `NUMAResourcesOperator` custom resource:

.. Save the following YAML in the `nrop.yaml` file:
+
[source,yaml]
----
-apiVersion: nodetopology.openshift.io/v1alpha1
+apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesOperator
metadata:
  name: numaresourcesoperator
spec:
  nodeGroups:
  - machineConfigPoolSelector:
      matchLabels:
-        pools.operator.machineconfiguration.openshift.io/worker: "" <1>
+        pools.operator.machineconfiguration.openshift.io/worker: ""
----
-<1> Should match the label applied to worker nodes in the related `MachineConfigPool` CR.

.. Create the `NUMAResourcesOperator` CR by running the following command:
+
@@ -73,13 +42,13 @@ $ oc create -f nrop.yaml

.Verification

-Verify that the NUMA Resources Operator deployed successfully by running the following command:
-
+* Verify that the NUMA Resources Operator deployed successfully by running the following command:
++
[source,terminal]
----
$ oc get numaresourcesoperators.nodetopology.openshift.io
----
-
++
.Example output
[source,terminal]
----
Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
// Module included in the following assemblies:
//
// *scalability_and_performance/cnf-numa-aware-scheduling.adoc

:_module-type: PROCEDURE
[id="cnf-deploying-the-numa-aware-scheduler-with-manual-performance-settings_{context}"]
= Deploying the NUMA-aware secondary pod scheduler with manual performance settings

After you install the NUMA Resources Operator, do the following to deploy the NUMA-aware secondary pod scheduler:

* Configure the pod admittance policy for the required machine profile

* Create the required machine config pool

* Deploy the NUMA-aware secondary scheduler

.Prerequisites

* Install the OpenShift CLI (`oc`).

* Log in as a user with `cluster-admin` privileges.

* Install the NUMA Resources Operator.

.Procedure
. Create the `KubeletConfig` custom resource that configures the pod admittance policy for the machine profile:

.. Save the following YAML in the `nro-kubeletconfig.yaml` file:
+
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cnf-worker-tuning
spec:
  machineConfigPoolSelector:
    matchLabels:
      cnf-worker-tuning: enabled
  kubeletConfig:
    cpuManagerPolicy: "static" <1>
    cpuManagerReconcilePeriod: "5s"
    reservedSystemCPUs: "0,1"
    memoryManagerPolicy: "Static" <2>
    evictionHard:
      memory.available: "100Mi"
    kubeReserved:
      memory: "512Mi"
    reservedMemory:
      - numaNode: 0
        limits:
          memory: "1124Mi"
    systemReserved:
      memory: "512Mi"
    topologyManagerPolicy: "single-numa-node" <3>
    topologyManagerScope: "pod"
----
<1> For `cpuManagerPolicy`, `static` must use a lowercase `s`.
<2> For `memoryManagerPolicy`, `Static` must use an uppercase `S`.
<3> `topologyManagerPolicy` must be set to `single-numa-node`.

.. Create the `KubeletConfig` custom resource (CR) by running the following command:
+
[source,terminal]
----
$ oc create -f nro-kubeletconfig.yaml
----
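
The new kubelet configuration triggers a rolling update of the matching machine config pool. One way to watch the rollout complete before continuing (a sketch; pool names depend on your cluster):

[source,terminal]
----
$ oc get machineconfigpools -w
----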

. Create the `NUMAResourcesScheduler` custom resource that deploys the NUMA-aware custom pod scheduler:

.. Save the following YAML in the `nro-scheduler.yaml` file:
+
[source,yaml,subs="attributes+"]
----
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesScheduler
metadata:
  name: numaresourcesscheduler
spec:
  imageSpec: "registry.redhat.io/openshift4/noderesourcetopology-scheduler-container-rhel8:v{product-version}"
  cacheResyncPeriod: "5s" <1>
----
<1> Enter an interval value in seconds for synchronization of the scheduler cache. A value of `5s` is typical for most implementations.
+
[NOTE]
====
* Enable the `cacheResyncPeriod` specification to help the NUMA Resources Operator report more exact resource availability by monitoring pending resources on nodes and synchronizing this information in the scheduler cache at a defined interval. This also helps to minimize `Topology Affinity Error` errors that result from suboptimal scheduling decisions. The lower the interval, the greater the network load. The `cacheResyncPeriod` specification is disabled by default.

* Setting a value of `Enabled` for the `podsFingerprinting` specification in the `NUMAResourcesOperator` CR is a requirement for the implementation of the `cacheResyncPeriod` specification.
====

.. Create the `NUMAResourcesScheduler` CR by running the following command:
+
[source,terminal]
----
$ oc create -f nro-scheduler.yaml
----
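
The deployed scheduler registers under its own scheduler name, which workloads reference in `spec.schedulerName`. A hedged way to look it up from the CR status (the name is typically `topo-aware-scheduler`):

[source,terminal]
----
$ oc get numaresourcesschedulers.nodetopology.openshift.io numaresourcesscheduler -o json | jq '.status.schedulerName'
----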

.Verification

* Verify that the required resources deployed successfully by running the following command:
+
[source,terminal]
----
$ oc get all -n openshift-numaresources
----
+
.Example output
[source,terminal]
----
NAME                                                     READY   STATUS    RESTARTS   AGE
pod/numaresources-controller-manager-7575848485-bns4s    1/1     Running   0          13m
pod/numaresourcesoperator-worker-dvj4n                   2/2     Running   0          16m
pod/numaresourcesoperator-worker-lcg4t                   2/2     Running   0          16m
pod/secondary-scheduler-56994cf6cf-7qf4q                 1/1     Running   0          16m

NAME                                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
daemonset.apps/numaresourcesoperator-worker   2         2         2       2            2           node-role.kubernetes.io/worker=   16m

NAME                                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/numaresources-controller-manager   1/1     1            1           13m
deployment.apps/secondary-scheduler                1/1     1            1           16m

NAME                                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/numaresources-controller-manager-7575848485   1         1         1     13m
replicaset.apps/secondary-scheduler-56994cf6cf                1         1         1     16m
----
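
To place a workload with the secondary scheduler, set `schedulerName` in the pod template. The following is a minimal sketch, assuming the scheduler name `topo-aware-scheduler` reported in the `NUMAResourcesScheduler` CR status and a hypothetical deployment name; equal requests and limits give the pod the guaranteed QoS class that single-NUMA-node alignment requires:

[source,yaml]
----
apiVersion: apps/v1
kind: Deployment
metadata:
  name: numa-workload-example # hypothetical name for illustration
  namespace: openshift-numaresources
spec:
  replicas: 1
  selector:
    matchLabels:
      app: numa-workload-example
  template:
    metadata:
      labels:
        app: numa-workload-example
    spec:
      schedulerName: topo-aware-scheduler # scheduler name reported in the NUMAResourcesScheduler CR status
      containers:
      - name: ctnr
        image: registry.access.redhat.com/ubi9/ubi # example image
        command: ["sleep", "infinity"]
        resources:
          requests: # equal requests and limits -> guaranteed QoS
            cpu: "2"
            memory: "100Mi"
          limits:
            cpu: "2"
            memory: "100Mi"
----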
