Commit 93791cb

Merge pull request #87551 from openshift-cherrypick-robot/cherry-pick-86820-to-enterprise-4.16
[enterprise-4.16] OCPBUGS-44632 Adding scoring strategy
2 parents f6aa6e6 + 1330626 commit 93791cb

3 files changed: +202 -1 lines changed

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/cnf-numa-aware-scheduling.adoc

:_mod-docs-content-type: CONCEPT
[id="cnf-numa-resource-scheduling-strategies_{context}"]
= NUMA resource scheduling strategies

When scheduling high-performance workloads, the secondary scheduler can employ different strategies to determine which NUMA node within a chosen worker node handles the workload. The supported strategies in {product-title} are `LeastAllocated`, `MostAllocated`, and `BalancedAllocation`. Understanding these strategies helps you optimize workload placement for performance and resource utilization.

When a high-performance workload is scheduled in a NUMA-aware cluster, the following steps occur:

. The scheduler first selects a suitable worker node based on cluster-wide criteria, for example taints, labels, or resource availability.

. After a worker node is selected, the scheduler evaluates its NUMA nodes and applies a scoring strategy to decide which NUMA node handles the workload.

. After a workload is scheduled, the selected NUMA node's resources are updated to reflect the allocation.
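The steps above can be sketched as follows. This is an illustrative Python sketch only, not the secondary scheduler's actual implementation; the node and workload structures and the `score_numa` callback are hypothetical:

[source,python]
----
# Illustrative sketch of the NUMA-aware scheduling flow described above;
# not the secondary scheduler's actual implementation. The data shapes
# and the score_numa callback are hypothetical.

def schedule(workload, worker_nodes, score_numa):
    # Step 1: select a suitable worker node based on cluster-wide
    # criteria (taints, labels, resource availability).
    feasible = [n for n in worker_nodes if n["fits"](workload)]
    node = max(feasible, key=lambda n: n["node_score"])
    # Step 2: apply the scoring strategy to the node's NUMA nodes.
    numa = max(node["numa_nodes"], key=lambda z: score_numa(z, workload))
    # Step 3: update the selected NUMA node's resource accounting.
    numa["used_cpus"] += workload["cpus"]
    numa["used_mem_gb"] += workload["mem_gb"]
    return node, numa
----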
The default strategy is `LeastAllocated`. This strategy assigns workloads to the NUMA node with the most available resources, that is, the least utilized NUMA node. The goal of this strategy is to spread workloads across NUMA nodes to reduce contention and avoid hotspots.

The following table summarizes the different strategies and their outcomes:
[discrete]
[id="cnf-scoringstrategy-summary_{context}"]
== Scoring strategy summary

.Scoring strategy summary
[cols="2,3,3", options="header"]
|===
|Strategy |Description |Outcome

|`LeastAllocated` |Favors NUMA nodes with the most available resources. |Spreads workloads to reduce contention and ensure headroom for high-priority tasks.

|`MostAllocated` |Favors NUMA nodes with the least available resources. |Consolidates workloads on fewer NUMA nodes, freeing others for energy efficiency.

|`BalancedAllocation` |Favors NUMA nodes with balanced CPU and memory usage. |Ensures even resource utilization, preventing skewed usage patterns.
|===
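The scoring behind these options can be illustrated with a simplified sketch, modeled on the upstream kube-scheduler `NodeResourcesFit` and `BalancedAllocation` scoring formulas that these strategy names mirror. The exact weighting in the secondary scheduler may differ; this is an approximation for intuition only:

[source,python]
----
# Simplified scoring sketch, modeled on upstream kube-scheduler scoring
# formulas; an approximation for intuition, not the exact implementation.
from statistics import pstdev

def least_allocated(used, total):
    # Higher score for NUMA nodes with more free capacity per resource.
    return sum(100 * (t - u) / t for u, t in zip(used, total)) / len(total)

def most_allocated(used, total):
    # Higher score for NUMA nodes that are already heavily utilized.
    return sum(100 * u / t for u, t in zip(used, total)) / len(total)

def balanced_allocation(used, total):
    # Higher score when CPU and memory utilization fractions are close.
    fractions = [u / t for u, t in zip(used, total)]
    return 100 * (1 - pstdev(fractions))
----

With the example values used in the following sections, `least_allocated([6, 24], [16, 64])` scores NUMA 2 at 62.5 against 18.75 for NUMA 1, so `LeastAllocated` picks NUMA 2, while `MostAllocated` prefers the busier NUMA 1.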
[discrete]
[id="cnf-leastallocated-example_{context}"]
== LeastAllocated strategy example

`LeastAllocated` is the default strategy. It assigns workloads to the NUMA node with the most available resources, minimizing resource contention and spreading workloads across NUMA nodes. This reduces hotspots and ensures sufficient headroom for high-priority tasks. Assume a worker node has two NUMA nodes, and the workload requires 4 vCPUs and 8 GB of memory:

.Example initial NUMA nodes state
[cols="5,2,2,2,2,2", options="header"]
|===
|NUMA node |Total CPUs |Used CPUs |Total memory (GB) |Used memory (GB) |Available resources

|NUMA 1 |16 |12 |64 |56 |4 CPUs, 8 GB memory

|NUMA 2 |16 |6 |64 |24 |10 CPUs, 40 GB memory
|===

Because NUMA 2 has more available resources than NUMA 1, the workload is assigned to NUMA 2.
[discrete]
[id="cnf-mostallocated-example_{context}"]
== MostAllocated strategy example

The `MostAllocated` strategy consolidates workloads by assigning them to the NUMA node with the least available resources, that is, the most utilized NUMA node. This approach helps free other NUMA nodes for energy efficiency or for critical workloads that require full isolation. This example uses the "Example initial NUMA nodes state" values listed in the `LeastAllocated` section.

The workload again requires 4 vCPUs and 8 GB of memory. NUMA 1 has fewer available resources than NUMA 2, so the scheduler assigns the workload to NUMA 1, further utilizing its resources while leaving NUMA 2 idle or minimally loaded.
[discrete]
[id="cnf-balanceallocated-example_{context}"]
== BalancedAllocation strategy example

The `BalancedAllocation` strategy assigns workloads to the NUMA node with the most balanced resource utilization across CPU and memory. The goal is to prevent imbalanced usage, such as high CPU utilization with underutilized memory. Assume a worker node has the following NUMA node states:

.Example NUMA nodes initial state for `BalancedAllocation`
[cols="2,2,2,2",options="header"]
|===
|NUMA node |CPU usage |Memory usage |`BalancedAllocation` score

|NUMA 1 |60% |55% |High (more balanced)

|NUMA 2 |80% |20% |Low (less balanced)
|===

NUMA 1 has more balanced CPU and memory utilization than NUMA 2. Therefore, with the `BalancedAllocation` strategy in place, the workload is assigned to NUMA 1.

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
// Module included in the following assemblies:
//
// * scalability_and_performance/cnf-numa-aware-scheduling.adoc

:_mod-docs-content-type: PROCEDURE
[id="cnf-changing-where-high-performance-workloads-run_{context}"]
= Changing where high-performance workloads run

The NUMA-aware secondary scheduler is responsible for scheduling high-performance workloads on a worker node, and within a NUMA node, where the workloads can be optimally processed. By default, the secondary scheduler assigns workloads to the NUMA node within the chosen worker node that has the most available resources.

If you want to change where the workloads run, you can add the `scoringStrategy` setting to the `NUMAResourcesScheduler` custom resource and set its value to either `MostAllocated` or `BalancedAllocation`.

.Prerequisites

* Install the OpenShift CLI (`oc`).
* Log in as a user with `cluster-admin` privileges.

.Procedure
. Delete the currently running `NUMAResourcesScheduler` resource by using the following steps:

.. Get the active `NUMAResourcesScheduler` resource by running the following command:
+
[source,terminal]
----
$ oc get NUMAResourcesScheduler
----
+
.Example output
[source,terminal]
----
NAME                     AGE
numaresourcesscheduler   92m
----

.. Delete the secondary scheduler resource by running the following command:
+
[source,terminal]
----
$ oc delete NUMAResourcesScheduler numaresourcesscheduler
----
+
.Example output
[source,terminal]
----
numaresourcesscheduler.nodetopology.openshift.io "numaresourcesscheduler" deleted
----
. Save the following YAML in the file `nro-scheduler-mostallocated.yaml`. This example changes the `scoringStrategy` to `MostAllocated`:
+
[source,yaml]
----
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesScheduler
metadata:
  name: numaresourcesscheduler
spec:
  imageSpec: "registry.redhat.io/openshift4/noderesourcetopology-scheduler-container-rhel8:v{product-version}"
  scoringStrategy:
    type: "MostAllocated" <1>
----
<1> If the `scoringStrategy` configuration is omitted, the default of `LeastAllocated` applies.
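+
If you prefer `BalancedAllocation`, the same custom resource applies with only the `type` value changed, for example:
+
[source,yaml]
----
apiVersion: nodetopology.openshift.io/v1
kind: NUMAResourcesScheduler
metadata:
  name: numaresourcesscheduler
spec:
  imageSpec: "registry.redhat.io/openshift4/noderesourcetopology-scheduler-container-rhel8:v{product-version}"
  scoringStrategy:
    type: "BalancedAllocation"
----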

. Create the updated `NUMAResourcesScheduler` resource by running the following command:
+
[source,terminal]
----
$ oc create -f nro-scheduler-mostallocated.yaml
----
+
.Example output
[source,terminal]
----
numaresourcesscheduler.nodetopology.openshift.io/numaresourcesscheduler created
----

.Verification

. Check that the NUMA-aware scheduler was successfully deployed by using the following steps:

.. Check that the custom resource definition (CRD) is created successfully by running the following command:
+
[source,terminal]
----
$ oc get crd | grep numaresourcesschedulers
----
+
.Example output
[source,terminal]
----
NAME                                                CREATED AT
numaresourcesschedulers.nodetopology.openshift.io   2022-02-25T11:57:03Z
----

.. Check that the new custom scheduler is available by running the following command:
+
[source,terminal]
----
$ oc get numaresourcesschedulers.nodetopology.openshift.io
----
+
.Example output
[source,terminal]
----
NAME                     AGE
numaresourcesscheduler   3h26m
----

. Verify that the `scoringStrategy` has been applied correctly by running the following command to check the relevant `ConfigMap` resource for the scheduler:
+
[source,terminal]
----
$ oc get -n openshift-numaresources cm topo-aware-scheduler-config -o yaml | grep scoring -A 1
----
+
.Example output
[source,terminal]
----
    scoringStrategy:
      type: MostAllocated
----

scalability_and_performance/cnf-numa-aware-scheduling.adoc

Lines changed: 9 additions & 1 deletion
@@ -14,9 +14,14 @@ The NUMA Resources Operator allows you to schedule high-performance workloads in
 
 include::modules/cnf-about-numa-aware-scheduling.adoc[leveloffset=+1]
 
+include::modules/cnf-numa-resource-scheduling-strategies.adoc[leveloffset=+1]
+
+[role="_additional-resources"]
 .Additional resources
 
-* For more information about running secondary pod schedulers in your cluster and how to deploy pods with a secondary pod scheduler, see xref:../nodes/scheduling/secondary_scheduler/nodes-secondary-scheduler-configuring.adoc#secondary-scheduler-configuring[Scheduling pods using a secondary scheduler].
+* xref:../nodes/scheduling/secondary_scheduler/nodes-secondary-scheduler-configuring.adoc#secondary-scheduler-configuring[Scheduling pods using a secondary scheduler]
+
+* xref:../scalability_and_performance/cnf-numa-aware-scheduling.adoc#cnf-changing-where-high-performance-workloads-run_numa-aware[Changing where high-performance workloads run]
 
 [id="installing-the-numa-resources-operator_{context}"]
 == Installing the NUMA Resources Operator
@@ -39,6 +44,7 @@ include::modules/cnf-deploying-the-numa-aware-scheduler.adoc[leveloffset=+2]
 
 include::modules/cnf-configuring-single-numa-policy.adoc[leveloffset=+2]
 
+[role="_additional-resources"]
 .Additional resources
 
 * xref:../scalability_and_performance/low_latency_tuning/cnf-tuning-low-latency-nodes-with-perf-profile.adoc#cnf-about-the-profile-creator-tool_cnf-low-latency-perf-profile[About the Performance Profile Creator]
@@ -55,6 +61,8 @@ include::modules/cnf-troubleshooting-numa-aware-workloads.adoc[leveloffset=+1]
 
 include::modules/cnf-reporting-more-exact-reource-availability.adoc[leveloffset=+2]
 
+include::modules/cnf-scheduling-exact-based-on-reource.adoc[leveloffset=+2]
+
 include::modules/cnf-checking-numa-aware-scheduler-logs.adoc[leveloffset=+2]
 
 include::modules/cnf-troubleshooting-resource-topo-exporter.adoc[leveloffset=+2]
