Commit 16380d3

TELCODOCS-1238: Adding ability to exclude SR-IOV from Topo Manager for more flexible scheduling
1 parent b24ae36 commit 16380d3

6 files changed: +243 -1 lines changed
Lines changed: 216 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,216 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * networking/hardware_networks/configuring-sriov-device.adoc
4+
5+
:_content-type: PROCEDURE
6+
[id="nw-sriov-configure-exclude-topology-manager_{context}"]
7+
= Excluding the SR-IOV network topology for NUMA-aware scheduling
8+
9+
To exclude advertising the SR-IOV network resource's Non-Uniform Memory Access (NUMA) node to the Topology Manager, you can configure the `excludeTopology` specification in the `SriovNetworkNodePolicy` custom resource. Use this configuration for more flexible SR-IOV network deployments during NUMA-aware pod scheduling.
10+
11+
.Prerequisites
12+
13+
* You have installed the OpenShift CLI (`oc`).
14+
* You have configured the CPU Manager policy to `static`. For more information about CPU Manager, see the _Additional resources_ section.
15+
* You have configured the Topology Manager policy to `single-numa-node`.
16+
* You have installed the SR-IOV Network Operator.
17+
18+
.Procedure
19+
20+
. Create the `SriovNetworkNodePolicy` CR:
21+
22+
.. Save the following YAML in the `sriov-network-node-policy.yaml` file, replacing values in the YAML to match your environment:
23+
+
24+
[source,yaml]
25+
----
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: <policy_name>
  namespace: openshift-sriov-network-operator
spec:
  resourceName: sriovnuma0 <1>
  nodeSelector:
    kubernetes.io/hostname: <node_name>
  numVfs: <number_of_Vfs>
  nicSelector: <2>
    vendor: "<vendor_ID>"
    deviceID: "<device_ID>"
  deviceType: netdevice
  excludeTopology: true <3>
41+
----
42+
<1> The resource name of the SR-IOV network device plugin. This YAML uses a sample `resourceName` value.
43+
<2> Identify the device for the Operator to configure by using the NIC selector.
44+
<3> To exclude advertising the NUMA node for the SR-IOV network resource to the Topology Manager, set the value to `true`. The default value is `false`.
45+
+
46+
[NOTE]
47+
====
48+
If multiple `SriovNetworkNodePolicy` resources target the same SR-IOV network resource, all of those `SriovNetworkNodePolicy` resources must set the same value for the `excludeTopology` specification. Otherwise, the conflicting policy is rejected.
49+
====
50+
51+
.. Create the `SriovNetworkNodePolicy` resource by running the following command:
52+
+
53+
[source,terminal]
54+
----
55+
$ oc create -f sriov-network-node-policy.yaml
56+
----
57+
+
58+
.Example output
59+
[source,terminal]
60+
----
61+
sriovnetworknodepolicy.sriovnetwork.openshift.io/policy-for-numa-0 created
62+
----
63+
64+
. Create the `SriovNetwork` CR:
65+
66+
.. Save the following YAML in the `sriov-network.yaml` file, replacing values in the YAML to match your environment:
67+
+
68+
[source,yaml]
69+
----
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-numa-0-network <1>
  namespace: openshift-sriov-network-operator
spec:
  resourceName: sriovnuma0 <2>
  networkNamespace: <namespace> <3>
  ipam: |- <4>
    {
      "type": "<ipam_type>"
    }
82+
----
83+
<1> Replace `sriov-numa-0-network` with the name for the SR-IOV network resource.
84+
<2> Specify the resource name for the `SriovNetworkNodePolicy` CR from the previous step. This YAML uses a sample `resourceName` value.
85+
<3> Enter the namespace for your SR-IOV network resource.
86+
<4> Enter the IP address management configuration for the SR-IOV network.
87+
88+
.. Create the `SriovNetwork` resource by running the following command:
89+
+
90+
[source,terminal]
91+
----
92+
$ oc create -f sriov-network.yaml
93+
----
94+
+
95+
.Example output
96+
[source,terminal]
97+
----
98+
sriovnetwork.sriovnetwork.openshift.io/sriov-numa-0-network created
99+
----
100+
101+
. Create a pod and assign the SR-IOV network resource from the previous step:
102+
103+
.. Save the following YAML in the `sriov-network-pod.yaml` file, replacing values in the YAML to match your environment:
104+
+
105+
[source,yaml]
106+
----
apiVersion: v1
kind: Pod
metadata:
  name: <pod_name>
  annotations:
    k8s.v1.cni.cncf.io/networks: |-
      [
        {
          "name": "sriov-numa-0-network" <1>
        }
      ]
spec:
  containers:
  - name: <container_name>
    image: <image>
    imagePullPolicy: IfNotPresent
    command: ["sleep", "infinity"]
124+
----
125+
<1> This is the name of the `SriovNetwork` resource that uses the `SriovNetworkNodePolicy` resource.
126+
127+
.. Create the `Pod` resource by running the following command:
128+
+
129+
[source,terminal]
130+
----
131+
$ oc create -f sriov-network-pod.yaml
132+
----
133+
+
134+
.Example output
135+
[source,terminal]
136+
----
137+
pod/example-pod created
138+
----
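
The note in step 1 says that policies targeting the same SR-IOV network resource must agree on `excludeTopology`. The following sketch shows one way to look for a mismatch before you create a new policy. The function name `find_topology_conflicts` is hypothetical, and the tab-separated input format (resource name, then `excludeTopology` value) is an assumption chosen for this example. An empty value is treated as the documented default of `false`.

```shell
# Hypothetical helper, not part of the SR-IOV Network Operator.
# Reads "<resourceName><TAB><excludeTopology>" lines on stdin and prints
# each resource name whose policies disagree on excludeTopology.
find_topology_conflicts() {
  awk -F'\t' '
    {
      # An unset field falls back to the documented default, false.
      value = ($2 == "" ? "false" : $2)
      if (($1 in seen) && seen[$1] != value) conflict[$1] = 1
      seen[$1] = value
    }
    END { for (name in conflict) print name }
  ' | sort
}

# Possible usage against a cluster (assumes the default Operator namespace):
#   oc get sriovnetworknodepolicy -n openshift-sriov-network-operator \
#     -o jsonpath='{range .items[*]}{.spec.resourceName}{"\t"}{.spec.excludeTopology}{"\n"}{end}' \
#     | find_topology_conflicts
```

If the function prints nothing, no two existing policies for the same resource name disagree. The Operator remains the authority on whether a policy is accepted.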
139+
140+
.Verification
141+
142+
. Verify the status of the pod by running the following command, replacing `<pod_name>` with the name of the pod:
143+
+
144+
[source,terminal]
145+
----
146+
$ oc get pod <pod_name>
147+
----
148+
+
149+
.Example output
150+
[source,terminal]
151+
----
152+
NAME READY STATUS RESTARTS AGE
153+
test-deployment-sriov-76cbbf4756-k9v72 1/1 Running 0 45h
154+
----
155+
156+
. Open a debug session with the target pod to verify that the SR-IOV network resources are deployed to a different NUMA node than the memory and CPU resources.
157+
158+
.. Open a debug session with the pod by running the following command, replacing `<pod_name>` with the target pod name:
159+
+
160+
[source,terminal]
161+
----
162+
$ oc debug pod/<pod_name>
163+
----
164+
165+
.. Set `/host` as the root directory within the debug shell. The debug pod mounts the root file system from the host in `/host` within the pod. By changing the root directory to `/host`, you can run binaries from the host file system:
166+
+
167+
[source,terminal]
168+
----
169+
$ chroot /host
170+
----
171+
172+
.. View information about the CPU allocation by running the following commands:
173+
+
174+
[source,terminal]
175+
----
176+
$ lscpu | grep NUMA
177+
----
178+
+
179+
.Example output
180+
[source,terminal]
181+
----
182+
NUMA node(s): 2
183+
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,...
184+
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,...
185+
----
186+
+
187+
[source,terminal]
188+
----
189+
$ cat /proc/self/status | grep Cpus
190+
----
191+
+
192+
.Example output
193+
[source,terminal]
194+
----
195+
Cpus_allowed: aa
196+
Cpus_allowed_list: 1,3,5,7
197+
----
198+
+
199+
[source,terminal]
200+
----
201+
$ cat /sys/class/net/net1/device/numa_node
202+
----
203+
+
204+
.Example output
205+
[source,terminal]
206+
----
207+
0
208+
----
209+
+
210+
In this example, CPUs 1, 3, 5, and 7 are allocated to `NUMA node1`, but the SR-IOV network resource can use the NIC in `NUMA node0`.
211+
212+
[NOTE]
213+
====
214+
Setting the `excludeTopology` specification to `true` does not force the resources onto different NUMA nodes. The required resources might still be allocated from the same NUMA node.
215+
====
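
In the verification output, `Cpus_allowed: aa` and `Cpus_allowed_list: 1,3,5,7` describe the same CPU set: `aa` is a hexadecimal bitmask in which bit _N_ stands for CPU _N_. The following sketch decodes such a mask so that you can compare it against the `lscpu` output. The function name `mask_to_cpu_list` is hypothetical. The sketch prints every CPU individually, whereas the kernel collapses consecutive CPUs into ranges in `Cpus_allowed_list`.

```shell
# Hypothetical helper: decode a hexadecimal CPU mask, such as the
# Cpus_allowed value in /proc/self/status, into a comma-separated CPU list.
mask_to_cpu_list() {
  # Wide masks are printed in comma-separated groups; drop the separators.
  rest=$(printf '%s' "$1" | tr -d ',')
  base=0
  out=""
  while [ -n "$rest" ]; do
    digit=${rest#"${rest%?}"}          # last hex digit = lowest 4 remaining CPUs
    rest=${rest%?}
    value=$(printf '%d' "0x$digit")
    bit=0
    while [ "$bit" -lt 4 ]; do
      if [ $(( ($value >> $bit) & 1 )) -eq 1 ]; then
        out="${out:+$out,}$(( $base + $bit ))"
      fi
      bit=$(( $bit + 1 ))
    done
    base=$(( $base + 4 ))
  done
  printf '%s\n' "$out"
}

mask_to_cpu_list aa   # prints 1,3,5,7
```

Each `a` digit is binary `1010`, so the low digit selects CPUs 1 and 3 and the high digit selects CPUs 5 and 7, all of which appear under `NUMA node1` in the example `lscpu` output.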
216+
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * networking/hardware_networks/configuring-sriov-device.adoc
4+
5+
:_content-type: CONCEPT
6+
[id="nw-sriov-exclude-topology-manager_{context}"]
7+
= Exclude the SR-IOV network topology for NUMA-aware scheduling
8+
9+
You can exclude advertising the Non-Uniform Memory Access (NUMA) node for the SR-IOV network to the Topology Manager for more flexible SR-IOV network deployments during NUMA-aware pod scheduling.
10+
11+
In some scenarios, it is a priority to maximize CPU and memory resources for a pod on a single NUMA node. By not providing a hint to the Topology Manager about the NUMA node for the pod's SR-IOV network resource, the Topology Manager can deploy the SR-IOV network resource and the pod CPU and memory resources to different NUMA nodes. This can increase network latency because of the data transfer between NUMA nodes. However, this trade-off is acceptable when workloads prioritize optimal CPU and memory performance.
12+
13+
For example, consider a compute node, `compute-1`, that features two NUMA nodes: `numa0` and `numa1`. The SR-IOV-enabled NIC is present on `numa0`. The CPUs available for pod scheduling are present on `numa1` only. By setting the `excludeTopology` specification to `true`, the Topology Manager can assign CPU and memory resources for the pod to `numa1` and can assign the SR-IOV network resource for the same pod to `numa0`. This is only possible when you set the `excludeTopology` specification to `true`. Otherwise, the Topology Manager attempts to place all resources on the same NUMA node.
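
The `compute-1` example can be illustrated with a deliberately simplified model. This is a sketch for intuition only and is not the kubelet's Topology Manager implementation. It assumes that each resource reports a bitmask in which bit _N_ means that NUMA node _N_ alone can satisfy the request, and that a resource excluded from topology reports the placeholder value `any`. The function name `admit_single_numa` is hypothetical.

```shell
# Simplified model (assumption): under a single-NUMA-node policy, a pod is
# admitted only if at least one NUMA node can satisfy every resource that
# reports a hint. Resources that report "any" are ignored.
admit_single_numa() {
  merged=-1                       # all bits set: no restriction yet
  for hint in "$@"; do
    if [ "$hint" != "any" ]; then
      merged=$(( $merged & $hint ))
    fi
  done
  if [ "$merged" -ne 0 ]; then
    echo admit
  else
    echo reject
  fi
}

# CPUs only on numa1 (mask 2), NIC only on numa0 (mask 1):
admit_single_numa 2 1     # prints reject
# Same pod, with the SR-IOV resource excluded from topology hints:
admit_single_numa 2 any   # prints admit
```

In this model, excluding the SR-IOV resource removes the only hint that conflicts with the CPU placement, which mirrors why the pod in the example can run on `numa1` while using the NIC on `numa0`.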

modules/nw-sriov-networknodepolicy-object.adoc

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,7 @@ spec:
3535
isRdma: false <16>
3636
linkType: <link_type> <17>
3737
eSwitchMode: "switchdev" <18>
38+
excludeTopology: false <19>
3839
----
3940
<1> The name for the custom resource object.
4041

@@ -86,6 +87,8 @@ Do not set linkType to 'eth' for SriovNetworkNodePolicy, because this can lead t
8687

8788
<18> Optional: To enable hardware offloading, the 'eSwitchMode' field must be set to `"switchdev"`.
8889

90+
<19> Optional: To exclude advertising an SR-IOV network resource's NUMA node to the Topology Manager, set the value to `true`. The default value is `false`.
91+
8992
[id="sr-iov-network-node-configuration-examples_{context}"]
9093
== SR-IOV network node configuration examples
9194

modules/nw-sriov-topology-manager.adoc

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,7 @@ You can create a NUMA aligned SR-IOV pod by restricting SR-IOV and the CPU resou
1616
+
1717
[NOTE]
1818
====
19-
When `single-numa-node` is unable to satisfy the request, you can configure the Topology Manager policy to `restricted`.
19+
When `single-numa-node` is unable to satisfy the request, you can configure the Topology Manager policy to `restricted`. For more flexible SR-IOV network resource scheduling, see _Exclude SR-IOV network topology for NUMA-aware scheduling_ in the _Additional resources_ section.
2020
====
2121
2222
.Procedure

networking/hardware_networks/add-pod.adoc

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,3 +23,4 @@ include::modules/nw-openstack-sr-iov-testpmd-pod.adoc[leveloffset=+1]
2323
* xref:../../networking/hardware_networks/configuring-sriov-device.adoc#configuring-sriov-device[Configuring an SR-IOV Ethernet network attachment]
2424
* xref:../../networking/hardware_networks/configuring-sriov-ib-attach.adoc#configuring-sriov-ib-attach[Configuring an SR-IOV InfiniBand network attachment]
2525
* xref:../../scalability_and_performance/using-cpu-manager.adoc#using-cpu-manager[Using CPU Manager]
26+
* xref:../../networking/hardware_networks/configuring-sriov-device.adoc#nw-sriov-exclude-topology-manager_configuring-sriov-device[Exclude SR-IOV network topology for NUMA-aware scheduling]

networking/hardware_networks/configuring-sriov-device.adoc

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,15 @@ include::modules/nw-sriov-troubleshooting.adoc[leveloffset=+1]
2626

2727
include::modules/cnf-assigning-a-sriov-network-to-a-vrf.adoc[leveloffset=+1]
2828

29+
include::modules/nw-sriov-exclude-topology-manager.adoc[leveloffset=+1]
30+
31+
include::modules/nw-sriov-configure-exclude-topology-manager.adoc[leveloffset=+2]
32+
33+
[role="_additional-resources"]
34+
.Additional resources
35+
36+
* xref:../../scalability_and_performance/using-cpu-manager.adoc#using-cpu-manager[Using CPU Manager]
37+
2938
[id="configuring-sriov-device-next-steps"]
3039
== Next steps
3140
