docs/using-rdma-network-locality-when-running-workloads-on-oke.md
103 additions & 1 deletion
@@ -173,7 +173,7 @@ spec:
```

-### Using `kueue`
+### Using Kueue
You will need to [enable the feature gate](https://kueue.sigs.k8s.io/docs/installation/#change-the-feature-gates-configuration) for [Topology Aware Scheduling (TAS)](https://kueue.sigs.k8s.io/docs/concepts/topology_aware_scheduling) in Kueue. Topology Aware Scheduling has been in alpha since Kueue v0.9.
The following example uses `node.kubernetes.io/instance-type: "BM.GPU.H100.8"` to select H100s, but you can use any label that is present on all the nodes you're targeting with the Resource Flavor.
@@ -263,4 +263,106 @@ spec:
restartPolicy: Never
```

+### Using the Node Ordering script as an Init Container with MPI Operator
+If your workload can use an ordered list of hosts or a rankfile (e.g. with `mpirun`), you can run the Python script in an Init Container to generate that file, and then use the generated ordered host list or rankfile in your job.
+
+The script creates the files using the same information that is available in the instance metadata service.
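
To illustrate the idea of an ordered host list, here is a minimal sketch in Python. It is not the script from this repository. The topology field names (`hpc_island`, `network_block`, `local_block`), the sample values, and the helper names `order_hosts` and `render_hostfile` are all hypothetical, and the sketch assumes the per-host topology has already been collected into a dictionary. It sorts hosts so that nodes sharing the same block end up adjacent, then renders an Open MPI style hostfile.

```python
from typing import Dict, List

# Hypothetical topology levels, widest first. The real script may use
# different names taken from the instance metadata service.
TOPOLOGY_KEYS = ("hpc_island", "network_block", "local_block")


def order_hosts(topology: Dict[str, Dict[str, str]]) -> List[str]:
    """Return host names sorted so that topologically close hosts are adjacent."""

    def sort_key(host: str):
        info = topology[host]
        # Compare the widest level first; fall back to the host name so the
        # ordering is deterministic for hosts in the same local block.
        return tuple(info.get(key, "") for key in TOPOLOGY_KEYS) + (host,)

    return sorted(topology, key=sort_key)


def render_hostfile(hosts: List[str], slots: int = 8) -> str:
    """Render one 'host slots=N' line per host, as accepted by Open MPI."""
    return "".join(f"{host} slots={slots}\n" for host in hosts)


# Sample data, invented for illustration only.
sample = {
    "node-a": {"hpc_island": "island-1", "network_block": "block-2", "local_block": "local-9"},
    "node-b": {"hpc_island": "island-1", "network_block": "block-1", "local_block": "local-4"},
    "node-c": {"hpc_island": "island-1", "network_block": "block-1", "local_block": "local-3"},
    "node-d": {"hpc_island": "island-1", "network_block": "block-1", "local_block": "local-4"},
}

ordered = order_hosts(sample)
print(render_hostfile(ordered), end="")
```

In this sketch `node-b` and `node-d` land next to each other because they share a local block, and `slots=8` is only an assumption matching one slot per GPU on an 8-GPU shape. An Init Container could write the rendered text to a volume shared with the launcher, which would then pass it to `mpirun` through its hostfile option.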