Commit 2d4f9a8

more accurate description of scheduling customizations (#140)
1 parent f240b2e commit 2d4f9a8

File tree

12 files changed

+77
-57
lines changed

setup.RHOAI-v2.13/CLUSTER-SETUP.md

Lines changed: 12 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,6 @@
11
# Cluster Setup
22

3-
The cluster setup installs Red Hat OpenShift AI and Coscheduler, configures Kueue,
3+
The cluster setup installs Red Hat OpenShift AI and configures Scheduler Plugins, Kueue,
44
cluster roles, and priority classes.
55

66
## Priorities
@@ -10,23 +10,26 @@ Create `default-priority`, `high-priority`, and `low-priority` priority classes:
1010
oc apply -f setup.RHOAI-v2.13/mlbatch-priorities.yaml
1111
```
1212

13-
## Scheduler Plugins
13+
## Scheduler Configuration
1414

15-
MLBatch utilizes Kubernetes Scheduler Plugins to ensure gang scheduling of
16-
multi-Pod workloads and to pack `Pods` onto `Nodes` to reduce GPU fragmentation.
15+
MLBatch configures Kubernetes scheduling to accomplish two objectives:
16+
+ Obtaining gang (all or nothing) scheduling for multi-Pod workloads.
17+
+ Packing Pods whose GPU request is less than the number of GPUs on a Node to
18+
maximize the number of Nodes available for Pods that request all the GPUs on a Node.
19+
20+
This is done by installing the Coscheduling out-of-tree scheduler plugin and configuring
21+
the default NodeResourcesFit scheduler plugin to pack in the GPU dimension.
1722

18-
### Coscheduler
1923

20-
Install Coscheduler v0.28.9 as a secondary scheduler and configure packing:
2124
```sh
2225
helm install scheduler-plugins --namespace scheduler-plugins --create-namespace \
2326
scheduler-plugins/manifests/install/charts/as-a-second-scheduler/ \
2427
--set-json pluginConfig='[{"args":{"scoringStrategy":{"resources":[{"name":"nvidia.com/gpu","weight":1}],"requestedToCapacityRatio":{"shape":[{"utilization":0,"score":0},{"utilization":100,"score":10}]},"type":"RequestedToCapacityRatio"}},"name":"NodeResourcesFit"},{"args":{"permitWaitingTimeSeconds":300},"name":"Coscheduling"}]'
2528
```
26-
Patch Coscheduler pod priorities:
29+
Patch scheduler-plugins pod priorities:
2730
```sh
28-
oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.13/coscheduler-priority-patch.yaml scheduler-plugins-controller
29-
oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.13/coscheduler-priority-patch.yaml scheduler-plugins-scheduler
31+
oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.13/scheduler-priority-patch.yaml scheduler-plugins-controller
32+
oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.13/scheduler-priority-patch.yaml scheduler-plugins-scheduler
3033
```
3134

3235

setup.RHOAI-v2.16/CLUSTER-SETUP.md

Lines changed: 12 additions & 9 deletions
@@ -1,6 +1,6 @@
11
# Cluster Setup
22

3-
The cluster setup installs Red Hat OpenShift AI and Coscheduler, configures Kueue,
3+
The cluster setup installs Red Hat OpenShift AI and configures Scheduler Plugins, Kueue,
44
cluster roles, and priority classes.
55

66
## Priorities
@@ -10,23 +10,26 @@ Create `default-priority`, `high-priority`, and `low-priority` priority classes:
1010
oc apply -f setup.RHOAI-v2.16/mlbatch-priorities.yaml
1111
```
1212

13-
## Scheduler Plugins
13+
## Scheduler Configuration
1414

15-
MLBatch utilizes Kubernetes Scheduler Plugins to ensure gang scheduling of
16-
multi-Pod workloads and to pack `Pods` onto `Nodes` to reduce GPU fragmentation.
15+
MLBatch configures Kubernetes scheduling to accomplish two objectives:
16+
+ Obtaining gang (all or nothing) scheduling for multi-Pod workloads.
17+
+ Packing Pods whose GPU request is less than the number of GPUs on a Node to
18+
maximize the number of Nodes available for Pods that request all the GPUs on a Node.
19+
20+
This is done by installing the Coscheduling out-of-tree scheduler plugin and configuring
21+
the default NodeResourcesFit scheduler plugin to pack in the GPU dimension.
1722

18-
### Coscheduler
1923

20-
Install Coscheduler v0.28.9 as a secondary scheduler and configure packing:
2124
```sh
2225
helm install scheduler-plugins --namespace scheduler-plugins --create-namespace \
2326
scheduler-plugins/manifests/install/charts/as-a-second-scheduler/ \
2427
--set-json pluginConfig='[{"args":{"scoringStrategy":{"resources":[{"name":"nvidia.com/gpu","weight":1}],"requestedToCapacityRatio":{"shape":[{"utilization":0,"score":0},{"utilization":100,"score":10}]},"type":"RequestedToCapacityRatio"}},"name":"NodeResourcesFit"},{"args":{"permitWaitingTimeSeconds":300},"name":"Coscheduling"}]'
2528
```
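As a rough illustration of what "packing in the GPU dimension" means, the Python sketch below interpolates the `shape` points from the `pluginConfig` above (utilization 0 scores 0, utilization 100 scores 10) and compares two candidate Nodes. It is a simplified model, not the scheduler's implementation: the real kube-scheduler rescales these scores and combines them with other plugins, and the node names, GPU counts, and helper functions here are hypothetical.

```python
# Simplified model of RequestedToCapacityRatio scoring on nvidia.com/gpu.
# Assumption: only the shape from the pluginConfig is modeled; the real
# scheduler normalizes scores and weighs them against other plugins.

SHAPE = [(0, 0), (100, 10)]  # (utilization %, score) points from pluginConfig


def shape_score(utilization, shape):
    """Piecewise-linear interpolation over (utilization, score) points."""
    points = sorted(shape)
    if utilization <= points[0][0]:
        return points[0][1]
    for (u0, s0), (u1, s1) in zip(points, points[1:]):
        if utilization <= u1:
            return s0 + (s1 - s0) * (utilization - u0) / (u1 - u0)
    return points[-1][1]


def gpu_score(capacity, in_use, request):
    """Score a Node for a Pod; None means the Pod does not fit."""
    if in_use + request > capacity:
        return None
    utilization = 100 * (in_use + request) / capacity
    return shape_score(utilization, SHAPE)


# Hypothetical cluster: name -> (GPU capacity, GPUs already requested).
nodes = {"node-a": (8, 0), "node-b": (8, 4)}
request = 2  # the incoming Pod asks for 2 GPUs

scores = {name: gpu_score(cap, used, request) for name, (cap, used) in nodes.items()}
best = max((n for n in scores if scores[n] is not None), key=scores.get)
print(scores, "->", best)
```

Under this model the partially used `node-b` scores higher than the empty `node-a`, so the 2-GPU Pod lands on `node-b` and `node-a` stays free for a Pod that requests all 8 GPUs.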
26-
Patch Coscheduler pod priorities:
29+
Patch scheduler-plugins pod priorities:
2730
```sh
28-
oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.16/coscheduler-priority-patch.yaml scheduler-plugins-controller
29-
oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.16/coscheduler-priority-patch.yaml scheduler-plugins-scheduler
31+
oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.16/scheduler-priority-patch.yaml scheduler-plugins-controller
32+
oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.16/scheduler-priority-patch.yaml scheduler-plugins-scheduler
3033
```
3134

3235

setup.RHOAI-v2.17/CLUSTER-SETUP.md

Lines changed: 12 additions & 9 deletions
@@ -1,6 +1,6 @@
11
# Cluster Setup
22

3-
The cluster setup installs Red Hat OpenShift AI and Coscheduler, configures Kueue,
3+
The cluster setup installs Red Hat OpenShift AI and configures Scheduler Plugins, Kueue,
44
cluster roles, and priority classes.
55

66
## Priorities
@@ -10,23 +10,26 @@ Create `default-priority`, `high-priority`, and `low-priority` priority classes:
1010
oc apply -f setup.RHOAI-v2.17/mlbatch-priorities.yaml
1111
```
1212

13-
## Scheduler Plugins
13+
## Scheduler Configuration
1414

15-
MLBatch utilizes Kubernetes Scheduler Plugins to ensure gang scheduling of
16-
multi-Pod workloads and to pack `Pods` onto `Nodes` to reduce GPU fragmentation.
15+
MLBatch configures Kubernetes scheduling to accomplish two objectives:
16+
+ Obtaining gang (all or nothing) scheduling for multi-Pod workloads.
17+
+ Packing Pods whose GPU request is less than the number of GPUs on a Node to
18+
maximize the number of Nodes available for Pods that request all the GPUs on a Node.
19+
20+
This is done by installing the Coscheduling out-of-tree scheduler plugin and configuring
21+
the default NodeResourcesFit scheduler plugin to pack in the GPU dimension.
1722

18-
### Coscheduler
1923

20-
Install Coscheduler v0.28.9 as a secondary scheduler and configure packing:
2124
```sh
2225
helm install scheduler-plugins --namespace scheduler-plugins --create-namespace \
2326
scheduler-plugins/manifests/install/charts/as-a-second-scheduler/ \
2427
--set-json pluginConfig='[{"args":{"scoringStrategy":{"resources":[{"name":"nvidia.com/gpu","weight":1}],"requestedToCapacityRatio":{"shape":[{"utilization":0,"score":0},{"utilization":100,"score":10}]},"type":"RequestedToCapacityRatio"}},"name":"NodeResourcesFit"},{"args":{"permitWaitingTimeSeconds":300},"name":"Coscheduling"}]'
2528
```
26-
Patch Coscheduler pod priorities:
29+
Patch scheduler-plugins pod priorities:
2730
```sh
28-
oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.17/coscheduler-priority-patch.yaml scheduler-plugins-controller
29-
oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.17/coscheduler-priority-patch.yaml scheduler-plugins-scheduler
31+
oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.17/scheduler-priority-patch.yaml scheduler-plugins-controller
32+
oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.17/scheduler-priority-patch.yaml scheduler-plugins-scheduler
3033
```
3134

3235

setup.k8s/CLUSTER-SETUP.md

Lines changed: 18 additions & 14 deletions
@@ -16,24 +16,28 @@ Create `default-priority`, `high-priority`, and `low-priority` priority classes:
1616
kubectl apply -f setup.k8s/mlbatch-priorities.yaml
1717
```
1818

19-
## Scheduler Plugins
19+
## Scheduler Configuration
20+
21+
MLBatch configures Kubernetes scheduling to accomplish two objectives:
22+
+ Obtaining gang (all or nothing) scheduling for multi-Pod workloads.
23+
+ Packing Pods whose GPU request is less than the number of GPUs on a Node to
24+
maximize the number of Nodes available for Pods that request all the GPUs on a Node.
25+
26+
The currently recommended way to do this is by installing the Coscheduling out-of-tree scheduler
27+
plugin and configuring the default NodeResourcesFit scheduler plugin to pack in the GPU dimension.
28+
Alternatively, you can skip the helm install and patch commands shown below and instead install
29+
the experimental Sakkara scheduler plugin (described next).
2030

21-
MLBatch utilizes Kubernetes Scheduler Plugins to ensure gang scheduling of
22-
multi-Pod workloads and to pack `Pods` onto `Nodes` to reduce GPU fragmentation.
23-
Two options are described below: Coscheduler and Sakkara. You should pick and install one of them
24-
as a secondary scheduler for your cluster.
25-
### Coscheduler
2631

27-
Install Coscheduler v0.28.9 as a secondary scheduler and configure packing:
2832
```sh
2933
helm install scheduler-plugins --namespace scheduler-plugins --create-namespace \
3034
scheduler-plugins/manifests/install/charts/as-a-second-scheduler/ \
3135
--set-json pluginConfig='[{"args":{"scoringStrategy":{"resources":[{"name":"nvidia.com/gpu","weight":1}],"requestedToCapacityRatio":{"shape":[{"utilization":0,"score":0},{"utilization":100,"score":10}]},"type":"RequestedToCapacityRatio"}},"name":"NodeResourcesFit"},{"args":{"permitWaitingTimeSeconds":300},"name":"Coscheduling"}]'
3236
```
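The one-line `--set-json pluginConfig=...` value is hard to read. A small Python sketch like the one below, which only uses the standard `json` module, can be used to check that the string is valid JSON and to see which plugin each entry configures. The variable names are illustrative; the JSON string is copied from the helm command above.

```python
import json

# The pluginConfig value passed to helm via --set-json, copied verbatim.
PLUGIN_CONFIG = '[{"args":{"scoringStrategy":{"resources":[{"name":"nvidia.com/gpu","weight":1}],"requestedToCapacityRatio":{"shape":[{"utilization":0,"score":0},{"utilization":100,"score":10}]},"type":"RequestedToCapacityRatio"}},"name":"NodeResourcesFit"},{"args":{"permitWaitingTimeSeconds":300},"name":"Coscheduling"}]'

# json.loads raises json.JSONDecodeError if the string is malformed.
config = json.loads(PLUGIN_CONFIG)

# Index the entries by plugin name for easier inspection.
by_name = {entry["name"]: entry["args"] for entry in config}

# Pretty-print the full structure for review.
print(json.dumps(config, indent=2))

# Summarize the two settings this setup relies on.
strategy = by_name["NodeResourcesFit"]["scoringStrategy"]
print("packing strategy:", strategy["type"])
print("packed resources:", [r["name"] for r in strategy["resources"]])
print("gang wait (s):", by_name["Coscheduling"]["permitWaitingTimeSeconds"])
```

The first entry sets the `NodeResourcesFit` scoring strategy for `nvidia.com/gpu`, and the second sets the `Coscheduling` plugin's `permitWaitingTimeSeconds` to 300.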
33-
Patch Coscheduler pod priorities:
37+
Patch scheduler-plugins pod priorities:
3438
```sh
35-
kubectl patch deployment -n scheduler-plugins --type=json --patch-file setup.k8s/coscheduler-priority-patch.yaml scheduler-plugins-controller
36-
kubectl patch deployment -n scheduler-plugins --type=json --patch-file setup.k8s/coscheduler-priority-patch.yaml scheduler-plugins-scheduler
39+
kubectl patch deployment -n scheduler-plugins --type=json --patch-file setup.k8s/scheduler-priority-patch.yaml scheduler-plugins-controller
40+
kubectl patch deployment -n scheduler-plugins --type=json --patch-file setup.k8s/scheduler-priority-patch.yaml scheduler-plugins-scheduler
3741
```
3842

3943
### Sakkara
@@ -56,9 +60,9 @@ kubectl create namespace mlbatch-system
5660

5761
Install the Kubeflow Training Operator
5862

59-
If you are using Coscheduler do:
63+
If you are using Coscheduling do:
6064
```sh
61-
kubectl apply --server-side -k setup.k8s/training-operator/coscheduler
65+
kubectl apply --server-side -k setup.k8s/training-operator/coscheduling
6266
```
6367
If you are using Sakkara do:
6468
```sh
@@ -76,9 +80,9 @@ kubectl apply --server-side -k setup.k8s/kueue
7680
```
7781

7882
Install the AppWrapper Operator
79-
If you are using Coscheduler do:
83+
If you are using Coscheduling do:
8084
```sh
81-
kubectl apply --server-side -k setup.k8s/appwrapper/coscheduler
85+
kubectl apply --server-side -k setup.k8s/appwrapper/coscheduling
8286
```
8387
If you are using Sakkara do:
8488
```sh
