
Commit 79d10b1

Sakkara documentation in SETUP for non-RHOAI clusters (#138)
1 parent df87009 commit 79d10b1


17 files changed (+175 additions, -17 deletions)


setup.RHOAI-v2.13/CLUSTER-SETUP.md

Lines changed: 8 additions & 1 deletion
@@ -10,7 +10,12 @@ Create `default-priority`, `high-priority`, and `low-priority` priority classes:
 oc apply -f setup.RHOAI-v2.13/mlbatch-priorities.yaml
 ```
 
-## Coscheduler
+## Scheduler Plugins
+
+MLBatch utilizes Kubernetes Scheduler Plugins to ensure gang scheduling of
+multi-Pod workloads and to pack `Pods` onto `Nodes` to reduce GPU fragmentation.
+
+### Coscheduler
 
 Install Coscheduler v0.28.9 as a secondary scheduler and configure packing:
 ```sh
@@ -24,6 +29,8 @@ oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2
 oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.13/coscheduler-priority-patch.yaml scheduler-plugins-scheduler
 ```
 
+
+
 ## Red Hat OpenShift AI
 
 Create the Red Hat OpenShift AI subscription:

setup.RHOAI-v2.16/CLUSTER-SETUP.md

Lines changed: 8 additions & 1 deletion
@@ -10,7 +10,12 @@ Create `default-priority`, `high-priority`, and `low-priority` priority classes:
 oc apply -f setup.RHOAI-v2.16/mlbatch-priorities.yaml
 ```
 
-## Coscheduler
+## Scheduler Plugins
+
+MLBatch utilizes Kubernetes Scheduler Plugins to ensure gang scheduling of
+multi-Pod workloads and to pack `Pods` onto `Nodes` to reduce GPU fragmentation.
+
+### Coscheduler
 
 Install Coscheduler v0.28.9 as a secondary scheduler and configure packing:
 ```sh
@@ -24,6 +29,8 @@ oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2
 oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.16/coscheduler-priority-patch.yaml scheduler-plugins-scheduler
 ```
 
+
+
 ## Red Hat OpenShift AI
 
 Create the Red Hat OpenShift AI subscription:

setup.RHOAI-v2.17/CLUSTER-SETUP.md

Lines changed: 8 additions & 1 deletion
@@ -10,7 +10,12 @@ Create `default-priority`, `high-priority`, and `low-priority` priority classes:
 oc apply -f setup.RHOAI-v2.17/mlbatch-priorities.yaml
 ```
 
-## Coscheduler
+## Scheduler Plugins
+
+MLBatch utilizes Kubernetes Scheduler Plugins to ensure gang scheduling of
+multi-Pod workloads and to pack `Pods` onto `Nodes` to reduce GPU fragmentation.
+
+### Coscheduler
 
 Install Coscheduler v0.28.9 as a secondary scheduler and configure packing:
 ```sh
@@ -24,6 +29,8 @@ oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2
 oc patch deployment -n scheduler-plugins --type=json --patch-file setup.RHOAI-v2.17/coscheduler-priority-patch.yaml scheduler-plugins-scheduler
 ```
 
+
+
 ## Red Hat OpenShift AI
 
 Create the Red Hat OpenShift AI subscription:

setup.k8s/CLUSTER-SETUP.md

Lines changed: 35 additions & 6 deletions
@@ -1,7 +1,7 @@
 # Cluster Setup
 
 The cluster setup installs and configures the following components:
-+ Coscheduler
++ Scheduler Plugins
 + Kubeflow Training Operator
 + KubeRay
 + Kueue
@@ -16,7 +16,13 @@ Create `default-priority`, `high-priority`, and `low-priority` priority classes:
 kubectl apply -f setup.k8s/mlbatch-priorities.yaml
 ```
 
-## Coscheduler
+## Scheduler Plugins
+
+MLBatch utilizes Kubernetes Scheduler Plugins to ensure gang scheduling of
+multi-Pod workloads and to pack `Pods` onto `Nodes` to reduce GPU fragmentation.
+Two options are described below: Coscheduler and Sakkara. You should pick and install one of them
+as a secondary scheduler for your cluster.
+### Coscheduler
 
 Install Coscheduler v0.28.9 as a secondary scheduler and configure packing:
 ```sh
@@ -30,6 +36,17 @@ kubectl patch deployment -n scheduler-plugins --type=json --patch-file setup.k8s
 kubectl patch deployment -n scheduler-plugins --type=json --patch-file setup.k8s/coscheduler-priority-patch.yaml scheduler-plugins-scheduler
 ```
 
+### Sakkara
+
+[Sakkara](https://github.com/atantawi/scheduler-plugins/tree/sakkara) is an experimental
+new scheduler plugin with advanced support for topology-aware scheduling.
+
+Install Sakkara as a secondary scheduler:
+```sh
+helm install sakkara-scheduler --namespace sakkara-scheduler --create-namespace mlbatch/sakkara-scheduler
+```
+Optionally, create a config map capturing your cluster's topology as described in the [Sakkara documentation](https://github.com/atantawi/sakkara-deploy/tree/main?tab=readme-ov-file#cluster-topology). This step is optional but recommended for production clusters. If the config map is not present Sakkara will default to a single-level hierarchy containing the Nodes of the cluster.
+
 ## Install Operators
 
 Create the mlbatch-system namespace
@@ -38,8 +55,14 @@ kubectl create namespace mlbatch-system
 ```
 
 Install the Kubeflow Training Operator
+
+If you are using Coscheduler do:
+```sh
+kubectl apply --server-side -k setup.k8s/training-operator/coscheduler
+```
+If you are using Sakkara do:
 ```sh
-kubectl apply --server-side -k setup.k8s/training-operator
+kubectl apply --server-side -k setup.k8s/training-operator/sakkara
 ```
 
 Install the KubeRay Operator
@@ -53,13 +76,19 @@ kubectl apply --server-side -k setup.k8s/kueue
 ```
 
 Install the AppWrapper Operator
+If you are using Coscheduler do:
 ```sh
-kubectl apply --server-side -k setup.k8s/appwrapper
+kubectl apply --server-side -k setup.k8s/appwrapper/coscheduler
 ```
+If you are using Sakkara do:
+```sh
+kubectl apply --server-side -k setup.k8s/appwrapper/sakkara
+```
+
 The provided configuration differs from the default configuration of the
 operators as follows:
 - Kubeflow Training Operator:
-  - `gang-scheduler-name` is set to `scheduler-plugins-scheduler`,
+  - `gang-scheduler-name` is set to either `scheduler-plugins-scheduler` or `sakkara-scheduler`,
 - Kueue:
   - `batch/job` integration is disabled,
   - `manageJobsWithoutQueueName` is enabled and configured via `managedJobsNamespaceSelector` to be
@@ -70,7 +99,7 @@ operators as follows:
   - `enableClusterQueueResources` metrics is enabled,
 - AppWrapper operator:
   - `userRBACAdmissionCheck` is disabled,
-  - `schedulerName` is set to `scheduler-plugins-scheduler`,
+  - `schedulerName` is set to `scheduler-plugins-scheduler` or `sakkara-scheduler`,
   - `queueName` is set to `default-queue`,
 - pod priorities, resource requests and limits have been adjusted.
 
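With this change the Training Operator and AppWrapper installs differ only in the overlay directory (`coscheduler` or `sakkara`), and that choice also fixes the scheduler name the operators are configured with. A minimal POSIX shell sketch that prints the matching commands for review before running them; the function names `mlbatch_overlay_cmds` and `mlbatch_scheduler_name` are hypothetical, and the KubeRay and Kueue steps, which do not depend on the scheduler, are left out:

```sh
# Hypothetical helper: print the scheduler-specific install commands from
# setup.k8s/CLUSTER-SETUP.md for "coscheduler" or "sakkara".
mlbatch_overlay_cmds() {
  case "$1" in
    coscheduler|sakkara) ;;
    *) echo "unknown scheduler: $1 (expected coscheduler or sakkara)" >&2; return 1 ;;
  esac
  # Both operators ship one kustomize overlay per scheduler, in document order.
  for component in training-operator appwrapper; do
    echo "kubectl apply --server-side -k setup.k8s/${component}/$1"
  done
}

# Hypothetical helper: the scheduler name each overlay configures
# (gang-scheduler-name for the Training Operator, schedulerName for AppWrapper).
mlbatch_scheduler_name() {
  case "$1" in
    coscheduler) echo scheduler-plugins-scheduler ;;
    sakkara) echo sakkara-scheduler ;;
    *) return 1 ;;
  esac
}
```

For example, `mlbatch_scheduler_name sakkara` prints `sakkara-scheduler`, matching the `helm install sakkara-scheduler` release above. Printing rather than executing keeps the choice explicit, since mixing a Coscheduler overlay with a Sakkara install would leave Pods waiting on a scheduler that is not running.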
setup.k8s/UNINSTALL.md

Lines changed: 4 additions & 0 deletions
@@ -20,4 +20,8 @@ kubectl delete clusterrole mlbatch-edit
 # Coscheduler uninstall
 helm uninstall -n scheduler-plugins scheduler-plugins
 kubectl delete namespace scheduler-plugins
+
+# Sakkara uninstall
+helm uninstall -n sakkara-scheduler sakkara-scheduler
+kubectl delete namespace sakkara-scheduler
 ```
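Because the setup installs only one of the two secondary schedulers, only one of the two uninstall blocks applies to a given cluster. A minimal POSIX shell sketch that prints the pair of commands for the chosen scheduler, using the Helm release and namespace names from UNINSTALL.md; the function name is hypothetical, and `release`/`ns` are ordinary (non-local) variables because POSIX `sh` has no `local`:

```sh
# Hypothetical helper: print the UNINSTALL.md commands for one secondary scheduler.
mlbatch_scheduler_uninstall_cmds() {
  case "$1" in
    coscheduler) release=scheduler-plugins; ns=scheduler-plugins ;;
    sakkara) release=sakkara-scheduler; ns=sakkara-scheduler ;;
    *) echo "unknown scheduler: $1" >&2; return 1 ;;
  esac
  # Remove the Helm release first, then the namespace it was installed into.
  echo "helm uninstall -n ${ns} ${release}"
  echo "kubectl delete namespace ${ns}"
}
```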

setup.k8s/appwrapper/kustomization.yaml renamed to setup.k8s/appwrapper/base/kustomization.yaml

Lines changed: 0 additions & 1 deletion
@@ -17,6 +17,5 @@ images:
   newTag: v0.30.0
 
 patches:
-- path: config_patch.yaml
 - path: manager_resources_patch.yaml
 - path: remove_default_namespace.yaml
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+namespace: mlbatch-system
+
+resources:
+- ../base
+
+patches:
+patches:
+- path: config_patch.yaml
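The new overlay above repeats the top-level `patches:` key. The YAML specification requires the keys of a mapping to be unique, so depending on the parser such a file may be rejected or one of the two entries may silently take precedence. A minimal sketch of a check for this, assuming block-style YAML whose top-level keys start in the first column (list items, indented keys, and comments are ignored); the function name `dup_top_level_keys` is hypothetical:

```sh
# Hypothetical helper: print each top-level YAML key that appears more than once.
dup_top_level_keys() {
  awk '
    # Top-level keys: a name at column 1 followed by a colon.
    /^[A-Za-z_][A-Za-z0-9_.-]*:/ {
      key = $0
      sub(/:.*/, "", key)
      # Report a key exactly once, on its second occurrence.
      if (seen[key]++ == 1) print key
    }
  ' "$1"
}
```

Run against the overlay shown in this diff, `dup_top_level_keys kustomization.yaml` prints `patches`; for a file with unique top-level keys it prints nothing.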
