Commit a56ce83

Sakkara setup for training-operator and appwrapper
1 parent bb2bd5d commit a56ce83

13 files changed (+105, -10 lines)

setup.k8s/CLUSTER-SETUP.md

Lines changed: 16 additions & 4 deletions
@@ -55,8 +55,14 @@ kubectl create namespace mlbatch-system
 ```
 
 Install the Kubeflow Training Operator
+
+If you are using Coscheduler do:
+```sh
+kubectl apply --server-side -k setup.k8s/training-operator/coscheduler
+```
+If you are using Sakkara do:
 ```sh
-kubectl apply --server-side -k setup.k8s/training-operator
+kubectl apply --server-side -k setup.k8s/training-operator/sakkara
 ```
 
 Install the KubeRay Operator
@@ -70,13 +76,19 @@ kubectl apply --server-side -k setup.k8s/kueue
 ```
 
 Install the AppWrapper Operator
+If you are using Coscheduler do:
 ```sh
-kubectl apply --server-side -k setup.k8s/appwrapper
+kubectl apply --server-side -k setup.k8s/appwrapper/coscheduler
 ```
+If you are using Sakkara do:
+```sh
+kubectl apply --server-side -k setup.k8s/appwrapper/sakkara
+```
+
 The provided configuration differs from the default configuration of the
 operators as follows:
 - Kubeflow Training Operator:
-  - `gang-scheduler-name` is set to `scheduler-plugins-scheduler`,
+  - `gang-scheduler-name` is set to either `scheduler-plugins-scheduler` or `sakkara-scheduler`,
 - Kueue:
   - `batch/job` integration is disabled,
   - `manageJobsWithoutQueueName` is enabled and configured via `managedJobsNamespaceSelector` to be
@@ -87,7 +99,7 @@ operators as follows:
   - `enableClusterQueueResources` metrics is enabled,
 - AppWrapper operator:
   - `userRBACAdmissionCheck` is disabled,
-  - `schedulerName` is set to `scheduler-plugins-scheduler`,
+  - `schedulerName` is set to `scheduler-plugins-scheduler` or `sakkara-scheduler`,
   - `queueName` is set to `default-queue`,
   - pod priorities, resource requests and limits have been adjusted.
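
The two scheduler-dependent install steps in the CLUSTER-SETUP.md change differ only in the last path segment of the kustomize overlay. As a minimal sketch of that mapping, the helper below builds the overlay path for a component and a scheduler. The function name `overlay_for` and the `SCHEDULER` variable are hypothetical and not part of the repository. The script only prints the commands instead of running them, since the guide places the KubeRay and Kueue installs between these two steps.

```sh
#!/bin/sh
# Hypothetical helper (not part of the repository): map a component and a
# scheduler choice to the overlay directory introduced by this commit.
overlay_for() {
  component="$1"
  scheduler="$2"
  case "$scheduler" in
    coscheduler|sakkara)
      printf 'setup.k8s/%s/%s\n' "$component" "$scheduler"
      ;;
    *)
      echo "unknown scheduler: $scheduler" >&2
      return 1
      ;;
  esac
}

# SCHEDULER is an assumed environment variable; default to coscheduler.
SCHEDULER="${SCHEDULER:-coscheduler}"

# Print (do not run) the two scheduler-dependent install commands.
for component in training-operator appwrapper; do
  path="$(overlay_for "$component" "$SCHEDULER")" || exit 1
  echo "kubectl apply --server-side -k $path"
done
```

Running it with `SCHEDULER=sakkara` prints the two `sakkara` commands from the guide; any other value than `coscheduler` or `sakkara` is rejected.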

setup.k8s/appwrapper/kustomization.yaml renamed to setup.k8s/appwrapper/base/kustomization.yaml

Lines changed: 0 additions & 1 deletion
@@ -17,6 +17,5 @@ images:
   newTag: v0.30.0
 
 patches:
-- path: config_patch.yaml
 - path: manager_resources_patch.yaml
 - path: remove_default_namespace.yaml
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+namespace: mlbatch-system
+
+resources:
+- ../base
+
+patches:
+patches:
+- path: config_patch.yaml
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+kind: ConfigMap
+apiVersion: v1
+metadata:
+  name: appwrapper-operator-config
+  namespace: appwrapper-system
+data:
+  config.yaml: |
+    appwrapper:
+      enableKueueIntegrations: true
+      kueueJobReconciller:
+        manageJobsWithoutQueueName: true
+        waitForPodsReady:
+          enable: false
+      defaultQueueName: default-queue
+      schedulerName: sakkara-scheduler
+      slackQueueName: slack-cluster-queue
+      userRBACAdmissionCheck: false
+    controllerManager:
+      health:
+        bindAddress: ":8081"
+      metrics:
+        bindAddress: "127.0.0.1:8080"
+      leaderElection: true
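
The only scheduler-specific value in the config patch above is `schedulerName: sakkara-scheduler`. As a rough sketch of how one might confirm which scheduler a patch selects, the helper below extracts that value from text on standard input. The function name `scheduler_name` is hypothetical, and the approach is plain text matching rather than YAML parsing, so it assumes the key appears on its own line as it does in this patch.

```sh
#!/bin/sh
# Hypothetical helper: print the first schedulerName value found on stdin.
# This is simple line matching, not a YAML parser.
scheduler_name() {
  sed -n 's/^[[:space:]]*schedulerName:[[:space:]]*//p' | head -n 1
}

# Inline sample mirroring the relevant lines of the config patch.
scheduler_name <<'EOF'
data:
  config.yaml: |
    appwrapper:
      defaultQueueName: default-queue
      schedulerName: sakkara-scheduler
EOF
```

With the inline sample this prints `sakkara-scheduler`. Against a checkout of the repository, the rendered overlay could presumably be piped in instead, for example `kubectl kustomize setup.k8s/appwrapper/sakkara | scheduler_name`.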
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+namespace: mlbatch-system
+
+resources:
+- ../base
+
+patches:
+patches:
+- path: config_patch.yaml

setup.k8s/training-operator/manager_resources_patch.yaml renamed to setup.k8s/training-operator/base/manager_resources_patch.yaml

Lines changed: 0 additions & 1 deletion
@@ -10,7 +10,6 @@ spec:
       - name: training-operator
         args:
         - "--zap-log-level=2"
-        - "--gang-scheduler-name=scheduler-plugins-scheduler"
         resources:
           requests:
             cpu: 100m
