Merged
12 changes: 1 addition & 11 deletions .github/workflows/release-chart.yaml
@@ -14,7 +14,6 @@ jobs:
         uses: actions/checkout@v4
         with:
           fetch-depth: 0
-          submodules: true
 
       - name: Configure Git
         run: |
@@ -24,7 +23,7 @@ jobs:
       - name: Install Helm
         uses: azure/setup-helm@v4
 
-      - name: Publish PyTorchJob Generator Helm Chart
+      - name: Run chart-releaser
         uses: helm/[email protected]
         with:
           charts_dir: tools/pytorchjob-generator
@@ -33,15 +32,6 @@ jobs:
         env:
           CR_TOKEN: "${{ secrets.GITHUB_TOKEN }}"
 
-      - name: Publish Sakkara Scheduler Helm Chart
-        uses: helm/[email protected]
-        with:
-          charts_dir: sakkara-deploy/install
-          packages_with_index: true
-          skip_existing: true
-        env:
-          CR_TOKEN: "${{ secrets.GITHUB_TOKEN }}"
-
   publish:
     needs: release
     uses: project-codeflare/mlbatch/.github/workflows/gh-pages-static.yml@main
3 changes: 0 additions & 3 deletions .gitmodules
@@ -2,6 +2,3 @@
 	path = scheduler-plugins
 	url = https://github.com/kubernetes-sigs/scheduler-plugins.git
 	branch = release-1.28
-[submodule "sakkara-deploy"]
-	path = sakkara-deploy
-	url = git@github.com:atantawi/sakkara-deploy.git
1 change: 0 additions & 1 deletion sakkara-deploy
Submodule sakkara-deploy deleted from 909d3e
3 changes: 3 additions & 0 deletions tools/sakkara-deploy/README.md
@@ -0,0 +1,3 @@
The helm/chart-releaser-action does not understand git submodules.

Therefore we maintain a copy of https://github.com/atantawi/sakkara-deploy/tree/main/install/ here.
20 changes: 20 additions & 0 deletions tools/sakkara-deploy/release-instructions.md
@@ -0,0 +1,20 @@
## Release Instructions

1. Create a release prep branch

2. Update the version number in chart/Chart.yaml

3. Run `helm unittest -u chart`, then run pre-commit to
   regenerate the helm-docs output. Inspect the diff and make sure
   the only change is the chart version.

4. Update the chart version number in the `helm search repo`
   example in the main README.md

5. Submit & merge a PR with these changes

6. Manually trigger the `Release Charts` workflow in the Actions
tab of the MLBatch GitHub project. This action will automatically
generate and push tags for the newly released chart and trigger an
update of the GH Pages (which contains the helm repo).
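
Step 2 can be scripted. The following is a minimal sketch, not part of this repository: `bump_chart_version` is a hypothetical helper that rewrites only the top-level `version:` line of a Chart.yaml passed in as text and leaves `appVersion:` untouched.

```python
import re


def bump_chart_version(chart_yaml: str, new_version: str) -> str:
    """Return chart_yaml with its top-level `version:` line set to new_version.

    Hypothetical helper for illustration; it assumes a flat Chart.yaml with
    exactly one line that starts with `version:`.
    """
    # `^` with MULTILINE anchors at the start of a line, so `appVersion:` is
    # never matched by this pattern.
    pattern = re.compile(r"^version:[ \t]*\S.*$", flags=re.MULTILINE)
    updated, count = pattern.subn(lambda _m: f"version: {new_version}", chart_yaml)
    if count != 1:
        raise ValueError(f"expected exactly one version line, found {count}")
    return updated
```

Reviewing the resulting diff by hand, as step 3 asks, is still worthwhile because the helper does not check that the new version is greater than the old one.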

23 changes: 23 additions & 0 deletions tools/sakkara-deploy/sakkara-scheduler/.helmignore
@@ -0,0 +1,23 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
6 changes: 6 additions & 0 deletions tools/sakkara-deploy/sakkara-scheduler/Chart.yaml
@@ -0,0 +1,6 @@
apiVersion: v2
appVersion: v0.29.7
description: Deploy sakkara group and topology aware scheduler plugin in a cluster
name: sakkara-scheduler
type: application
version: 0.0.1
46 changes: 46 additions & 0 deletions tools/sakkara-deploy/sakkara-scheduler/README.md
@@ -0,0 +1,46 @@
# sakkara-scheduler

![Version: 0.0.1](https://img.shields.io/badge/Version-0.0.1-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: v0.29.7](https://img.shields.io/badge/AppVersion-v0.29.7-informational?style=flat-square)

Deploy sakkara group and topology aware scheduler plugin in a cluster

## Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| fullnameOverride | string | `""` | |
| image.repository | string | `"quay.io"` | repository to fetch images from |
| image.tag | string | `"v0.0.1"` | default is the chart appVersion |
| nameOverride | string | `"sakkara"` | |
| nodeSelector | object | `{}` | |
| pluginConfig[0].args.topologyConfigMapNameSpace | string | `"sakkara-scheduler"` | |
| pluginConfig[0].name | string | `"ClusterTopologyPlacementGroup"` | |
| plugins.permit.enabled[0].name | string | `"ClusterTopologyPlacementGroup"` | |
| plugins.postBind.enabled[0].name | string | `"ClusterTopologyPlacementGroup"` | |
| plugins.postFilter.enabled[0].name | string | `"ClusterTopologyPlacementGroup"` | |
| plugins.preEnqueue.enabled[0].name | string | `"ClusterTopologyPlacementGroup"` | |
| plugins.preScore.enabled[0].name | string | `"ClusterTopologyPlacementGroup"` | |
| plugins.queueSort.disabled[0].name | string | `"*"` | |
| plugins.queueSort.enabled[0].name | string | `"ClusterTopologyPlacementGroup"` | |
| plugins.reserve.enabled[0].name | string | `"ClusterTopologyPlacementGroup"` | |
| plugins.score.disabled[0].name | string | `"*"` | |
| plugins.score.enabled[0].name | string | `"ClusterTopologyPlacementGroup"` | |
| plugins.score.enabled[0].weight | int | `10` | |
| podAnnotations | object | `{}` | |
| priorityClassName | string | `"system-node-critical"` | |
| scheduler.affinity | object | `{}` | affinity for deployment's pods |
| scheduler.enabled | bool | `true` | deploy second scheduler as deployment |
| scheduler.image | string | `"ibm/sakkara-scheduler"` | path to scheduler image from repository |
| scheduler.imagePullPolicy | string | `"IfNotPresent"` | |
| scheduler.leaderElect | bool | `false` | enable for HA mode |
| scheduler.replicaCount | int | `1` | increase for HA mode |
| scheduler.resources | object | `{"limits":{"cpu":"500m","memory":"512Mi"},"requests":{"cpu":"200m","memory":"512Mi"}}` | requests/limits for scheduler deployment resources: {} |
| scheduler.strategy.type | string | `"RollingUpdate"` | Deployment update strategy type |
| scheduler.verbosity | int | `6` | Log level from 1 to 9 |
| schedulerConfig.apiVersion | string | `"kubescheduler.config.k8s.io/v1"` | scheduler config apiversion (ref: https://kubernetes.io/docs/reference/scheduling/config/) |
| securityContext.privileged | bool | `false` | |
| tolerations | list | `[]` | |
| useForKubeSchedulerUser | bool | `false` | allow User system:kube-scheduler to work with metrics and CRDs. primary usage is to replace default-scheduler with custom one |

----------------------------------------------
Autogenerated from chart metadata using [helm-docs v1.14.2](https://github.com/norwoodj/helm-docs/releases/v1.14.2)
@@ -0,0 +1,97 @@
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    api-approved.kubernetes.io: https://github.com/kubernetes-sigs/scheduler-plugins/pull/50
    controller-gen.kubebuilder.io/version: v0.11.1
  creationTimestamp: null
  name: podgroups.scheduling.x-k8s.io
spec:
  group: scheduling.x-k8s.io
  names:
    kind: PodGroup
    listKind: PodGroupList
    plural: podgroups
    shortNames:
    - pg
    - pgs
    singular: podgroup
  scope: Namespaced
  versions:
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        description: PodGroup is a collection of Pod; used for batch workload.
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation
              of an object. Servers should convert recognized schemas to the latest
              internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
            type: string
          kind:
            description: 'Kind is a string value representing the REST resource this
              object represents. Servers may infer this from the endpoint the client
              submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
            type: string
          metadata:
            type: object
          spec:
            description: Specification of the desired behavior of the pod group.
            properties:
              minMember:
                description: MinMember defines the minimal number of members/tasks
                  to run the pod group; if there's not enough resources to start all
                  tasks, the scheduler will not start anyone.
                format: int32
                type: integer
              minResources:
                additionalProperties:
                  anyOf:
                  - type: integer
                  - type: string
                  pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
                  x-kubernetes-int-or-string: true
                description: MinResources defines the minimal resource of members/tasks
                  to run the pod group; if there's not enough resources to start all
                  tasks, the scheduler will not start anyone.
                type: object
              scheduleTimeoutSeconds:
                description: ScheduleTimeoutSeconds defines the maximal time of members/tasks
                  to wait before run the pod group;
                format: int32
                type: integer
            type: object
          status:
            description: Status represents the current information about a pod group.
              This data may not be up to date.
            properties:
              failed:
                description: The number of pods which reached phase Failed.
                format: int32
                type: integer
              occupiedBy:
                description: OccupiedBy marks the workload (e.g., deployment, statefulset)
                  UID that occupy the podgroup. It is empty if not initialized.
                type: string
              phase:
                description: Current phase of PodGroup.
                type: string
              running:
                description: The number of actively running pods.
                format: int32
                type: integer
              scheduleStartTime:
                description: ScheduleStartTime of the group
                format: date-time
                type: string
              succeeded:
                description: The number of pods which reached phase Succeeded.
                format: int32
                type: integer
            type: object
        type: object
    served: true
    storage: true
    subresources:
      status: {}
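
Review note: the `pattern` under `minResources` constrains which strings are accepted as resource quantities. The sketch below is illustration only, not part of the chart. It copies the pattern verbatim from the CRD and checks a few sample values against it; `is_valid_quantity` is a hypothetical helper name, and the check only approximates server-side validation.

```python
import re

# Pattern copied verbatim from the CRD's minResources schema, split across
# two string literals only to keep the line length manageable.
QUANTITY_PATTERN = re.compile(
    r"^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))"
    r"(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$"
)


def is_valid_quantity(value: str) -> bool:
    """Return True if value matches the quantity pattern in full."""
    # fullmatch rejects a trailing newline that `$` alone would tolerate.
    return QUANTITY_PATTERN.fullmatch(value) is not None


for sample in ["500m", "512Mi", "1.5Gi", "1e3", "512MiB", "1ki", "abc"]:
    print(f"{sample!r}: {is_valid_quantity(sample)}")
```

Values such as `500m` and `512Mi`, which also appear in the chart's default `scheduler.resources`, match the pattern, while `512MiB` and `1ki` do not.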
51 changes: 51 additions & 0 deletions tools/sakkara-deploy/sakkara-scheduler/templates/_helpers.tpl
@@ -0,0 +1,51 @@
{{/*
Expand the name of the chart.
*/}}
{{- define "scheduler-plugins.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
If release name contains chart name it will be used as a full name.
*/}}
{{- define "scheduler-plugins.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}

{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "scheduler-plugins.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Common labels
*/}}
{{- define "scheduler-plugins.labels" -}}
helm.sh/chart: {{ include "scheduler-plugins.chart" . }}
{{ include "scheduler-plugins.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

{{/*
Selector labels
*/}}
{{- define "scheduler-plugins.selectorLabels" -}}
app.kubernetes.io/name: {{ include "scheduler-plugins.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
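
Review note: the naming rules in `scheduler-plugins.fullname` are easier to check with concrete inputs. The following Python sketch mirrors the template logic under the assumption that all names are ASCII; `fullname` and `_trunc63` are hypothetical names used only for illustration and are not part of the chart.

```python
def _trunc63(value: str) -> str:
    """Mimic `trunc 63 | trimSuffix "-"`: cut to 63 chars, drop one trailing hyphen."""
    value = value[:63]
    return value[:-1] if value.endswith("-") else value


def fullname(release_name: str, chart_name: str,
             name_override: str = "", fullname_override: str = "") -> str:
    """Mirror the branches of the `scheduler-plugins.fullname` helper."""
    if fullname_override:
        return _trunc63(fullname_override)
    # `default .Chart.Name .Values.nameOverride`: the override wins when non-empty.
    name = name_override or chart_name
    # `contains $name .Release.Name`: does the release name contain the name?
    if name in release_name:
        return _trunc63(release_name)
    return _trunc63(f"{release_name}-{name}")


# nameOverride defaults to "sakkara" according to the values table in the chart README.
print(fullname("sakkara-scheduler", "sakkara-scheduler", name_override="sakkara"))
print(fullname("demo", "sakkara-scheduler", name_override="sakkara"))
```

With the default `nameOverride`, a release named `sakkara-scheduler` keeps its name unchanged, while a release named `demo` becomes `demo-sakkara`. The same string is used for the ConfigMap, the Deployment, the service account name and the `schedulerName` in the templates below.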
22 changes: 22 additions & 0 deletions tools/sakkara-deploy/sakkara-scheduler/templates/configmap.yaml
@@ -0,0 +1,22 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "scheduler-plugins.fullname" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "scheduler-plugins.labels" . | nindent 4 }}
data:
  scheduler-config.yaml: |
    apiVersion: {{ .Values.schedulerConfig.apiVersion }}
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: {{ .Values.scheduler.leaderElect }}
      resourceName: {{ include "scheduler-plugins.fullname" . }}
    profiles:
    # Compose all plugins in one profile
    - schedulerName: {{ include "scheduler-plugins.fullname" . }}
      plugins:
        {{- toYaml $.Values.plugins | nindent 8 }}
      {{- if $.Values.pluginConfig }}
      pluginConfig: {{ toYaml $.Values.pluginConfig | nindent 6 }}
      {{- end }}
66 changes: 66 additions & 0 deletions tools/sakkara-deploy/sakkara-scheduler/templates/deployment.yaml
@@ -0,0 +1,66 @@
{{- if .Values.scheduler.enabled }}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "scheduler-plugins.fullname" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "scheduler-plugins.labels" . | nindent 4 }}
    component: scheduler
spec:
  replicas: {{ .Values.scheduler.replicaCount }}
  {{- with .Values.scheduler.strategy }}
  strategy:
    {{- toYaml . | nindent 4 }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "scheduler-plugins.selectorLabels" . | nindent 6 }}
      component: scheduler
  template:
    metadata:
      annotations:
        checksum/configmap: '{{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}'
        {{- with .Values.podAnnotations }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
      labels:
        {{- include "scheduler-plugins.selectorLabels" . | nindent 8 }}
        component: scheduler
    spec:
      priorityClassName: {{ .Values.priorityClassName }}
      serviceAccountName: {{ include "scheduler-plugins.fullname" . }}
      containers:
        - command:
            - /bin/kube-scheduler
            - --config=/etc/kubernetes/scheduler-config.yaml
            - --v={{ .Values.scheduler.verbosity }}
          name: scheduler
          image: "{{ .Values.image.repository }}/{{ .Values.scheduler.image }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.scheduler.imagePullPolicy }}
          resources:
            {{- toYaml .Values.scheduler.resources | nindent 12 }}
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          volumeMounts:
            - name: scheduler-config
              mountPath: /etc/kubernetes
              readOnly: true
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.scheduler.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      volumes:
        - name: scheduler-config
          configMap:
            name: {{ include "scheduler-plugins.fullname" . }}
{{- end }}
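
Review note: the container `image:` line joins three values and falls back to the chart `appVersion` only when `image.tag` is empty. The Python sketch below is an illustration, not part of the chart. It reproduces that composition using the defaults from the chart README's values table and from Chart.yaml; `image_reference` is a hypothetical helper name.

```python
def image_reference(repository: str, image: str, tag: str, app_version: str) -> str:
    """Mirror `repository/image:(tag | default appVersion)` from the deployment template."""
    # `default` falls back only when the tag is empty, like Python's `or` on strings.
    effective_tag = tag or app_version
    return f"{repository}/{image}:{effective_tag}"


# Defaults as documented: image.repository, scheduler.image, image.tag, appVersion.
print(image_reference("quay.io", "ibm/sakkara-scheduler", "v0.0.1", "v0.29.7"))
# With an empty tag the chart appVersion is used instead.
print(image_reference("quay.io", "ibm/sakkara-scheduler", "", "v0.29.7"))
```

Because the documented default for `image.tag` is `v0.0.1` rather than an empty string, the `appVersion` fallback does not take effect unless the tag is explicitly set to an empty value.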