Commit 812ad79

Merge branch 'skyler/release-1.0' into 'main'
Release v1.0.0-rc1

Closes #123

See merge request SchedMD/slinky-dev/slurm-operator!241
2 parents cb2862d + f48fb94 commit 812ad79

57 files changed: 125 additions & 6571 deletions

.gitlab-ci.yml

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ stages:
   - release

 variables:
-  VERSION: 0.4.0
+  VERSION: 1.0.0-rc1
   CI_JOB_USER: gitlab-ci-token
   DOCKER_TLS_CERTDIR: '' # disable TLS
   DOCKER_OPTS: ${DOCKER_OPTS} --registry-mirror=https://mirror.gcr.io

.gitlab/ci/release.yml

Lines changed: 1 addition & 15 deletions
@@ -43,7 +43,7 @@ release-oci:
     - git remote set-url origin ${CI_PROJECT_URL/gitlab.com/oauth2:${CI_AUTH_TOKEN}@gitlab.com}.git
     - git remote -v
     - |
-      if [ -z "$(echo "$VERSION" | grep -Eo "^[0-9]+\.[0-9]+\.[0-9]+$")" ]; then
+      if [ -z "$(echo "$VERSION" | grep -Eo "^[0-9]+\.[0-9]+\.[0-9](-[[:alpha:]][[:alnum:]]*(\.[0-9]+)?)?$")" ]; then
         echo "VERSION is not semver: `$VERSION`"
         exit 1
       fi
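The relaxed version check in this hunk can be tried outside CI. The following is a minimal sketch, not the project's script: the `matches` helper and the sample versions are made up for illustration. One thing it shows is that the committed pattern ends the patch component with `[0-9]` rather than `[0-9]+`, so a multi-digit patch such as `1.0.10` is rejected. The `widened` variant assumes that multi-digit patches are meant to pass, which the source does not confirm.

```shell
#!/usr/bin/env bash

# Pattern copied verbatim from the added line of the diff.
committed='^[0-9]+\.[0-9]+\.[0-9](-[[:alpha:]][[:alnum:]]*(\.[0-9]+)?)?$'
# Assumption: same pattern with a multi-digit patch component allowed.
widened='^[0-9]+\.[0-9]+\.[0-9]+(-[[:alpha:]][[:alnum:]]*(\.[0-9]+)?)?$'

# Hypothetical helper: succeeds when version $2 matches extended regex $1.
matches() {
  printf '%s\n' "$2" | grep -Eq "$1"
}

for v in 1.0.0 1.0.0-rc1 1.0.0-rc.1 1.0.10 1.0 v1.0.0; do
  for name in committed widened; do
    if matches "${!name}" "$v"; then
      echo "$name accepts $v"
    else
      echo "$name rejects $v"
    fi
  done
done
```

Separately, the unchanged context line `echo "VERSION is not semver: `$VERSION`"` places backticks inside double quotes, so the shell attempts to run the value of `VERSION` as a command when the message is printed.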
@@ -60,17 +60,3 @@ release-tag:
   rules:
     - if: $CI_COMMIT_REF_PROTECTED == "true"
       when: manual
-
-release-branch:
-  stage: release
-  extends: .git
-  script:
-    - set -euo pipefail
-    - major_minor="$(echo ${VERSION} | grep -Eo "^[0-9]+\.[0-9]+")"
-    - branch_name="release-${major_minor}"
-    - echo "branch_name=${branch_name}"
-    - git branch ${branch_name}
-    - git push --set-upstream origin ${branch_name}
-  rules:
-    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
-      when: manual
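For reference, the removed `release-branch` job derived its branch name from the major and minor components of `VERSION`. The sketch below reproduces only that derivation, using the same `grep -Eo` expression as the deleted lines. The function name `release_branch_name` is hypothetical, and the git commands are left out on purpose.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper mirroring the deleted job: keep only MAJOR.MINOR
# from the version string and prefix it with "release-".
release_branch_name() {
  local version="$1" major_minor
  major_minor="$(printf '%s\n' "$version" | grep -Eo '^[0-9]+\.[0-9]+')"
  printf 'release-%s\n' "$major_minor"
}

release_branch_name "1.0.0-rc1"   # prints release-1.0
release_branch_name "0.4.0"       # prints release-0.4
```

A pre-release suffix such as `-rc1` does not affect the result, because the expression is anchored at the start and stops after the minor component.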

CHANGELOG.md

Lines changed: 4 additions & 3 deletions
@@ -3,7 +3,8 @@
 All notable changes to this project will be documented in version specific
 files.

-- [CHANGELOG-0.1.md](./CHANGELOG/CHANGELOG-0.1.md)
-- [CHANGELOG-0.2.md](./CHANGELOG/CHANGELOG-0.2.md)
-- [CHANGELOG-0.3.md](./CHANGELOG/CHANGELOG-0.3.md)
+- [CHANGELOG-1.0.md](./CHANGELOG/CHANGELOG-1.0.md)
 - [CHANGELOG-0.4.md](./CHANGELOG/CHANGELOG-0.4.md)
+- [CHANGELOG-0.3.md](./CHANGELOG/CHANGELOG-0.3.md)
+- [CHANGELOG-0.2.md](./CHANGELOG/CHANGELOG-0.2.md)
+- [CHANGELOG-0.1.md](./CHANGELOG/CHANGELOG-0.1.md)

CHANGELOG/CHANGELOG-1.0.md

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
+## v1.0.0-rc1
+
+### Added
+
+- The Slurm Helm Chart can now be configured with `PrologSlurmctld` and
+  `EpilogSlurmctld`.
+- Add arm64 support and multiarch manifest.
+- Added NodePort to `v1alpha1.ServiceSpec`.
+- Added pod hostname resolution of NodeSet pods.
+- Adds hostname label to pods for Slurm node mapping.
+- Synchronize Kubernetes node [un]cordon state to NodeSet pods and their Slurm
+  nodes. When Kubernetes nodes are cordoned, NodeSet pods running on those nodes
+  are also cordoned and their Slurm nodes drained. Those NodeSet pods remain
+  cordoned until the Kubernetes node becomes uncordoned.
+- Implements graceful NodeSet pod disruption handling.
+- Added metrics-server-bind-address command-line option for the slurm-operator
+  controller.
+- Added liveness probe to slurmrestd container, which will restart its pod if it
+  becomes unresponsive for long enough.
+- Custom Slurm node drain message for kubectl cordon.
+- Adds dynamic node tainting.
+- Can now support hybrid clusters, where one or more Slurm components exist
+  externally to Kubernetes but are joined to the same Slurm cluster.
+
+### Fixed
+
+- Fixes parsing of `ServiceSpec` via `ServiceSpecWrapper`.
+- Correctly use global imagePullPolicy as the default value for all containers.
+- Determine cluster domain instead of assuming the default (`cluster.local`).
+- Update kubeVersion parsing to handle provider suffixes (e.g., GKE
+  `x.y.z-gke.a`).
+- Fixed odd number of arguments logger error when updating pod conditions.
+- Avoid needless NotFound errors when patching pod conditions.
+- Fixed regression where nodeset `partition.enabled` was not being respected.
+- Initial NodeSet no longer accidentally owns the worker service.
+- Fixed issue where changes to slurmd and/or logfile subobjects were not
+  causing a rolling update.
+- Fixed notation used to refer to LoginSets in installation docs.
+- Fixed documentation for uninstalling slurm-operator-crds.
+- When checking if a Slurm node is fully drained, the logic now closely follows
+  how Slurm represents the drained state. There were certain edge cases that
+  could report the node as not drained when it actually was.
+- Check if Slurm node is [un]drained before requesting the opposite. This avoids
+  a race condition where an admin or script has applied [un]drain to the Slurm
+  node but the operator is not aware of it.
+- When Slurm nodes are put into drain state, the provided reason should not be
+  thrashed by subsequent drain requests.
+- Fixed installation instruction for cert-manager chart.
+- Fixes bug whereby slurm-controller hostname was set incorrectly.
+- Fixes per-nodeset partition creation.
+- Fixed chart installation failure where NOTES.txt failed to fetch value from
+  nested object where the parent was null.
+- Fixed imagePullPolicy in slurm-operator Helm chart.
+- Fixes edge case where Slurm node state is not reset when a worker pod migrates
+  kube nodes.
+- Reduce checksum collision during file change detection by using SHA256 instead
+  of MD5.
+- When `CgroupPlugin=disabled`, do not configure `PrologFlags=Contain` and other
+  parameters that depend on it.
+- Added liveness probe to slurmd container to restart the pod if slurmd crashes
+  after starting.
+- Prevent Slurm node undrain when node is down or notresponding.
+- Fixed reason prefixing behavior in MakeNodeUndrain.
+- Default webhook timeout is now consistent across all endpoints, respecting the
+  user input, otherwise using the Kubernetes default.
+- Fixed case where multiple env variables in LoginSet would cause the operator
+  to keep updating the LoginSet Deployment causing the underlying ReplicaSet to
+  endlessly thrash.
+- Fixed case where NodeSets being added or removed from the Slurm cluster was
+  not triggering a reconfigure.
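The kubeVersion entry in the list above concerns version strings that carry a provider suffix. As a rough illustration only (this is not the chart's implementation, which this diff does not show), one way to reduce such a string to its `x.y.z` core is plain shell parameter expansion. The helper name `core_kube_version` and the sample version strings are assumptions made for the example.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reduce a Kubernetes version string to MAJOR.MINOR.PATCH.
core_kube_version() {
  local v="${1#v}"   # drop an optional leading "v"
  v="${v%%[-+]*}"    # drop everything from the first "-" or "+" onward
  printf '%s\n' "$v"
}

# Illustrative inputs; the suffix values are made up.
core_kube_version "v1.29.4-gke.1043000"   # prints 1.29.4
core_kube_version "v1.30.2+k3s1"          # prints 1.30.2
core_kube_version "1.31.0"                # prints 1.31.0
```

Stripping the suffix before comparison is only one possible approach; a chart could equally keep the suffix and adjust the constraint it compares against.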
+
+### Changed
+
+- Organized documentation into sub-directories.
+- Updates the paths used to refer to the user's home directory in installation
+  instructions.
+- Slurm node [un]drain activity now includes more context.
+- Made the NodeSet updateStrategy configurable in the Slurm helm chart. The
+  default minUnavailable was changed to 25%.
+- Shortened naming schema for health and metrics addresses.
+- Exposed addresses for health and metrics of the slurm-operator controller pod
+  via the Helm chart.
+- slurmctld - The reconfigure container is now a sidecar instead of main
+  container.
+- Reduced interval of the reconfigure check. After the kubelet updates mounted
+  files in the pod, a reconfigure will be issued more quickly.
+- All supplemental containers are now `corev1.Container`, allowing full
+  configuration.
+- Chart metadata is no longer applied to the pod template.
+- Updated NodeSet pod preStop to better indicate why the Slurm node was set to
+  DOWN before deletion.
+- Service metadata is now configurable separately from the pod template
+  metadata.
+- Webhooks avoid kube-system namespace.
+- Replaced slurm-exporter with a serviceMonitor that scrapes slurmctld directly.
+- Move to Slurm v44 API (from v43).
+
+### Removed
+
+- Removed defaulting webhooks.
+- Removed v1alpha1 CRDs to cleanly delineate v1 from v0 releases. Going forward,
+  old versions of CRDs in v1 releases will linger in a deprecated state and be
+  removed in future releases as needed.

Makefile

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 # To re-generate a bundle for another specific version without changing the standard setup, you can:
 # - use the VERSION as arg of the build target (e.g make build VERSION=0.0.2)
 # - use environment variables to overwrite this value (e.g export VERSION=0.0.2)
-VERSION ?= 0.4.0
+VERSION ?= 1.0.0-rc1

 # Get the currently used golang install path (in GOPATH/bin, unless GOBIN is set)
 ifeq (,$(shell go env GOBIN))

api/v1alpha1/accounting_convert.go

Lines changed: 0 additions & 34 deletions
This file was deleted.

api/v1alpha1/accounting_keys.go

Lines changed: 0 additions & 100 deletions
This file was deleted.
