Commit cde0286

feat: Update Kueue to v0.14.3 (#804)
* feat: Add --quiet flag
* Bump default Kueue version to 0.14.2
* feat: Prepare for Kueue upgrade
* Add user_input_test.py
* testing: CommandsTester updates
* refactor: Simplify kueue_manager_test with CommandsTester
* Add user_input docs
* Add unit tests
* Kueue v0.14.3
* Apply review feedback
* Fix import
* Update kueue deletion prompt
1 parent d17876d commit cde0286

12 files changed: +300 -18 lines changed

.github/actions/install-kueue/action.yml

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ inputs:
   version:
     description: "The version to install"
     required: false
-    default: "0.14.2"
+    default: "0.14.3"
 
 runs:
   using: composite

Makefile

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 KUEUE_REPO=https://github.com/kubernetes-sigs/kueue.git
 
 KUBECTL_VERSION := $(shell curl -L -s https://dl.k8s.io/release/stable.txt)
-KUEUE_VERSION=v0.12.2
+KUEUE_VERSION=v0.14.3
 KJOB_VERSION=v0.1.0
 
 OS := $(shell uname -s | tr A-Z a-z)

goldens/Basic_cluster_create.txt

Lines changed: 2 additions & 2 deletions
@@ -74,10 +74,10 @@ kubectl apply --server-side -f https://github.com/google/pathways-job/releases/d
 [XPK] Enabling Kueue on the cluster
 [XPK] Task: `Get kueue version on server` is implemented by the following command not running since it is a dry run.
 kubectl get deployment kueue-controller-manager -n kueue-system -o jsonpath='{.spec.template.spec.containers[0].image}'
-[XPK] Installing Kueue version v0.12.2...
+[XPK] Installing Kueue version v0.14.3...
 [XPK] Try 1: Install Kueue
 [XPK] Task: `Install Kueue` is implemented by the following command not running since it is a dry run.
-kubectl apply --server-side --force-conflicts -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.12.2/manifests.yaml
+kubectl apply --server-side --force-conflicts -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.14.3/manifests.yaml
 [XPK] Task: `Wait for Kueue to be available` is implemented by the following command not running since it is a dry run.
 kubectl wait deploy/kueue-controller-manager -n kueue-system --for=condition=available --timeout=10m
 [XPK] Applying following Kueue resources:

goldens/Cluster_create_private.txt

Lines changed: 2 additions & 2 deletions
@@ -79,10 +79,10 @@ kubectl apply --server-side -f https://github.com/google/pathways-job/releases/d
 [XPK] Enabling Kueue on the cluster
 [XPK] Task: `Get kueue version on server` is implemented by the following command not running since it is a dry run.
 kubectl get deployment kueue-controller-manager -n kueue-system -o jsonpath='{.spec.template.spec.containers[0].image}'
-[XPK] Installing Kueue version v0.12.2...
+[XPK] Installing Kueue version v0.14.3...
 [XPK] Try 1: Install Kueue
 [XPK] Task: `Install Kueue` is implemented by the following command not running since it is a dry run.
-kubectl apply --server-side --force-conflicts -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.12.2/manifests.yaml
+kubectl apply --server-side --force-conflicts -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.14.3/manifests.yaml
 [XPK] Task: `Wait for Kueue to be available` is implemented by the following command not running since it is a dry run.
 kubectl wait deploy/kueue-controller-manager -n kueue-system --for=condition=available --timeout=10m
 [XPK] Applying following Kueue resources:

goldens/Cluster_create_with_gb200-4.txt

Lines changed: 2 additions & 2 deletions
@@ -78,10 +78,10 @@ kubectl apply --server-side -f https://github.com/google/pathways-job/releases/d
 [XPK] Enabling Kueue on the cluster
 [XPK] Task: `Get kueue version on server` is implemented by the following command not running since it is a dry run.
 kubectl get deployment kueue-controller-manager -n kueue-system -o jsonpath='{.spec.template.spec.containers[0].image}'
-[XPK] Installing Kueue version v0.12.2...
+[XPK] Installing Kueue version v0.14.3...
 [XPK] Try 1: Install Kueue
 [XPK] Task: `Install Kueue` is implemented by the following command not running since it is a dry run.
-kubectl apply --server-side --force-conflicts -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.12.2/manifests.yaml
+kubectl apply --server-side --force-conflicts -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.14.3/manifests.yaml
 [XPK] Task: `Wait for Kueue to be available` is implemented by the following command not running since it is a dry run.
 kubectl wait deploy/kueue-controller-manager -n kueue-system --for=condition=available --timeout=10m
 [XPK] Applying following Kueue resources:

goldens/NAP_cluster-create.txt

Lines changed: 2 additions & 2 deletions
@@ -85,10 +85,10 @@ kubectl apply --server-side -f https://github.com/google/pathways-job/releases/d
 [XPK] Enabling Kueue on the cluster
 [XPK] Task: `Get kueue version on server` is implemented by the following command not running since it is a dry run.
 kubectl get deployment kueue-controller-manager -n kueue-system -o jsonpath='{.spec.template.spec.containers[0].image}'
-[XPK] Installing Kueue version v0.12.2...
+[XPK] Installing Kueue version v0.14.3...
 [XPK] Try 1: Install Kueue
 [XPK] Task: `Install Kueue` is implemented by the following command not running since it is a dry run.
-kubectl apply --server-side --force-conflicts -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.12.2/manifests.yaml
+kubectl apply --server-side --force-conflicts -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.14.3/manifests.yaml
 [XPK] Task: `Wait for Kueue to be available` is implemented by the following command not running since it is a dry run.
 kubectl wait deploy/kueue-controller-manager -n kueue-system --for=condition=available --timeout=10m
 [XPK] Applying following Kueue resources:

goldens/NAP_cluster-create_with_pathways.txt

Lines changed: 2 additions & 2 deletions
@@ -86,10 +86,10 @@ kubectl apply --server-side -f https://github.com/google/pathways-job/releases/d
 [XPK] Enabling Kueue on the cluster
 [XPK] Task: `Get kueue version on server` is implemented by the following command not running since it is a dry run.
 kubectl get deployment kueue-controller-manager -n kueue-system -o jsonpath='{.spec.template.spec.containers[0].image}'
-[XPK] Installing Kueue version v0.12.2...
+[XPK] Installing Kueue version v0.14.3...
 [XPK] Try 1: Install Kueue
 [XPK] Task: `Install Kueue` is implemented by the following command not running since it is a dry run.
-kubectl apply --server-side --force-conflicts -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.12.2/manifests.yaml
+kubectl apply --server-side --force-conflicts -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.14.3/manifests.yaml
 [XPK] Task: `Wait for Kueue to be available` is implemented by the following command not running since it is a dry run.
 kubectl wait deploy/kueue-controller-manager -n kueue-system --for=condition=available --timeout=10m
 [XPK] Applying following Kueue resources:
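
Note: the golden files above all record the same install step, and only the version segment of the manifest URL changes. The following is a minimal sketch of how such a command line could be assembled. The helper names `manifest_url` and `install_command` are hypothetical and are not XPK functions; the URL pattern and the kubectl flags are copied from the recorded output.

```python
KUEUE_RELEASES = "https://github.com/kubernetes-sigs/kueue/releases/download"


def manifest_url(version: str) -> str:
  """Return the release manifest URL for a version such as '0.14.3'."""
  # The release tag carries a leading 'v'; accept input with or without it.
  tag = "v" + version.removeprefix("v")
  return f"{KUEUE_RELEASES}/{tag}/manifests.yaml"


def install_command(version: str) -> str:
  """Build the server-side apply command seen in the golden output."""
  return (
      "kubectl apply --server-side --force-conflicts -f "
      + manifest_url(version)
  )


print(install_command("0.14.3"))
```

Keeping the version in one place means a bump like this one only touches a constant plus the recorded goldens, which is roughly what the diff shows.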

src/xpk/core/kueue_manager.py

Lines changed: 64 additions & 3 deletions
@@ -20,6 +20,8 @@
 from typing import Optional, List, Dict, Any
 import json
 from jinja2 import Environment, FileSystemLoader
+
+from ..utils.user_input import ask_for_user_consent
 from ..utils.execution_context import is_dry_run
 from ..utils.kueue import is_queued_cluster
 
@@ -42,6 +44,8 @@
 from ..utils.templates import TEMPLATE_PATH, get_templates_absolute_path
 from packaging.version import Version
 
+KUEUE_VERSION = Version("v0.14.3")
+LATEST_BREAKING_VERSION = Version("v0.14.0")
 WAIT_FOR_KUEUE_TIMEOUT = "10m"
 CLUSTER_QUEUE_NAME = "cluster-queue"
 LOCAL_QUEUE_NAME = "multislice-queue"
@@ -52,7 +56,6 @@
 KUEUE_SUB_SLICING_TOPOLOGY_JINJA_FILE = "kueue_sub_slicing_topology.yaml.j2"
 MEMORY_SIZE_PER_VM = 1.2
 MIN_MEMORY_LIMIT_SIZE = 4096
-KUEUE_VERSION = Version("v0.12.2")
 SUB_SLICING_TOPOLOGIES = ["2x2", "2x4", "4x4", "4x8", "8x8", "8x16", "16x16"]
 
 
@@ -105,15 +108,19 @@ def install_or_upgrade(
     """
     return_code, installed_version = self.get_installed_kueue_version()
 
-    if return_code == 0:
-      if installed_version and installed_version > self.kueue_version:
+    if return_code == 0 and installed_version:
+      if installed_version > self.kueue_version:
         xpk_print(
             f"Cluster has a newer Kueue version, {installed_version}. Skipping"
             " installation."
         )
         return 0
       else:
         xpk_print(f"Upgrading Kueue to version v{self.kueue_version}...")
+        assert installed_version
+        prepare_code = self.__prepare_for_upgrade(installed_version)
+        if prepare_code != 0:
+          return prepare_code
     else:
       xpk_print(f"Installing Kueue version v{self.kueue_version}...")
 
@@ -162,6 +169,60 @@ def __install(
 
     return self.__wait_for_kueue_available()
 
+  def __prepare_for_upgrade(self, installed_version: Version) -> int:
+    if installed_version >= LATEST_BREAKING_VERSION:
+      return 0
+
+    xpk_print(
+        f"Currently installed Kueue version v{installed_version} is"
+        f" incompatible with the newer v{self.kueue_version}."
+    )
+
+    changelog_link = f"https://github.com/kubernetes-sigs/kueue/blob/main/CHANGELOG/CHANGELOG-{self.kueue_version.major}.{self.kueue_version.minor}.md"
+    agreed = ask_for_user_consent(
+        "Do you want to allow XPK to update Kueue automatically? This will"
+        " delete all existing Kueue resources and create new ones. If you"
+        " decline, you will need to upgrade the Kueue manually (see"
+        f" {changelog_link} for help)."
+    )
+    if not agreed:
+      return 1
+
+    return self.__delete_all_kueue_resources()
+
+  def __delete_all_kueue_resources(self) -> int:
+    return_code, kueue_crds_string = run_command_for_value(
+        "kubectl get crd -o name | grep .kueue.x-k8s.io", "Get Kueue CRDs"
+    )
+    if return_code != 0:
+      return return_code
+
+    kueue_crds = [
+        line.strip().removeprefix(
+            "customresourcedefinition.apiextensions.k8s.io/"
+        )
+        for line in kueue_crds_string.strip().split("\n")
+    ]
+
+    for crd in kueue_crds:
+      return_code = run_command_with_updates(
+          f"kubectl delete {crd} --all", f"Delete all resources of type {crd}"
+      )
+      if return_code != 0:
+        return return_code
+
+    for crd in kueue_crds:
+      return_code = run_command_with_updates(
+          f"kubectl delete crd {crd}", f"Delete CRD {crd}"
+      )
+      if return_code != 0:
+        return return_code
+
+    return run_command_with_updates(
+        "kubectl delete deployment kueue-controller-manager -n kueue-system",
+        "Delete Kueue Controller Manager deployment",
+    )
+
   def __install_kueue_crs(self) -> int:
     manifest_url = f"https://github.com/kubernetes-sigs/kueue/releases/download/v{self.kueue_version}/manifests.yaml"
     install_command = (
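
Note: the upgrade path added in this file rests on three small pieces of logic: a version gate against `LATEST_BREAKING_VERSION`, stripping the resource-type prefix from `kubectl get crd -o name` output, and building the changelog link from the target major and minor version. Below is a standalone sketch of those pieces, not the XPK code. It assumes plain `MAJOR.MINOR.PATCH` version strings and uses integer tuples in place of `packaging.version.Version`, which the commit itself uses; all function names are hypothetical. Unlike the diff, it also skips blank lines in the kubectl output.

```python
from typing import List, Tuple

# Assumption: versions are plain MAJOR.MINOR.PATCH strings, optionally
# prefixed with "v". The commit uses packaging.version.Version instead.
LATEST_BREAKING = (0, 14, 0)
CRD_PREFIX = "customresourcedefinition.apiextensions.k8s.io/"


def parse_version(text: str) -> Tuple[int, int, int]:
  """Turn 'v0.14.3' or '0.14.3' into a comparable tuple of integers."""
  major, minor, patch = text.strip().removeprefix("v").split(".")
  return int(major), int(minor), int(patch)


def needs_preparation(installed: str) -> bool:
  """True when the installed version predates the latest breaking release."""
  return parse_version(installed) < LATEST_BREAKING


def parse_crd_names(kubectl_output: str) -> List[str]:
  """Strip the resource-type prefix from each non-empty output line."""
  return [
      line.strip().removeprefix(CRD_PREFIX)
      for line in kubectl_output.splitlines()
      if line.strip()
  ]


def changelog_link(target: str) -> str:
  """Build the per-minor-release changelog URL used in the consent prompt."""
  major, minor, _ = parse_version(target)
  return (
      "https://github.com/kubernetes-sigs/kueue/blob/main/CHANGELOG/"
      f"CHANGELOG-{major}.{minor}.md"
  )
```

For example, `needs_preparation("v0.12.2")` is true while `needs_preparation("0.14.0")` is false, which matches the two upgrade cases exercised in the accompanying tests.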

src/xpk/core/kueue_manager_test.py

Lines changed: 83 additions & 2 deletions
@@ -61,6 +61,13 @@ def set_installed_kueue_version(
   )
 
 
+@pytest.fixture(autouse=True)
+def mock_ask_for_user_consent(mocker: MockerFixture) -> MagicMock:
+  return mocker.patch(
+      "xpk.core.kueue_manager.ask_for_user_consent", return_value=True
+  )
+
+
 @pytest.fixture(autouse=True)
 def mock_commands(mocker: MockerFixture) -> CommandsTester:
   return CommandsTester(
@@ -102,7 +109,7 @@ def test_install_or_upgrade_when_outdated(
   result = kueue_manager.install_or_upgrade(KUEUE_CONFIG)
 
   assert result == 0
-  mock_commands.assert_command_run("kubectl apply", "v0.12.2/manifests.yaml")
+  mock_commands.assert_command_run("kubectl apply", "v0.14.3/manifests.yaml")
   mock_commands.assert_command_run("kubectl apply -f", "/tmp/")
 
 
@@ -115,10 +122,84 @@ def test_install_or_upgrade_when_not_installed(
   result = kueue_manager.install_or_upgrade(KUEUE_CONFIG)
 
   assert result == 0
-  mock_commands.assert_command_run("kubectl apply", "v0.12.2/manifests.yaml")
+  mock_commands.assert_command_run("kubectl apply", "v0.14.3/manifests.yaml")
   mock_commands.assert_command_run("kubectl apply -f", "/tmp/")
 
 
+def test_upgrade_when_no_breaking_changes_between_versions_no_preparation_needed(
+    mock_commands: CommandsTester,
+    kueue_manager: KueueManager,
+    mock_ask_for_user_consent: MagicMock,
+):
+  set_installed_kueue_version(mock_commands, Version("0.14.0"))
+
+  kueue_manager.install_or_upgrade(KUEUE_CONFIG)
+
+  mock_ask_for_user_consent.assert_not_called()
+
+
+def test_upgrade_with_breaking_changes_between_versions_runs_preparation(
+    mock_commands: CommandsTester,
+    kueue_manager: KueueManager,
+    mock_ask_for_user_consent: MagicMock,
+):
+  set_installed_kueue_version(mock_commands, Version("0.11.0"))
+  fake_crds = (
+      "customresourcedefinition.apiextensions.k8s.io/kueue-crd-1.kueue.x-k8s.io\n"
+      "customresourcedefinition.apiextensions.k8s.io/kueue-crd-2.kueue.x-k8s.io"
+  )
+  mock_commands.set_result_for_command(
+      (0, fake_crds), "kubectl get crd -o name"
+  )
+  mock_ask_for_user_consent.return_value = True
+
+  result = kueue_manager.install_or_upgrade(KUEUE_CONFIG)
+
+  assert result == 0
+  mock_ask_for_user_consent.assert_called_once()
+  assert (
+      "CHANGELOG/CHANGELOG-0.14.md"
+      in mock_ask_for_user_consent.mock_calls[0].args[0]
+  )
+  mock_commands.assert_command_run(
+      "kubectl delete kueue-crd-1.kueue.x-k8s.io --all"
+  )
+  mock_commands.assert_command_run(
+      "kubectl delete kueue-crd-2.kueue.x-k8s.io --all"
+  )
+  mock_commands.assert_command_run(
+      "kubectl delete crd kueue-crd-1.kueue.x-k8s.io"
+  )
+  mock_commands.assert_command_run(
+      "kubectl delete crd kueue-crd-2.kueue.x-k8s.io"
+  )
+  mock_commands.assert_command_run(
+      "kubectl delete deployment kueue-controller-manager"
+  )
+
+
+def test_upgrade_with_breaking_changes_between_versions_does_not_run_preparation_without_consent(
+    mock_commands: CommandsTester,
+    kueue_manager: KueueManager,
+    mock_ask_for_user_consent: MagicMock,
+):
+  set_installed_kueue_version(mock_commands, Version("0.11.0"))
+  mock_commands.set_result_for_command(
+      (
+          0,
+          "customresourcedefinition.apiextensions.k8s.io/kueue-crd-1.kueue.x-k8s.io",
+      ),
+      "kubectl get crd -o name",
+  )
+  mock_ask_for_user_consent.return_value = False
+
+  result = kueue_manager.install_or_upgrade(KUEUE_CONFIG)
+
+  assert result == 1
+  # Assert there was no command run for the Kueue crd:
+  mock_commands.assert_command_not_run("kueue-crd-1.kueue.x-k8s.io")
+
+
 def test_installation_with_tolerations(
     mock_commands: CommandsTester, kueue_manager: KueueManager
 ):
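
Note: the new tests above check that declining the prompt returns 1 before any delete command runs. The same pattern can be sketched with only `unittest.mock` from the standard library. The `prepare_for_upgrade` function here is a hypothetical stand-in that takes its collaborators as arguments; it is not the XPK method, and the real tests patch `xpk.core.kueue_manager.ask_for_user_consent` via pytest-mock instead.

```python
from typing import Callable
from unittest.mock import MagicMock


def prepare_for_upgrade(
    ask_for_consent: Callable[[str], bool],
    delete_resources: Callable[[], int],
) -> int:
  """Hypothetical consent gate: only delete resources once the user agrees."""
  if not ask_for_consent("Allow XPK to update Kueue automatically?"):
    return 1  # Declined: signal failure without touching the cluster.
  return delete_resources()


# Case 1: the user declines, so the delete step must never be reached.
ask_declined = MagicMock(return_value=False)
delete_skipped = MagicMock(return_value=0)
declined_code = prepare_for_upgrade(ask_declined, delete_skipped)

# Case 2: the user agrees, so the delete step runs and its code is returned.
ask_agreed = MagicMock(return_value=True)
delete_run = MagicMock(return_value=0)
agreed_code = prepare_for_upgrade(ask_agreed, delete_run)
```

Asserting on the mock that must not be called (`delete_skipped.assert_not_called()`) is the counterpart of `assert_command_not_run` in the test file.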

src/xpk/templates/kueue_gke_default_topology.yaml.j2

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-apiVersion: kueue.x-k8s.io/v1alpha1
+apiVersion: kueue.x-k8s.io/v1beta1
 kind: Topology
 metadata:
   name: "gke-default"
