Commit 3284550

AcceleratorType Enum (#779)
* Make AcceleratorType an enum
1 parent d41066f commit 3284550

22 files changed: 108 additions, 111 deletions

goldens/Basic_cluster_create.txt

Lines changed: 2 additions & 2 deletions
@@ -40,11 +40,11 @@ kubectl wait deployment/coredns --for=condition=Available=true --namespace=kube-
 [XPK] Task: `Determine current gke master version` is implemented by the following command not running since it is a dry run.
 gcloud beta container clusters describe golden-cluster --location us-central1 --project golden-project --format="value(currentMasterVersion)"
 [XPK] Creating 1 node pool or pools of tpu7x-8
-We assume that the underlying system is: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu7x', gce_machine_type='tpu7x-standard-4t', chips_per_vm=4, accelerator_type=1, device_type='tpu7x-8', supports_sub_slicing=False, requires_workload_policy=True)
+We assume that the underlying system is: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu7x', gce_machine_type='tpu7x-standard-4t', chips_per_vm=4, accelerator_type=TPU, device_type='tpu7x-8', supports_sub_slicing=False, requires_workload_policy=True)
 [XPK] Task: `Get All Node Pools` is implemented by the following command not running since it is a dry run.
 gcloud beta container node-pools list --cluster golden-cluster --project=golden-project --location=us-central1 --format="csv[no-heading](name)"
 [XPK] Creating 1 node pool or pools of tpu7x-8
-Underlyingly, we assume that means: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu7x', gce_machine_type='tpu7x-standard-4t', chips_per_vm=4, accelerator_type=1, device_type='tpu7x-8', supports_sub_slicing=False, requires_workload_policy=True)
+Underlyingly, we assume that means: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu7x', gce_machine_type='tpu7x-standard-4t', chips_per_vm=4, accelerator_type=TPU, device_type='tpu7x-8', supports_sub_slicing=False, requires_workload_policy=True)
 [XPK] Task: `Get Node Pool Zone` is implemented by the following command not running since it is a dry run.
 gcloud beta container node-pools describe 0 --cluster golden-cluster --project=golden-project --location=us-central1 --format="value(locations)"
 [XPK] Task: `GKE Cluster Get ConfigMap` is implemented by the following command not running since it is a dry run.

goldens/Cluster_create_private.txt

Lines changed: 2 additions & 2 deletions
@@ -42,13 +42,13 @@ kubectl wait deployment/coredns --for=condition=Available=true --namespace=kube-
 [XPK] Task: `Determine current gke master version` is implemented by the following command not running since it is a dry run.
 gcloud beta container clusters describe golden-cluster-private --location us-central1 --project golden-project --format="value(currentMasterVersion)"
 [XPK] Creating 1 node pool or pools of v5p-8
-We assume that the underlying system is: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu-v5p-slice', gce_machine_type='ct5p-hightpu-4t', chips_per_vm=4, accelerator_type=1, device_type='v5p-8', supports_sub_slicing=False, requires_workload_policy=False)
+We assume that the underlying system is: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu-v5p-slice', gce_machine_type='ct5p-hightpu-4t', chips_per_vm=4, accelerator_type=TPU, device_type='v5p-8', supports_sub_slicing=False, requires_workload_policy=False)
 [XPK] Task: `Get All Node Pools` is implemented by the following command not running since it is a dry run.
 gcloud beta container node-pools list --cluster golden-cluster-private --project=golden-project --location=us-central1 --format="csv[no-heading](name)"
 [XPK] Task: `Describe reservation` is implemented by the following command not running since it is a dry run.
 gcloud beta compute reservations describe golden-reservation --project=golden-project --zone=us-central1-a
 [XPK] Creating 1 node pool or pools of v5p-8
-Underlyingly, we assume that means: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu-v5p-slice', gce_machine_type='ct5p-hightpu-4t', chips_per_vm=4, accelerator_type=1, device_type='v5p-8', supports_sub_slicing=False, requires_workload_policy=False)
+Underlyingly, we assume that means: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu-v5p-slice', gce_machine_type='ct5p-hightpu-4t', chips_per_vm=4, accelerator_type=TPU, device_type='v5p-8', supports_sub_slicing=False, requires_workload_policy=False)
 [XPK] Task: `Get Node Pool Zone` is implemented by the following command not running since it is a dry run.
 gcloud beta container node-pools describe 0 --cluster golden-cluster-private --project=golden-project --location=us-central1 --format="value(locations)"
 [XPK] Task: `GKE Cluster Get ConfigMap` is implemented by the following command not running since it is a dry run.

goldens/Cluster_create_with_gb200-4.txt

Lines changed: 2 additions & 2 deletions
@@ -40,13 +40,13 @@ kubectl wait deployment/coredns --for=condition=Available=true --namespace=kube-
 [XPK] Task: `Determine current gke master version` is implemented by the following command not running since it is a dry run.
 gcloud beta container clusters describe golden-cluster --location us-central1 --project golden-project --format="value(currentMasterVersion)"
 [XPK] Creating 1 node pool or pools of gb200-4
-We assume that the underlying system is: SystemCharacteristics(topology='1x72', vms_per_slice=1, gke_accelerator='nvidia-gb200', gce_machine_type='a4x-highgpu-4g', chips_per_vm=4, accelerator_type=2, device_type='gb200-4', supports_sub_slicing=False, requires_workload_policy=True)
+We assume that the underlying system is: SystemCharacteristics(topology='1x72', vms_per_slice=1, gke_accelerator='nvidia-gb200', gce_machine_type='a4x-highgpu-4g', chips_per_vm=4, accelerator_type=GPU, device_type='gb200-4', supports_sub_slicing=False, requires_workload_policy=True)
 [XPK] Task: `Get All Node Pools` is implemented by the following command not running since it is a dry run.
 gcloud beta container node-pools list --cluster golden-cluster --project=golden-project --location=us-central1 --format="csv[no-heading](name)"
 [XPK] Task: `Describe reservation` is implemented by the following command not running since it is a dry run.
 gcloud beta compute reservations describe golden-reservation --project=golden-project --zone=us-central1-a
 [XPK] Creating 1 node pool with 2 nodes of gb200-4
-Underlyingly, we assume that means: SystemCharacteristics(topology='1x72', vms_per_slice=1, gke_accelerator='nvidia-gb200', gce_machine_type='a4x-highgpu-4g', chips_per_vm=4, accelerator_type=2, device_type='gb200-4', supports_sub_slicing=False, requires_workload_policy=True)
+Underlyingly, we assume that means: SystemCharacteristics(topology='1x72', vms_per_slice=1, gke_accelerator='nvidia-gb200', gce_machine_type='a4x-highgpu-4g', chips_per_vm=4, accelerator_type=GPU, device_type='gb200-4', supports_sub_slicing=False, requires_workload_policy=True)
 [XPK] Task: `Get Node Pool Zone` is implemented by the following command not running since it is a dry run.
 gcloud beta container node-pools describe 0 --cluster golden-cluster --project=golden-project --location=us-central1 --format="value(locations)"
 [XPK] Task: `GKE Cluster Get ConfigMap` is implemented by the following command not running since it is a dry run.

goldens/NAP_cluster-create.txt

Lines changed: 2 additions & 2 deletions
@@ -40,11 +40,11 @@ kubectl wait deployment/coredns --for=condition=Available=true --namespace=kube-
 [XPK] Task: `Determine current gke master version` is implemented by the following command not running since it is a dry run.
 gcloud beta container clusters describe golden-cluster --location us-central1 --project golden-project --format="value(currentMasterVersion)"
 [XPK] Creating 1 node pool or pools of tpu7x-8
-We assume that the underlying system is: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu7x', gce_machine_type='tpu7x-standard-4t', chips_per_vm=4, accelerator_type=1, device_type='tpu7x-8', supports_sub_slicing=False, requires_workload_policy=True)
+We assume that the underlying system is: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu7x', gce_machine_type='tpu7x-standard-4t', chips_per_vm=4, accelerator_type=TPU, device_type='tpu7x-8', supports_sub_slicing=False, requires_workload_policy=True)
 [XPK] Task: `Get All Node Pools` is implemented by the following command not running since it is a dry run.
 gcloud beta container node-pools list --cluster golden-cluster --project=golden-project --location=us-central1 --format="csv[no-heading](name)"
 [XPK] Creating 1 node pool or pools of tpu7x-8
-Underlyingly, we assume that means: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu7x', gce_machine_type='tpu7x-standard-4t', chips_per_vm=4, accelerator_type=1, device_type='tpu7x-8', supports_sub_slicing=False, requires_workload_policy=True)
+Underlyingly, we assume that means: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu7x', gce_machine_type='tpu7x-standard-4t', chips_per_vm=4, accelerator_type=TPU, device_type='tpu7x-8', supports_sub_slicing=False, requires_workload_policy=True)
 [XPK] Task: `Get Node Pool Zone` is implemented by the following command not running since it is a dry run.
 gcloud beta container node-pools describe 0 --cluster golden-cluster --project=golden-project --location=us-central1 --format="value(locations)"
 [XPK] Task: `GKE Cluster Get ConfigMap` is implemented by the following command not running since it is a dry run.

goldens/NAP_cluster-create_with_pathways.txt

Lines changed: 2 additions & 2 deletions
@@ -40,11 +40,11 @@ kubectl wait deployment/coredns --for=condition=Available=true --namespace=kube-
 [XPK] Task: `Determine current gke master version` is implemented by the following command not running since it is a dry run.
 gcloud beta container clusters describe golden-cluster --location us-central1 --project golden-project --format="value(currentMasterVersion)"
 [XPK] Creating 1 node pool or pools of tpu7x-8
-We assume that the underlying system is: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu7x', gce_machine_type='tpu7x-standard-4t', chips_per_vm=4, accelerator_type=1, device_type='tpu7x-8', supports_sub_slicing=False, requires_workload_policy=True)
+We assume that the underlying system is: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu7x', gce_machine_type='tpu7x-standard-4t', chips_per_vm=4, accelerator_type=TPU, device_type='tpu7x-8', supports_sub_slicing=False, requires_workload_policy=True)
 [XPK] Task: `Get All Node Pools` is implemented by the following command not running since it is a dry run.
 gcloud beta container node-pools list --cluster golden-cluster --project=golden-project --location=us-central1 --format="csv[no-heading](name)"
 [XPK] Creating 1 node pool or pools of tpu7x-8
-Underlyingly, we assume that means: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu7x', gce_machine_type='tpu7x-standard-4t', chips_per_vm=4, accelerator_type=1, device_type='tpu7x-8', supports_sub_slicing=False, requires_workload_policy=True)
+Underlyingly, we assume that means: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu7x', gce_machine_type='tpu7x-standard-4t', chips_per_vm=4, accelerator_type=TPU, device_type='tpu7x-8', supports_sub_slicing=False, requires_workload_policy=True)
 [XPK] Task: `Get Node Pool Zone` is implemented by the following command not running since it is a dry run.
 gcloud beta container node-pools describe 0 --cluster golden-cluster --project=golden-project --location=us-central1 --format="value(locations)"
 [XPK] Task: `GKE Cluster Get ConfigMap` is implemented by the following command not running since it is a dry run.
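
Note on the golden changes above: the printed `SystemCharacteristics` now shows `accelerator_type=TPU` (or `GPU`) where it used to show `1` (or `2`). The enum definition is not part of the files shown here, so the following is only a sketch of how such output could arise. A dataclass repr calls `repr()` on each field, and a standard `Enum` member would print as `<AcceleratorType.TPU: 1>` unless `__repr__` is overridden. The values 1 and 2 are taken from the old golden output; the CPU value, the `__repr__` override, and the trimmed-down `Characteristics` class are assumptions made for illustration.

```python
import dataclasses
from enum import Enum


class AcceleratorType(Enum):
    # TPU=1 and GPU=2 match the old golden output; CPU=3 is an assumption.
    TPU = 1
    GPU = 2
    CPU = 3

    def __repr__(self) -> str:
        # Assumed override: print only the member name, as the new goldens show.
        return self.name


@dataclasses.dataclass
class Characteristics:
    # Hypothetical stand-in for SystemCharacteristics with two fields only.
    device_type: str
    accelerator_type: AcceleratorType


system = Characteristics(
    device_type='tpu7x-8', accelerator_type=AcceleratorType.TPU
)
print(system)
```

Without the `__repr__` override, the same print would include the full `<AcceleratorType.TPU: 1>` form, which does not match the updated goldens.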

src/xpk/commands/cluster.py

Lines changed: 4 additions & 4 deletions
@@ -110,7 +110,7 @@ def cluster_adapt(args) -> None:
 )
 add_zone_and_project(args)

-if system.accelerator_type == AcceleratorType['GPU'] and not getattr(
+if system.accelerator_type == AcceleratorType.GPU and not getattr(
 args, 'num_nodes'
 ):
 xpk_print(
@@ -185,7 +185,7 @@ def cluster_adapt(args) -> None:
 xpk_exit(install_kueue_code)

 install_kjob(args)
-if system.accelerator_type == AcceleratorType['GPU']:
+if system.accelerator_type == AcceleratorType.GPU:
 prepare_gpus(system)

 if args.enable_ray_cluster:
@@ -386,7 +386,7 @@ def cluster_create(args) -> None:

 install_kjob(args)

-if system.accelerator_type == AcceleratorType['GPU']:
+if system.accelerator_type == AcceleratorType.GPU:
 prepare_gpus(system)

 if args.enable_ray_cluster:
@@ -1171,7 +1171,7 @@ def run_gke_cluster_create_command(
 enable_ip_alias = True
 command += ' --enable-master-authorized-networks --enable-private-nodes'

-if system.accelerator_type == AcceleratorType['GPU']:
+if system.accelerator_type == AcceleratorType.GPU:
 enable_ip_alias = True
 command += (
 ' --enable-dataplane-v2'
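
Note on the `cluster.py` changes above: every hunk replaces a subscript lookup such as `AcceleratorType['GPU']` with attribute access, `AcceleratorType.GPU`. The old definition is not shown in this diff; the sketch below assumes it was a plain dict from name to integer, which would be consistent with `accelerator_type=1` in the old goldens. Python's `Enum` accepts both spellings and returns the same member object, so the switch to attribute access reads as a style and tooling choice rather than a behavior change. The CPU value is an assumption.

```python
from enum import Enum

# Assumed shape of the old definition: a plain mapping from name to int.
OLD_ACCELERATOR_TYPE = {'TPU': 1, 'GPU': 2, 'CPU': 3}


class AcceleratorType(Enum):
    TPU = 1
    GPU = 2
    CPU = 3  # assumed value


# Enum classes support lookup by name, so the old spelling still resolves.
by_name = AcceleratorType['GPU']
# Attribute access returns the same member and is visible to type checkers.
by_attr = AcceleratorType.GPU

print(by_name is by_attr)
print(OLD_ACCELERATOR_TYPE['GPU'] == by_attr.value)
```

A misspelled attribute such as `AcceleratorType.GPUU` can be flagged by static analysis, whereas a misspelled string key only fails at runtime with a `KeyError`.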

src/xpk/commands/cluster_gcluster_test.py

Lines changed: 2 additions & 2 deletions
@@ -93,7 +93,7 @@ def test_install_kueue_standard(
 gke_accelerator="nvidia-h100-mega-80gb",
 gce_machine_type="a3-megagpu-8g",
 chips_per_vm=8,
-accelerator_type=AcceleratorType["GPU"],
+accelerator_type=AcceleratorType.GPU,
 device_type="h100-mega-80gb-8",
 supports_sub_slicing=False,
 )
@@ -140,7 +140,7 @@ def test_install_kueue_with_autoprovisioning(
 gke_accelerator="nvidia-h100-mega-80gb",
 gce_machine_type="a3-megagpu-8g",
 chips_per_vm=8,
-accelerator_type=AcceleratorType["GPU"],
+accelerator_type=AcceleratorType.GPU,
 device_type="h100-mega-80gb-8",
 supports_sub_slicing=False,
 )

src/xpk/commands/kind.py

Lines changed: 1 addition & 1 deletion
@@ -94,7 +94,7 @@ def cluster_create(args) -> None:
 'N/A',
 'N/A',
 1,
-AcceleratorType['CPU'],
+AcceleratorType.CPU,
 'kind',
 supports_sub_slicing=False,
 )

src/xpk/commands/workload.py

Lines changed: 4 additions & 4 deletions
@@ -487,7 +487,7 @@ def workload_create(args) -> None:
 values: [{restart_on_exit_codes}]"""

 # Create the workload file based on accelerator type or workload type.
-if system.accelerator_type == AcceleratorType['GPU']:
+if system.accelerator_type == AcceleratorType.GPU:
 container, debugging_dashboard_id = get_user_workload_container(
 args, system
 )
@@ -570,7 +570,7 @@ def workload_create(args) -> None:
 container=container,
 vms_per_slice=(
 compute_vms_per_slice(args.sub_slicing_topology)
-if system.accelerator_type == AcceleratorType['TPU']
+if system.accelerator_type == AcceleratorType.TPU
 and FeatureFlags.SUB_SLICING_ENABLED
 and args.sub_slicing_topology is not None
 else system.vms_per_slice
@@ -598,7 +598,7 @@ def workload_create(args) -> None:
 tpu_toleration="""
 - operator: "Exists"
 key: google.com/tpu
-""" if system.accelerator_type == AcceleratorType['TPU'] else '',
+""" if system.accelerator_type == AcceleratorType.TPU else '',
 failure_policy_rules=failure_policy_rules,
 pod_failure_policy=pod_failure_policy,
 )
@@ -615,7 +615,7 @@ def workload_create(args) -> None:

 # Get GKE outlier dashboard for TPU
 outlier_dashboard_id = None
-if system.accelerator_type == AcceleratorType['TPU']:
+if system.accelerator_type == AcceleratorType.TPU:
 outlier_dashboard_id = get_gke_outlier_dashboard(args)

 # Outlier and debugging dashboards

src/xpk/commands/workload_test.py

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@
 import dataclasses
 from unittest.mock import MagicMock, patch
 import pytest
-from ..core.system_characteristics import SystemCharacteristics
+from ..core.system_characteristics import SystemCharacteristics, AcceleratorType
 from .workload import _validate_sub_slicing_topology, _validate_sub_slicing_availability
 from packaging.version import Version

@@ -28,7 +28,7 @@
 gke_accelerator='nvidia-l4',
 gce_machine_type='g2-standard-12',
 chips_per_vm=1,
-accelerator_type=1,
+accelerator_type=AcceleratorType.TPU,
 device_type='l4-1',
 supports_sub_slicing=True,
 requires_workload_policy=False,
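
Note on the test fixture change above: the literal `accelerator_type=1` becomes `AcceleratorType.TPU`, which keeps the old numeric meaning (1 appears as TPU in the goldens). One plausible reason the literal had to go, assuming the new type is a plain `Enum` and not an `IntEnum` (the definition is not shown in this diff), is that plain enum members never compare equal to their integer values. The `IntAcceleratorType` class below is hypothetical and is included only for contrast.

```python
from enum import Enum, IntEnum


class AcceleratorType(Enum):
    # Values taken from the old golden output.
    TPU = 1
    GPU = 2


class IntAcceleratorType(IntEnum):
    # Hypothetical alternative, shown only to contrast comparison behavior.
    TPU = 1
    GPU = 2


# A plain Enum member is not equal to its underlying integer.
plain_equals_int = AcceleratorType.TPU == 1
# Looking a member up by value still returns the canonical member.
lookup_by_value = AcceleratorType(1) is AcceleratorType.TPU
# An IntEnum member does compare equal to the integer.
int_enum_equals_int = IntAcceleratorType.TPU == 1

print(plain_equals_int, lookup_by_value, int_enum_equals_int)
```

Under that assumption, a fixture that still passed `1` would silently fail every `== AcceleratorType.TPU` check in the code under test.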
