-$ python3 xpk.py cluster create-pathways --project=golden-project --zone=us-central1-a --enable-autoprovisioning --cluster=golden-cluster --tpu-type=tpu7x-8 --on-demand --dry-run
-[XPK] Starting xpk v0.14.3
-[XPK] Starting cluster create for cluster golden-cluster:
-[XPK] Working on golden-project and us-central1-a
-[XPK] Task: `Determine server supported GKE versions for default rapid gke version` is implemented by the following command not running since it is a dry run.
-gcloud container get-server-config --project=golden-project --region=us-central1 --flatten="channels" --filter="channels.channel=RAPID" --format="value(channels.defaultVersion)"
-[XPK] Task: `Determine server supported GKE versions for valid versions` is implemented by the following command not running since it is a dry run.
-gcloud container get-server-config --project=golden-project --region=us-central1 --flatten="channels" --filter="channels.channel=RAPID" --format="value(channels.validVersions)"
-[XPK] Task: `Find if Cluster Exists` is implemented by the following command not running since it is a dry run.
-gcloud container clusters list --project=golden-project --filter=location~"us-central1.*" --format="csv[no-heading](name)"
-[XPK] Task: `GKE Cluster Create` is implemented by the following command not running since it is a dry run.
-gcloud beta container clusters create golden-cluster --project=golden-project --region=us-central1 --node-locations=us-central1-a --cluster-version=0 --machine-type=e2-standard-16 --enable-autoscaling --total-min-nodes 1 --total-max-nodes 1000 --num-nodes 6 --enable-dns-access --autoscaling-profile=optimize-utilization --labels=gke_product_type=xpk --location-policy=BALANCED --scopes=storage-full,gke-default --enable-ip-alias
-[XPK] Task: `Find cluster region or zone` is implemented by the following command not running since it is a dry run.
-gcloud container clusters list --project=golden-project --filter=name=golden-cluster --format="value(location)"
-[XPK] Task: `Check if Private Nodes is enabled in cluster.` is implemented by the following command not running since it is a dry run.
-gcloud container clusters describe golden-cluster --project=golden-project --location=us-central1 --format="value(privateClusterConfig.enablePrivateNodes)"
-[XPK] Private Nodes is not enabled on the cluster.
-[XPK] Cluster is public and no need to authorize networks.
-[XPK] Try 1: get-credentials-dns-endpoint to cluster golden-cluster
-[XPK] Task: `get-credentials-dns-endpoint to cluster golden-cluster` is implemented by the following command not running since it is a dry run.
-gcloud container clusters get-credentials golden-cluster --location=us-central1 --dns-endpoint --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
-[XPK] Testing credentials with kubectl...
-[XPK] Task: `kubectl get pods` is implemented by the following command not running since it is a dry run.
-kubectl get pods
-[XPK] Credentials test succeeded.
-[XPK] Finished get-credentials and kubectl setup.
-[XPK] Task: 'Checking CoreDNS deployment existence' in progress for namespace: kube-system
-[XPK] Task: `Check CoreDNS deployment in kube-system` is implemented by the following command not running since it is a dry run.
-kubectl get deployment coredns -n kube-system
-[XPK] Now verifying CoreDNS readiness...
-[XPK] Task: `Waiting for kubeDNS to be checked.` is implemented by the following command not running since it is a dry run.
-kubectl get deployment kube-dns -n kube-system --ignore-not-found
-[XPK] kube-dns deployment not found.
-[XPK] Verifying if CoreDNS is available...
-[XPK] Task: `Wait for coredns available` is implemented by the following command not running since it is a dry run.
-kubectl wait deployment/coredns --for=condition=Available=true --namespace=kube-system --timeout=240s
-[XPK] CoreDNS has successfully started and passed verification.
-[XPK] CoreDNS deployment 'coredns' found in namespace 'kube-system'.
-[XPK] Skipping CoreDNS deployment since it already exists.
-[XPK] Task: `Determine current gke master version` is implemented by the following command not running since it is a dry run.
-gcloud beta container clusters describe golden-cluster --location us-central1 --project golden-project --format="value(currentMasterVersion)"
-[XPK] Creating 1 node pool or pools of tpu7x-8
-We assume that the underlying system is: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu7x', gce_machine_type='tpu7x-standard-4t', chips_per_vm=4, accelerator_type=TPU, device_type='tpu7x-8', supports_sub_slicing=False, requires_workload_policy=True)
-[XPK] Task: `Get All Node Pools` is implemented by the following command not running since it is a dry run.
-gcloud beta container node-pools list --cluster golden-cluster --project=golden-project --location=us-central1 --format="csv[no-heading](name)"
-[XPK] Creating 1 node pool or pools of tpu7x-8
-Underlyingly, we assume that means: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu7x', gce_machine_type='tpu7x-standard-4t', chips_per_vm=4, accelerator_type=TPU, device_type='tpu7x-8', supports_sub_slicing=False, requires_workload_policy=True)
-[XPK] Task: `Get Node Pool Zone` is implemented by the following command not running since it is a dry run.
-gcloud beta container node-pools describe 0 --cluster golden-cluster --project=golden-project --location=us-central1 --format="value(locations)"
-[XPK] Task: `GKE Cluster Get ConfigMap` is implemented by the following command not running since it is a dry run.
-kubectl get configmap golden-cluster-resources-configmap -o=custom-columns="ConfigData:data" --no-headers=true
-[XPK] Existing node pool names ['0']
-[XPK] Task: `Retrieve resource policy` is implemented by the following command not running since it is a dry run.
-gcloud compute resource-policies describe tpu7x-8-2x2x1-placement-policy --project=golden-project --region=us-central1
-[XPK] To complete NodepoolCreate-golden-cluster-np-0 we are executing gcloud beta container node-pools create golden-cluster-np-0 --location=us-central1 --cluster=golden-cluster --project=golden-project --node-locations=us-central1-a --machine-type=tpu7x-standard-4t --host-maintenance-interval=AS_NEEDED --placement-policy=tpu7x-8-2x2x1-placement-policy --enable-gvnic --node-version=0 --num-nodes=1 --scopes=storage-full,gke-default,"https://www.googleapis.com/auth/cloud-platform" --max-pods-per-node 15
-[XPK] To complete NodepoolCreate-cpu-np we are executing gcloud beta container node-pools create cpu-np --node-version=0 --cluster=golden-cluster --project=golden-project --node-locations=us-central1-a --location=us-central1 --num-nodes=1 --machine-type=n2-standard-64 --scopes=storage-full,gke-default,"https://www.googleapis.com/auth/cloud-platform" --enable-autoscaling --min-nodes=1 --max-nodes=20
-[XPK] Breaking up a total of 2 commands into 1 batches
-[XPK] Pretending all the jobs succeeded
-[XPK] Create or delete node pool request complete.
-[XPK] Enabling Autoprovisioning
-[XPK] Default Chips quota is minimum: 0, maximum: 4.
-[XPK] Chips quota is minimum: 0, maximum: 4. XPK will autoprovision 4 chips based on incoming workload requests, keeping at least 0 available at all times, and maximum of 4. If the difference (4 chips) is small, rescaling will not work well.
-[XPK] Task: `Update cluster with autoprovisioning enabled` is implemented by the following command not running since it is a dry run.
-gcloud container clusters update golden-cluster --project=golden-project --location=us-central1 --enable-autoprovisioning --autoprovisioning-config-file 6062bfee91f21efca86f2c3261129f06b1896ad9b68d2ecdba9589bea9e15ddf
-[XPK] Task: `Update cluster with autoscaling-profile` is implemented by the following command not running since it is a dry run.
-gcloud container clusters update golden-cluster --project=golden-project --location=us-central1 --autoscaling-profile=optimize-utilization
-[XPK] Task: `Get All Node Pools` is implemented by the following command not running since it is a dry run.
-gcloud beta container node-pools list --cluster golden-cluster --project=golden-project --location=us-central1 --format="csv[no-heading](name)"
-[XPK] Breaking up a total of 0 commands into 0 batches
-[XPK] Pretending all the jobs succeeded
-[XPK] Creating ConfigMap for cluster
-[XPK] Breaking up a total of 2 commands into 1 batches
-[XPK] Pretending all the jobs succeeded
-[XPK] Enabling the jobset API on our cluster, to be deprecated when Jobset is globally available
-[XPK] Try 1: Install Jobset on golden-cluster
-[XPK] Task: `Install Jobset on golden-cluster` is implemented by the following command not running since it is a dry run.
-kubectl apply --server-side --force-conflicts -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.8.0/manifests.yaml
-[XPK] Task: `Count total nodes` is implemented by the following command not running since it is a dry run.
-kubectl get node --no-headers | wc -l
-[XPK] Try 1: Updating jobset Controller Manager resources
-[XPK] Task: `Updating jobset Controller Manager resources` is implemented by the following command not running since it is a dry run.
-kubectl apply -f 1b31e624e490f9c8c4ef4e369f08d3fa467990af5a261e4405bd045265d70e95
-[XPK] Try 1: Install PathwaysJob on golden-cluster
-[XPK] Task: `Install PathwaysJob on golden-cluster` is implemented by the following command not running since it is a dry run.
-kubectl apply --server-side -f https://github.com/google/pathways-job/releases/download/v0.1.4/install.yaml
-[XPK] Enabling Kueue on the cluster
-[XPK] Task: `Get kueue version on server` is implemented by the following command not running since it is a dry run.
-kubectl get deployment kueue-controller-manager -n kueue-system -o jsonpath='{.spec.template.spec.containers[0].image}'
-[XPK] Installing Kueue version v0.12.2...
-[XPK] Try 1: Install Kueue
-[XPK] Task: `Install Kueue` is implemented by the following command not running since it is a dry run.
-kubectl apply --server-side --force-conflicts -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.12.2/manifests.yaml
-[XPK] Task: `Wait for Kueue to be available` is implemented by the following command not running since it is a dry run.
-kubectl wait deploy/kueue-controller-manager -n kueue-system --for=condition=available --timeout=10m
-[XPK] Applying following Kueue resources:
-apiVersion: kueue.x-k8s.io/v1beta1
-kind: ResourceFlavor
-metadata:
-  name: "1xtpu7x-8"
-spec:
-  nodeLabels: {"cloud.google.com/gke-tpu-accelerator": "tpu7x"}
-
----
-
-apiVersion: kueue.x-k8s.io/v1beta1
-kind: ResourceFlavor
-metadata:
-  name: "cpu-user"
-spec:
-  nodeLabels: {"cloud.google.com/gke-nodepool": "cpu-np"}
-
----
-
-apiVersion: kueue.x-k8s.io/v1beta1
-kind: AdmissionCheck
-metadata:
-  name: dws-prov
-spec:
-  controllerName: kueue.x-k8s.io/provisioning-request
-  parameters:
-    apiGroup: kueue.x-k8s.io
-    kind: ProvisioningRequestConfig
-    name: dws-config
----
-apiVersion: kueue.x-k8s.io/v1beta1
-kind: ProvisioningRequestConfig
-metadata:
-  name: dws-config
-spec:
-  provisioningClassName: queued-provisioning.gke.io
-  podSetUpdates:
-    nodeSelector:
-    - key: autoscaling.gke.io/provisioning-request
-      valueFromProvisioningClassDetail: ResizeRequestName
-  managedResources:
-  - google.com/tpu
----
-apiVersion: kueue.x-k8s.io/v1beta1
-kind: ClusterQueue
-metadata:
-  name: "cluster-queue"
-spec:
-  preemption:
-    reclaimWithinCohort: Never # Don't preempt other queues in the cohort.
-    withinClusterQueue: LowerPriority
-  namespaceSelector: {} # match all.
-  resourceGroups: [{'coveredResources': ['google.com/tpu'], 'flavors': [{'name': '1xtpu7x-8', 'resources': [{'name': 'google.com/tpu', 'nominalQuota': 4}]}]}, {'coveredResources': ['cpu', 'memory'], 'flavors': [{'name': 'cpu-user', 'resources': [{'name': 'cpu', 'nominalQuota': 480}, {'name': 'memory', 'nominalQuota': '2000G'}]}]}]
-
----
-apiVersion: kueue.x-k8s.io/v1beta1
-kind: LocalQueue
-metadata:
-  namespace: default
-  name: multislice-queue
-spec:
-  clusterQueue: cluster-queue
----
-apiVersion: scheduling.k8s.io/v1
-kind: PriorityClass
-metadata:
-  name: very-low
-value: 100
-globalDefault: false
-description: "Very Low"
----
-apiVersion: scheduling.k8s.io/v1
-kind: PriorityClass
-metadata:
-  name: low
-value: 250
-globalDefault: false
-description: "Low"
----
-apiVersion: scheduling.k8s.io/v1
-kind: PriorityClass
-metadata:
-  name: medium
-value: 500
-globalDefault: false
-description: "Medium"
----
-apiVersion: scheduling.k8s.io/v1
-kind: PriorityClass
-metadata:
-  name: high
-value: 750
-globalDefault: false
-description: "High"
----
-apiVersion: scheduling.k8s.io/v1
-kind: PriorityClass
-metadata:
-  name: very-high
-value: 1000
-globalDefault: false
-description: "Very High"
-[XPK] Task: `Applying Kueue Custom Resources` is implemented by the following command not running since it is a dry run.
-kubectl apply -f f89effb1f55aef327018037d75f743b5c62d59f1f62fddadaaa31f72e5e07bdf
-[XPK] Task: `Count total nodes` is implemented by the following command not running since it is a dry run.
-kubectl get node --no-headers | wc -l
-[XPK] Try 1: Updating Kueue Controller Manager resources
-[XPK] Task: `Updating Kueue Controller Manager resources` is implemented by the following command not running since it is a dry run.
-kubectl patch deployment kueue-controller-manager -n kueue-system --type='strategic' --patch='{"spec": {"template": {"spec": {"containers": [{"name": "manager", "resources": {"limits": {"memory": "4096Mi"}}}]}}}}'
-[XPK] Verifying kjob installation
-[XPK] Task: `Verify kjob installation ` is implemented by the following command not running since it is a dry run.
-kubectl-kjob help
-[XPK] kjob found
-[XPK] Applying kjob CDRs
-[XPK] Task: `Create kjob CRDs on cluster` is implemented by the following command not running since it is a dry run.
-kubectl kjob printcrds | kubectl apply --server-side -f -
-[XPK] Creating kjob CRDs succeeded
-[XPK] Task: `GKE Cluster Get ConfigMap` is implemented by the following command not running since it is a dry run.
-kubectl get configmap golden-cluster-resources-configmap -o=custom-columns="ConfigData:data" --no-headers=true
-[XPK] Task: `Creating JobTemplate` is implemented by the following command not running since it is a dry run.
-kubectl apply -f 4abb796ed6e7c9d7256a51f13124efd989fc12ee83839bed432fcf7d64f68e61
-[XPK] Task: `Creating PodTemplate` is implemented by the following command not running since it is a dry run.
-kubectl apply -f a63aa3c4593c38ad90671fd8b067d1886f6313ad558379b364b51791aa50f4e8
-[XPK] Task: `Creating AppProfile` is implemented by the following command not running since it is a dry run.
-kubectl apply -f 1d13ddebae3c90a05ba26b312df088982dd0df0edc4f4013b88384e476c20486
-[XPK] GKE commands done! Resources are created.
-[XPK] See your GKE Cluster here: https://console.cloud.google.com/kubernetes/clusters/details/us-central1/golden-cluster/details?project=golden-project
-Traceback (most recent call last):
-  File "/usr/local/google/home/lidanny/Desktop/Project/diagon_xpk/cienet_xpk/xpk/xpk.py", line 39, in <module>
-    main()
-    ~~~~^^
-  File "/usr/local/google/home/lidanny/Desktop/Project/diagon_xpk/cienet_xpk/xpk/src/xpk/main.py", line 77, in main
-    main_args.func(main_args)
-    ~~~~~~~~~~~~~~^^^^^^^^^^^
-  File "/usr/local/google/home/lidanny/Desktop/Project/diagon_xpk/cienet_xpk/xpk/src/xpk/commands/cluster.py", line 765, in cluster_create_pathways
-    cluster_create(args)
-    ~~~~~~~~~~~~~~^^^^^^
-  File "/usr/local/google/home/lidanny/Desktop/Project/diagon_xpk/cienet_xpk/xpk/src/xpk/commands/cluster.py", line 411, in cluster_create
-    if args.managed_mldiagnostics:
-       ^^^^^^^^^^^^^^^^^^^^^^^^^^
-AttributeError: 'Namespace' object has no attribute 'managed_mldiagnostics'
+$ python3 xpk.py cluster create-pathways --project=golden-project --zone=us-central1-a --enable-autoprovisioning --cluster=golden-cluster --tpu-type=tpu7x-8 --on-demand --dry-run --managed-mldiagnostics
+usage: xpk [-h]
+           {workload,storage,cluster,inspector,info,batch,job,kind,shell,version,config,run} ...
+xpk: error: unrecognized arguments: --managed-mldiagnostics