$ python3 xpk.py cluster create --project=golden-project --zone=us-central1-a --cluster=golden-cluster --tpu-type=tpu7x-8 --spot --cpu-limit=20 --memory-limit=1Gi --dry-run
[XPK] Starting xpk v0.14.3
[XPK] Starting cluster create for cluster golden-cluster:
[XPK] Working on golden-project and us-central1-a
[XPK] Task: `Determine server supported GKE versions for default rapid gke version` is implemented by the following command not running since it is a dry run.
gcloud container get-server-config --project=golden-project --region=us-central1 --flatten="channels" --filter="channels.channel=RAPID" --format="value(channels.defaultVersion)"
[XPK] Task: `Determine server supported GKE versions for valid versions` is implemented by the following command not running since it is a dry run.
gcloud container get-server-config --project=golden-project --region=us-central1 --flatten="channels" --filter="channels.channel=RAPID" --format="value(channels.validVersions)"
[XPK] Task: `Find if Cluster Exists` is implemented by the following command not running since it is a dry run.
gcloud container clusters list --project=golden-project --filter=location~"us-central1.*" --format="csv[no-heading](name)"
[XPK] Task: `GKE Cluster Create` is implemented by the following command not running since it is a dry run.
gcloud beta container clusters create golden-cluster --project=golden-project --region=us-central1 --node-locations=us-central1-a --cluster-version=0 --machine-type=e2-standard-16 --enable-autoscaling --total-min-nodes 1 --total-max-nodes 1000 --num-nodes 6 --enable-dns-access --autoscaling-profile=optimize-utilization --labels=gke_product_type=xpk --location-policy=BALANCED --scopes=storage-full,gke-default
[XPK] Task: `Find cluster region or zone` is implemented by the following command not running since it is a dry run.
gcloud container clusters list --project=golden-project --filter=name=golden-cluster --format="value(location)"
[XPK] Task: `Check if Private Nodes is enabled in cluster.` is implemented by the following command not running since it is a dry run.
gcloud container clusters describe golden-cluster --project=golden-project --location=us-central1 --format="value(privateClusterConfig.enablePrivateNodes)"
[XPK] Private Nodes is not enabled on the cluster.
[XPK] Cluster is public and no need to authorize networks.
[XPK] Try 1: get-credentials-dns-endpoint to cluster golden-cluster
[XPK] Task: `get-credentials-dns-endpoint to cluster golden-cluster` is implemented by the following command not running since it is a dry run.
gcloud container clusters get-credentials golden-cluster --location=us-central1 --dns-endpoint --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
[XPK] Testing credentials with kubectl...
[XPK] Task: `kubectl get pods` is implemented by the following command not running since it is a dry run.
kubectl get pods
[XPK] Credentials test succeeded.
[XPK] Finished get-credentials and kubectl setup.
[XPK] Task: 'Checking CoreDNS deployment existence' in progress for namespace: kube-system
[XPK] Task: `Check CoreDNS deployment in kube-system` is implemented by the following command not running since it is a dry run.
kubectl get deployment coredns -n kube-system
[XPK] Now verifying CoreDNS readiness...
[XPK] Task: `Waiting for kubeDNS to be checked.` is implemented by the following command not running since it is a dry run.
kubectl get deployment kube-dns -n kube-system --ignore-not-found
[XPK] kube-dns deployment not found.
[XPK] Verifying if CoreDNS is available...
[XPK] Task: `Wait for coredns available` is implemented by the following command not running since it is a dry run.
kubectl wait deployment/coredns --for=condition=Available=true --namespace=kube-system --timeout=240s
[XPK] CoreDNS has successfully started and passed verification.
[XPK] CoreDNS deployment 'coredns' found in namespace 'kube-system'.
[XPK] Skipping CoreDNS deployment since it already exists.
[XPK] Task: `Determine current gke master version` is implemented by the following command not running since it is a dry run.
gcloud beta container clusters describe golden-cluster --location us-central1 --project golden-project --format="value(currentMasterVersion)"
[XPK] Creating 1 node pool or pools of tpu7x-8
We assume that the underlying system is: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu7x', gce_machine_type='tpu7x-standard-4t', chips_per_vm=4, accelerator_type=TPU, device_type='tpu7x-8', supports_sub_slicing=False, requires_workload_policy=True)
[XPK] Task: `Get All Node Pools` is implemented by the following command not running since it is a dry run.
gcloud beta container node-pools list --cluster golden-cluster --project=golden-project --location=us-central1 --format="csv[no-heading](name)"
[XPK] Creating 1 node pool or pools of tpu7x-8
Underlyingly, we assume that means: SystemCharacteristics(topology='2x2x1', vms_per_slice=1, gke_accelerator='tpu7x', gce_machine_type='tpu7x-standard-4t', chips_per_vm=4, accelerator_type=TPU, device_type='tpu7x-8', supports_sub_slicing=False, requires_workload_policy=True)
[XPK] Task: `Get Node Pool Zone` is implemented by the following command not running since it is a dry run.
gcloud beta container node-pools describe 0 --cluster golden-cluster --project=golden-project --location=us-central1 --format="value(locations)"
[XPK] Task: `GKE Cluster Get ConfigMap` is implemented by the following command not running since it is a dry run.
kubectl get configmap golden-cluster-resources-configmap -o=custom-columns="ConfigData:data" --no-headers=true
[XPK] Existing node pool names ['0']
[XPK] Task: `Retrieve resource policy` is implemented by the following command not running since it is a dry run.
gcloud compute resource-policies describe tpu7x-8-2x2x1-placement-policy --project=golden-project --region=us-central1
[XPK] To complete NodepoolCreate-golden-cluster-np-0 we are executing gcloud beta container node-pools create golden-cluster-np-0 --location=us-central1 --cluster=golden-cluster --project=golden-project --node-locations=us-central1-a --machine-type=tpu7x-standard-4t --host-maintenance-interval=AS_NEEDED --spot --placement-policy=tpu7x-8-2x2x1-placement-policy --enable-gvnic --node-version=0 --num-nodes=1 --scopes=storage-full,gke-default,"https://www.googleapis.com/auth/cloud-platform" --max-pods-per-node 15
[XPK] Breaking up a total of 1 commands into 1 batches
[XPK] Pretending all the jobs succeeded
[XPK] Create or delete node pool request complete.
[XPK] Creating ConfigMap for cluster
[XPK] Breaking up a total of 2 commands into 1 batches
[XPK] Pretending all the jobs succeeded
[XPK] Enabling the jobset API on our cluster, to be deprecated when Jobset is globally available
[XPK] Try 1: Install Jobset on golden-cluster
[XPK] Task: `Install Jobset on golden-cluster` is implemented by the following command not running since it is a dry run.
kubectl apply --server-side --force-conflicts -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.8.0/manifests.yaml
[XPK] Task: `Count total nodes` is implemented by the following command not running since it is a dry run.
kubectl get node --no-headers | wc -l
[XPK] Try 1: Updating jobset Controller Manager resources
[XPK] Task: `Updating jobset Controller Manager resources` is implemented by the following command not running since it is a dry run.
kubectl apply -f 1b31e624e490f9c8c4ef4e369f08d3fa467990af5a261e4405bd045265d70e95
[XPK] Try 1: Install PathwaysJob on golden-cluster
[XPK] Task: `Install PathwaysJob on golden-cluster` is implemented by the following command not running since it is a dry run.
kubectl apply --server-side -f https://github.com/google/pathways-job/releases/download/v0.1.4/install.yaml
[XPK] Enabling Kueue on the cluster
[XPK] Task: `Get kueue version on server` is implemented by the following command not running since it is a dry run.
kubectl get deployment kueue-controller-manager -n kueue-system -o jsonpath='{.spec.template.spec.containers[0].image}'
[XPK] Installing Kueue version v0.14.3...
[XPK] Try 1: Install Kueue
[XPK] Task: `Install Kueue` is implemented by the following command not running since it is a dry run.
kubectl apply --server-side --force-conflicts -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.14.3/manifests.yaml
[XPK] Task: `Wait for Kueue to be available` is implemented by the following command not running since it is a dry run.
kubectl wait deploy/kueue-controller-manager -n kueue-system --for=condition=available --timeout=10m
[XPK] Task: `Get vCPU and memory capacity for machine type` is implemented by the following command not running since it is a dry run.
gcloud compute machine-types describe tpu7x-standard-4t --project=golden-project --zone=us-central1-a --format='value(guestCpus,memoryMb)'
[XPK] The CPU limit is above the available capacity. We will set CPU limit to 10.
[XPK] The memory limit is above the available capacity. We will set memory limit to 10Mi.
[XPK] Applying following Kueue resources:
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "1xtpu7x-8"
spec:
  nodeLabels: {"cloud.google.com/gke-tpu-accelerator": "tpu7x", "cloud.google.com/gke-tpu-topology": "2x2x1"}

---

apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: dws-prov
spec:
  controllerName: kueue.x-k8s.io/provisioning-request
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: ProvisioningRequestConfig
    name: dws-config
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ProvisioningRequestConfig
metadata:
  name: dws-config
spec:
  provisioningClassName: queued-provisioning.gke.io
  podSetUpdates:
    nodeSelector:
      - key: autoscaling.gke.io/provisioning-request
        valueFromProvisioningClassDetail: ResizeRequestName
  managedResources:
    - google.com/tpu
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  preemption:
    reclaimWithinCohort: Never # Don't preempt other queues in the cohort.
    withinClusterQueue: LowerPriority
  namespaceSelector: {} # match all.
  resourceGroups: [{'coveredResources': ['google.com/tpu', 'cpu', 'memory'], 'flavors': [{'name': '1xtpu7x-8', 'resources': [{'name': 'google.com/tpu', 'nominalQuota': 4}, {'name': 'cpu', 'nominalQuota': 10}, {'name': 'memory', 'nominalQuota': '10Mi'}]}]}]

---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: default
  name: multislice-queue
spec:
  clusterQueue: cluster-queue
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: very-low
value: 100
globalDefault: false
description: "Very Low"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low
value: 250
globalDefault: false
description: "Low"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: medium
value: 500
globalDefault: false
description: "Medium"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high
value: 750
globalDefault: false
description: "High"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: very-high
value: 1000
globalDefault: false
description: "Very High"
[XPK] Task: `Applying Kueue Custom Resources` is implemented by the following command not running since it is a dry run.
kubectl apply -f 1ea1a0b1a0ec540d8320ef2a8378363e692a8439192a8f50c4b77fe545dd0a4c
[XPK] Task: `Count total nodes` is implemented by the following command not running since it is a dry run.
kubectl get node --no-headers | wc -l
[XPK] Try 1: Updating Kueue Controller Manager resources
[XPK] Task: `Updating Kueue Controller Manager resources` is implemented by the following command not running since it is a dry run.
kubectl patch deployment kueue-controller-manager -n kueue-system --type='strategic' --patch='{"spec": {"template": {"spec": {"containers": [{"name": "manager", "resources": {"limits": {"memory": "4096Mi"}}}]}}}}'
[XPK] Verifying kjob installation
[XPK] Task: `Verify kjob installation` is implemented by the following command not running since it is a dry run.
kubectl-kjob help
[XPK] kjob found
[XPK] Applying kjob CRDs
[XPK] Task: `Create kjob CRDs on cluster` is implemented by the following command not running since it is a dry run.
kubectl kjob printcrds | kubectl apply --server-side -f -
[XPK] Creating kjob CRDs succeeded
[XPK] Task: `GKE Cluster Get ConfigMap` is implemented by the following command not running since it is a dry run.
kubectl get configmap golden-cluster-resources-configmap -o=custom-columns="ConfigData:data" --no-headers=true
[XPK] Task: `Creating JobTemplate` is implemented by the following command not running since it is a dry run.
kubectl apply -f 4abb796ed6e7c9d7256a51f13124efd989fc12ee83839bed432fcf7d64f68e61
[XPK] Task: `Creating PodTemplate` is implemented by the following command not running since it is a dry run.
kubectl apply -f a63aa3c4593c38ad90671fd8b067d1886f6313ad558379b364b51791aa50f4e8
[XPK] Task: `Creating AppProfile` is implemented by the following command not running since it is a dry run.
kubectl apply -f 1d13ddebae3c90a05ba26b312df088982dd0df0edc4f4013b88384e476c20486
[XPK] GKE commands done! Resources are created.
[XPK] See your GKE Cluster here: https://console.cloud.google.com/kubernetes/clusters/details/us-central1/golden-cluster/details?project=golden-project
[XPK] Exiting XPK cleanly