| 1 | +# GKE + CAST AI GitOps example — umbrella Helm chart |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +This example demonstrates a **GitOps onboarding flow** using the CAST AI umbrella Helm chart (`castai-helm/castai`). |
| 6 | +The umbrella chart replaces individual per-component charts and lets you switch between operating modes with a single `helm upgrade` command. |
| 7 | + |
| 8 | +### When is Terraform needed? |
| 9 | + |
| 10 | +| Mode | Terraform required? | What Terraform does | |
| 11 | +|---|---|---| |
| 12 | +| **Read-only** | No | — | |
| 13 | +| **Workload Autoscaler** | No | — | |
| 14 | +| **Node Autoscaler / Full** | **Yes** | Creates GCP service account with IAM permissions needed for node provisioning | |
| 15 | + |
| 16 | +> For read-only and workload autoscaler modes, you only need a CAST AI API key and Helm. Start there and add Terraform later only if you want node autoscaling. |
| 17 | + |
| 18 | +--- |
| 19 | + |
| 20 | +## Umbrella chart modes |
| 21 | + |
| 22 | +The umbrella chart uses **tags** to control which sub-charts are installed. |
| 23 | + |
| 24 | +| Tag | Installed components | Use-case | |
| 25 | +|---|---|---| |
| 26 | +| `tags.readonly=true` | agent, spot-handler, kvisor, gpu-metrics-exporter | Observe the cluster — no changes made to workloads or nodes | |
| 27 | +| `tags.workload-autoscaler=true` | above + cluster-controller, evictor, pod-mutator, workload-autoscaler, workload-autoscaler-exporter | Right-size workload CPU/memory requests automatically | |
| 28 | +| `tags.full=true` | all components incl. pod-pinner, live | Full node autoscaler + workload autoscaler | |
| 29 | + |
| 30 | +> Only one tag should be `true` at a time. When switching modes, pass `--reset-then-reuse-values` and flip the tags (see examples below). |
| 31 | + |
| 32 | +--- |
| 33 | + |
| 34 | +## Prerequisites |
| 35 | + |
| 36 | +- CAST AI account |
| 37 | +- CAST AI **organization member API key** from [console.cast.ai → Service Accounts](https://console.cast.ai/organization/management/access-control/service-accounts) |
| 38 | +- `castai-helm` Helm repo: |
| 39 | + ```sh |
| 40 | + helm repo add castai-helm https://castai.github.io/helm-charts |
| 41 | + helm repo update |
| 42 | + ``` |
| 43 | + |
| 44 | +--- |
| 45 | + |
| 46 | +## Step 1 — Install in read-only mode (Helm only) |
| 47 | + |
| 48 | +No Terraform needed. The API key here is the CAST AI **member** key (not a full-access key). |
| 49 | + |
| 50 | +```sh |
| 51 | +helm upgrade -i castai castai-helm/castai -n castai-agent --create-namespace \ |
| 52 | + --set global.castai.apiKey="<your-castai-api-key>" \ |
| 53 | + --set global.castai.provider="gke" \ |
| 54 | + --set tags.readonly=true |
| 55 | +``` |
| 56 | + |
| 57 | +After the pods become ready, your cluster appears as **Read only** in the CAST AI console. |
| 58 | +CAST AI can now observe the cluster — no changes are made to your workloads or nodes. |
| 59 | + |
| 60 | +--- |
| 61 | + |
| 62 | +## Step 2 (optional) — Upgrade to Workload Autoscaler (Helm only) |
| 63 | + |
| 64 | +When you are ready to let CAST AI right-size CPU/memory requests for your workloads, upgrade the release. |
| 65 | +**No Terraform changes required.** |
| 66 | + |
| 67 | +`--reset-then-reuse-values` (Helm 3.14 or later) resets values to the chart's built-in defaults, reapplies the values from the previous release, and then merges in the overrides you specify. |
| 68 | + |
| 69 | +```sh |
| 70 | +helm upgrade castai castai-helm/castai -n castai-agent \ |
| 71 | + --reset-then-reuse-values \ |
| 72 | + --set tags.readonly=false \ |
| 73 | + --set tags.workload-autoscaler=true |
| 74 | +``` |
| 75 | + |
| 76 | +--- |
| 77 | + |
| 78 | +## Step 3 (optional) — Upgrade to Full mode / Node Autoscaler (Terraform + Helm) |
| 79 | + |
| 80 | +Full mode enables node provisioning, bin-packing, spot instance handling, eviction, and pod pinning. |
| 81 | +This requires a **GCP service account** with the correct IAM permissions — Terraform creates it. |
| 82 | + |
| 83 | +### 3a. Run Terraform |
| 84 | + |
| 85 | +Fill in your values: |
| 86 | + |
| 87 | +```sh |
| 88 | +cp tf.vars.example terraform.tfvars |
| 89 | +# edit terraform.tfvars |
| 90 | +``` |
| 91 | + |
| 92 | +Apply: |
| 93 | + |
| 94 | +```sh |
| 95 | +terraform init |
| 96 | +terraform apply |
| 97 | +``` |
| 98 | + |
| 99 | +This registers the cluster with CAST AI and creates the GCP service account. |
| 100 | + |
| 101 | +Capture the outputs — you'll need them to configure the Helm release: |
| 102 | + |
| 103 | +```sh |
| 104 | +terraform output cluster_id |
| 105 | +terraform output -raw cluster_token |
| 106 | +``` |
| 107 | + |
| 108 | +> `cluster_token` expires after a few hours if no CAST AI component connects. Run the Helm upgrade promptly after this step. |
| 109 | + |
| 110 | +### 3b. Upgrade the Helm release |
| 111 | + |
| 112 | +If you were already running read-only or workload-autoscaler mode, upgrade using `--reset-then-reuse-values`. |
| 113 | +If this is a fresh install, use `helm upgrade -i` instead and also pass the `cluster_token` and `cluster_id` values from step 3a. The command below shows the upgrade path. |
| 114 | + |
| 115 | +```sh |
| 116 | +helm upgrade castai castai-helm/castai -n castai-agent \ |
| 117 | + --reset-then-reuse-values \ |
| 118 | + --set tags.readonly=false \ |
| 119 | + --set tags.workload-autoscaler=false \ |
| 120 | + --set tags.full=true |
| 121 | +``` |
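
The mode switches in Steps 2 and 3b follow one pattern: set exactly one tag to `true` and the other two to `false`. A minimal sketch of that rule as a shell helper follows. The function name `castai_tag_flags` is hypothetical and not part of the chart or of Helm; the tag names come from the table in "Umbrella chart modes". The final line only prints the resulting command (a dry run), so nothing is changed in the cluster.

```sh
#!/bin/sh
# Hypothetical helper (not shipped with the chart): print the --set flags
# that enable exactly one mode tag and disable the other two.
castai_tag_flags() {
  mode="$1"
  case "$mode" in
    readonly|workload-autoscaler|full) ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  flags=""
  for tag in readonly workload-autoscaler full; do
    if [ "$tag" = "$mode" ]; then value=true; else value=false; fi
    flags="$flags --set tags.$tag=$value"
  done
  # Strip the leading space before printing.
  echo "${flags# }"
}

# Dry run: print the upgrade command instead of executing it.
echo helm upgrade castai castai-helm/castai -n castai-agent \
  --reset-then-reuse-values $(castai_tag_flags full)
```

Dropping the leading `echo` on the last command would run the upgrade for real. Generating the flags from one mode name is one way to avoid leaving two tags set to `true` after a manual edit.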