# Inference Gateway: Migrating from v1alpha2 to v1 API

## Introduction

This guide walks you through migrating your Inference Gateway setup from the alpha `v1alpha2` API to the generally available `v1` API.
It is intended for platform administrators and networking specialists
who are currently using the `v1alpha2` version of the Inference Gateway and
want to upgrade to `v1` to take advantage of the latest features and improvements.

Before you start the migration, make sure you are familiar with the concepts and deployment of the Inference Gateway.

***

## Before you begin

Before starting the migration, determine whether this guide applies to your setup.

### Checking for Existing v1alpha2 APIs

To check whether you are actively using the `v1alpha2` Inference Gateway APIs, run the following command:

```bash
kubectl get inferencepools.inference.networking.x-k8s.io --all-namespaces
```

* If this command returns one or more `InferencePool` resources, you are using the `v1alpha2` API and should proceed with this migration guide.
* If the command returns `No resources found`, you are not using the `v1alpha2` `InferencePool` and do not need to follow this migration guide. You can proceed with a fresh installation of the `v1` Inference Gateway.
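If you want to script this check, for example in a migration runbook, the decision can be captured in a small helper. This is only a sketch, not official tooling; note that for an empty list `kubectl` prints `No resources found` to stderr and nothing to stdout, so both cases map to "no":

```shell
# Decide whether migration is needed from the stdout of:
#   kubectl get inferencepools.inference.networking.x-k8s.io --all-namespaces
# Empty stdout (or a "No resources found" message) means no v1alpha2 pools exist.
needs_migration() {
  case "$1" in
    "" | *"No resources found"*) echo "no" ;;
    *)                           echo "yes" ;;
  esac
}

# Example with captured output (the pool name is hypothetical):
needs_migration "NAMESPACE   NAME
default     vllm-llama3-8b-instruct-preview"   # prints "yes"
```

In a runbook you would call it as `needs_migration "$(kubectl get inferencepools.inference.networking.x-k8s.io --all-namespaces 2>/dev/null)"`.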

***

## Migration Paths

There are two paths for migrating from `v1alpha2` to `v1`:

1. **Simple Migration (with downtime):** This path is for users who can afford a short period of downtime. It involves deleting the old `v1alpha2` resources and CRDs before installing the new `v1` versions.
2. **Zero-Downtime Migration:** This path is for users who need to migrate without any service interruption. It involves running both `v1alpha2` and `v1` stacks side by side and gradually shifting traffic.

***

## Simple Migration (with downtime)

This approach is faster and simpler, but it will result in a brief period of downtime while the resources are being replaced. It is the recommended path if you do not require a zero-downtime migration.

### 1. Delete Existing v1alpha2 Resources

**Option a: Uninstall using Helm.**

```bash
helm uninstall <helm_preview_inferencepool_name>
```

**Option b: Manually delete preview `InferencePool` resources.**

If you are not using Helm, you will need to manually delete all resources associated with your `v1alpha2` deployment. The key is to remove the `HTTPRoute`'s reference to the old `InferencePool` and then delete the `v1alpha2` resources themselves.

1. **Update or Delete the `HTTPRoute`**: Modify the `HTTPRoute` to remove the `backendRef` that points to the `v1alpha2` `InferencePool`.
2. **Delete the `InferencePool` and associated resources**: You must delete the `v1alpha2` `InferencePool`, any `InferenceModel` resources that point to it, and the corresponding Endpoint Picker (EPP) Deployment and Service.
3. **Delete the `v1alpha2` CRDs**: Once all `v1alpha2` custom resources are deleted, you can remove the CRD definitions from your cluster.
    ```bash
    kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v0.3.0/manifests.yaml
    ```
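The manual sequence above can be scripted. The sketch below is a dry run that echoes each command instead of executing it; every name in it (the `llm-route` route, the `food-review` and `base-model` models, the `vllm-llama3-8b-instruct-preview` pool and its `-epp` workload) is a hypothetical placeholder for the names in your own deployment.

```shell
# Dry run: RUN=echo prints each command; set RUN="" to execute for real.
# All resource names below are hypothetical placeholders.
RUN=echo
POOL=vllm-llama3-8b-instruct-preview

# 1. Remove the backendRef to the old pool from the HTTPRoute (manual edit).
$RUN kubectl edit httproute llm-route
# 2. Delete the alpha custom resources, then the EPP workload.
$RUN kubectl delete inferencemodels.inference.networking.x-k8s.io food-review base-model
$RUN kubectl delete inferencepools.inference.networking.x-k8s.io "$POOL"
$RUN kubectl delete deployment "${POOL}-epp"
$RUN kubectl delete service "${POOL}-epp"
```

With `RUN=echo` the script only prints the commands so you can review them first; clearing `RUN` executes them in order.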

### 2. Install v1 Resources

After cleaning up the old resources, you can proceed with a fresh installation of the `v1` Inference Gateway. This involves installing the new `v1` CRDs, creating a new `v1` `InferencePool` and corresponding `InferenceObjective` resources, and creating a new `HTTPRoute` that directs traffic to your new `v1` `InferencePool`.
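For illustration, a minimal `v1` `InferencePool` manifest might look like the sketch below. The field names (`selector.matchLabels`, `targetPorts`, `endpointPickerRef`) are assumptions based on the `v1` schema; verify them with `kubectl explain inferencepools.inference.networking.k8s.io`, and treat every name and port as a placeholder.

```yaml
apiVersion: inference.networking.k8s.io/v1
kind: InferencePool
metadata:
  name: vllm-llama3-8b-instruct-ga       # placeholder pool name
spec:
  selector:
    matchLabels:
      app: vllm-llama3-8b-instruct       # label on your model server Pods
  targetPorts:
  - number: 8000                         # port your model server listens on
  endpointPickerRef:
    name: vllm-llama3-8b-instruct-ga-epp # EPP Service for this pool
```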

### 3. Verify the Deployment

After a few minutes, verify that your new `v1` stack is correctly serving traffic. You should have a **`PROGRAMMED`** gateway.

```bash
❯ kubectl get gateway -o wide
NAME                CLASS               ADDRESS        PROGRAMMED   AGE
inference-gateway   inference-gateway   <IP_ADDRESS>   True         10m
```

Curl the endpoint to make sure you are getting a successful response with a **200** response code.

```bash
IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=80

curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
"model": "<your_model>",
"prompt": "<your_prompt>",
"max_tokens": 100,
"temperature": 0
}'
```

***

## Zero-Downtime Migration

This migration path is designed for users who cannot afford any service interruption. It assumes you already have the stack shown in the following diagram:

<img src="/images/alpha-stage.png" alt="Inference Gateway Alpha Stage" class="center" />

### A Note on Interacting with Multiple API Versions

During the zero-downtime migration, both `v1alpha2` and `v1` CRDs will be installed on your cluster. This can create ambiguity when using `kubectl` to query `InferencePool` resources. To ensure you are interacting with the correct version, you **must** use the full resource name:

* **For v1alpha2**: `kubectl get inferencepools.inference.networking.x-k8s.io`
* **For v1**: `kubectl get inferencepools.inference.networking.k8s.io`

The `v1` API also provides a convenient short name, `infpool`, which can be used to query `v1` resources specifically:

```bash
kubectl get infpool
```

This guide uses the full names, or the `v1` short name, to avoid ambiguity.

***

### Stage 1: Side-by-side v1 Deployment

In this stage, you will deploy the new `v1` `InferencePool` stack alongside the existing `v1alpha2` stack. This allows for a safe, gradual migration.

After finishing all the steps in this stage, you'll have the infrastructure shown in the following diagram:

<img src="/images/migration-stage.png" alt="Inference Gateway Migration Stage" class="center" />

**1. Install v1 CRDs**

```bash
RELEASE=v1.0.0
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/$RELEASE/config/crd/bases/inference.networking.x-k8s.io_inferenceobjectives.yaml
```

**2. Install the v1 `InferencePool`**

Use Helm to install a new `v1` `InferencePool` with a distinct release name (e.g., `vllm-llama3-8b-instruct-ga`).

```bash
helm install vllm-llama3-8b-instruct-ga \
  --set inferencePool.modelServers.matchLabels.app=<the_label_you_used_for_the_model_server_deployment> \
  --set provider.name=<YOUR_PROVIDER> \
  --version $RELEASE \
  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
```

**3. Create the v1 `InferenceObjective`**

The `v1` release replaces `InferenceModel` with `InferenceObjective`. Create the new resources, referencing the new `v1` `InferencePool`. Note that `InferenceObjective` itself is still served from the `inference.networking.x-k8s.io/v1alpha2` API group, while its `poolRef` targets the `v1` pool group.

```yaml
kubectl apply -f - <<EOF
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: food-review
spec:
  priority: 1
  poolRef:
    group: inference.networking.k8s.io
    name: vllm-llama3-8b-instruct-ga
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: base-model
spec:
  priority: 2
  poolRef:
    group: inference.networking.k8s.io
    name: vllm-llama3-8b-instruct-ga
---
EOF
```

***

### Stage 2: Traffic Shifting

With both stacks running, you can start shifting traffic from `v1alpha2` to `v1` by updating the `HTTPRoute` to split traffic. This example shows a 50/50 split.

**1. Update `HTTPRoute` for Traffic Splitting**

```yaml
kubectl apply -f - <<EOF
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct-preview
      weight: 50
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct-ga
      weight: 50
---
EOF
```
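You do not have to jump straight to an even split. Because only the `weight` values change between steps, you can shift gradually (for example 90/10, then 50/50, then 0/100) and watch metrics between steps. A first canary step would use the same rule as above with different weights:

```yaml
# Same rule as in the HTTPRoute above; only the weights differ.
- backendRefs:
  - group: inference.networking.x-k8s.io
    kind: InferencePool
    name: vllm-llama3-8b-instruct-preview
    weight: 90
  - group: inference.networking.k8s.io
    kind: InferencePool
    name: vllm-llama3-8b-instruct-ga
    weight: 10
```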

**2. Verify and Monitor**

After applying the changes, monitor the performance and stability of the new `v1` stack. Make sure the `inference-gateway` Gateway still reports `PROGRAMMED` as `True`.

***

### Stage 3: Finalization and Cleanup

Once you have verified that the `v1` `InferencePool` is stable, you can direct all traffic to it and decommission the old `v1alpha2` resources.

**1. Shift 100% of Traffic to the v1 `InferencePool`**

Update the `HTTPRoute` to send all traffic to the `v1` pool.

```yaml
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct-ga
      weight: 100
EOF
```

**2. Final Verification**

Send test requests to ensure your `v1` stack is handling all traffic as expected.

<img src="/images/ga-stage.png" alt="Inference Gateway GA Stage" class="center" />

You should have a **`PROGRAMMED`** gateway:

```bash
❯ kubectl get gateway -o wide
NAME                CLASS               ADDRESS        PROGRAMMED   AGE
inference-gateway   inference-gateway   <IP_ADDRESS>   True         10m
```

Curl the endpoint and verify a **200** response code:

```bash
IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=80

curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
"model": "<your_model>",
"prompt": "<your_prompt>",
"max_tokens": 100,
"temperature": 0
}'
```

**3. Clean Up v1alpha2 Resources**

After confirming the `v1` stack is fully operational, remove the old `v1alpha2` resources: the preview `InferencePool`, its `InferenceModel` resources, and the old EPP Deployment and Service, followed by the `v1alpha2` CRDs once no custom resources remain, as described in the Simple Migration path above.