Commit 9556ab7

added migration guide

1 parent bd4bf22 commit 9556ab7

4 files changed: +270 −0 lines changed

site-src/guides/ga-migration.guide

Lines changed: 270 additions & 0 deletions
# Inference Gateway: Migrating from v1alpha2 to v1 API

## Introduction

This guide provides a comprehensive walkthrough for migrating your Inference Gateway setup from the alpha `v1alpha2` API to the generally available `v1` API. This document is intended for platform administrators and networking specialists who are currently using the `v1alpha2` version of the Inference Gateway and want to upgrade to the `v1` version to leverage the latest features and improvements.

Before you start the migration, ensure you are familiar with the concepts and deployment of the Inference Gateway.
***

## Before you begin

Before starting the migration, it's important to determine if this guide is necessary for your setup.

### Checking for Existing v1alpha2 APIs

To check if you are actively using the `v1alpha2` Inference Gateway APIs, run the following command:

```bash
kubectl get inferencepools.inference.networking.x-k8s.io --all-namespaces
```

* If this command returns one or more `InferencePool` resources, you are using the `v1alpha2` API and should proceed with this migration guide.
* If the command returns `No resources found`, you are not using the `v1alpha2` `InferencePool` and do not need to follow this migration guide. You can proceed with a fresh installation of the `v1` Inference Gateway.
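You may also want to check for `v1alpha2` `InferenceModel` resources, since those are part of the same alpha stack and will need to be migrated as well; a quick check, assuming the preview `InferenceModel` CRD is installed:

```bash
# List any v1alpha2 InferenceModel resources across all namespaces.
kubectl get inferencemodels.inference.networking.x-k8s.io --all-namespaces
```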
***

## Migration Paths

There are two paths for migrating from `v1alpha2` to `v1`:

1. **Simple Migration (with downtime):** This path is for users who can afford a short period of downtime. It involves deleting the old `v1alpha2` resources and CRDs before installing the new `v1` versions.
2. **Zero-Downtime Migration:** This path is for users who need to migrate without any service interruption. It involves running both `v1alpha2` and `v1` stacks side by side and gradually shifting traffic.

***

## Simple Migration (with downtime)

This approach is faster and simpler but will result in a brief period of downtime while the resources are being updated. It is the recommended path if you do not require a zero-downtime migration.
### 1. Delete Existing v1alpha2 Resources

**Option a: Uninstall using Helm.**

```bash
helm uninstall <helm_preview_inferencepool_name>
```
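If you don't remember the release name of the preview `InferencePool` chart, you can list installed releases first; a quick sketch (the `grep` filter is just an assumption about how the release was named):

```bash
# List Helm releases in all namespaces and look for the preview InferencePool chart.
helm list --all-namespaces | grep -i inferencepool
```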
**Option b: Manually delete preview `InferencePool` resources.**

If you are not using Helm, you will need to manually delete all resources associated with your `v1alpha2` deployment. The key is to remove the `HTTPRoute`'s reference to the old `InferencePool` and then delete the `v1alpha2` resources themselves (a command sketch for steps 1 and 2 follows the list):

1. **Update or Delete the `HTTPRoute`**: Modify the `HTTPRoute` to remove the `backendRef` that points to the `v1alpha2` `InferencePool`.
2. **Delete the `InferencePool` and associated resources**: You must delete the `v1alpha2` `InferencePool`, any `InferenceModel` resources that point to it, and the corresponding Endpoint Picker (EPP) Deployment and Service.
3. **Delete the `v1alpha2` CRDs**: Once all `v1alpha2` custom resources are deleted, you can remove the CRD definitions from your cluster.

```bash
kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v0.3.0/manifests.yaml
```
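For steps 1 and 2, the exact commands depend on the names in your deployment. A minimal sketch, assuming the preview pool is named `vllm-llama3-8b-instruct-preview` and its EPP Deployment and Service follow a `<release>-epp` naming pattern (all of these names are assumptions):

```bash
# 1. Edit the HTTPRoute and remove the backendRef pointing at the v1alpha2 pool.
kubectl edit httproute llm-route

# 2. Delete the v1alpha2 InferencePool, its InferenceModel resources,
#    and the Endpoint Picker (EPP) Deployment and Service.
kubectl delete inferencepools.inference.networking.x-k8s.io vllm-llama3-8b-instruct-preview
kubectl delete inferencemodels.inference.networking.x-k8s.io --all
kubectl delete deployment vllm-llama3-8b-instruct-preview-epp
kubectl delete service vllm-llama3-8b-instruct-preview-epp
```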
### 2. Install v1 Resources

After cleaning up the old resources, you can proceed with a fresh installation of the `v1` Inference Gateway. This involves installing the new `v1` CRDs, creating a new `v1` `InferencePool` and corresponding `InferenceObjective` resources, and creating a new `HTTPRoute` that directs traffic to your new `v1` `InferencePool`. The individual commands match those in Stage 1 of the zero-downtime path below; a condensed sketch follows.
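A minimal sketch of the fresh install, mirroring the Stage 1 commands from the zero-downtime path below (the release name, label selector, and provider value are placeholders to replace with your own):

```bash
# Install the v1 InferencePool via its Helm chart (see Stage 1 below for the CRD install).
RELEASE=v1.0.0
helm install vllm-llama3-8b-instruct-ga \
  --set inferencePool.modelServers.matchLabels.app=<your_model_server_label> \
  --set provider.name=<YOUR_PROVIDER> \
  --version $RELEASE \
  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
```

After the pool is up, create an `HTTPRoute` whose `backendRefs` points at the new pool, as in the Stage 3 example later in this guide.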
### 3. Verify the Deployment

After a few minutes, verify that your new `v1` stack is correctly serving traffic. You should have a **`PROGRAMMED`** gateway:

```bash
❯ kubectl get gateway -o wide
NAME                CLASS               ADDRESS        PROGRAMMED   AGE
inference-gateway   inference-gateway   <IP_ADDRESS>   True         10m
```

Curl the endpoint to make sure you are getting a successful response with a **200** response code:

```bash
IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=80

curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
  "model": "<your_model>",
  "prompt": "<your_prompt>",
  "max_tokens": 100,
  "temperature": 0
}'
```
***

## Zero-Downtime Migration

This migration path is designed for users who cannot afford any service interruption. It assumes you already have the stack shown in the following diagram:

<img src="/images/alpha-stage.png" alt="Inference Gateway Alpha Stage" class="center" />

### A Note on Interacting with Multiple API Versions

During the zero-downtime migration, both `v1alpha2` and `v1` CRDs will be installed on your cluster. This can create ambiguity when using `kubectl` to query for `InferencePool` resources. To ensure you are interacting with the correct version, you **must** use the full resource name:

* **For v1alpha2**: `kubectl get inferencepools.inference.networking.x-k8s.io`
* **For v1**: `kubectl get inferencepools.inference.networking.k8s.io`

The `v1` API also provides a convenient short name, `infpool`, which can be used to query `v1` resources specifically:

```bash
kubectl get infpool
```

This guide will use these full names, or the short name for `v1`, to avoid ambiguity.
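To see at a glance which `InferencePool` API groups and short names are registered on your cluster, you can inspect API discovery; a quick sketch (the `grep` filter is illustrative):

```bash
# Show every registered resource named "inferencepools", including its
# API group/version and any short names such as infpool.
kubectl api-resources | grep -i inferencepool
```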
***

### Stage 1: Side-by-side v1 Deployment

In this stage, you will deploy the new `v1` `InferencePool` stack alongside the existing `v1alpha2` stack. This allows for a safe, gradual migration.

After finishing all the steps in this stage, you'll have the infrastructure shown in the following diagram:

<img src="/images/migration-stage.png" alt="Inference Gateway Migration Stage" class="center" />

**1. Install v1 CRDs**

```bash
RELEASE=v1.0.0
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/$RELEASE/config/crd/bases/inference.networking.x-k8s.io_inferenceobjectives.yaml
```
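Before moving on, you can confirm the CRD registered successfully; a quick check (the CRD name follows from the manifest file applied above):

```bash
# The CRD should be listed once it is established.
kubectl get crd inferenceobjectives.inference.networking.x-k8s.io
```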
**2. Install the v1 `InferencePool`**

Use Helm to install a new `v1` `InferencePool` with a distinct release name (e.g., `vllm-llama3-8b-instruct-ga`).

```bash
helm install vllm-llama3-8b-instruct-ga \
  --set inferencePool.modelServers.matchLabels.app=<the_label_you_used_for_the_model_server_deployment> \
  --set provider.name=<YOUR_PROVIDER> \
  --version $RELEASE \
  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
```
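Once the chart is installed, you can confirm the new pool exists under the `v1` group using the short name:

```bash
# Only v1 pools are listed here; the v1alpha2 preview pool lives in a different group.
kubectl get infpool vllm-llama3-8b-instruct-ga
```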
**3. Create the v1 `InferenceObjective`**

The `v1` API replaces `InferenceModel` with `InferenceObjective`. Create the new resources, referencing the new `v1` `InferencePool`.

```yaml
kubectl apply -f - <<EOF
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: food-review
spec:
  priority: 1
  poolRef:
    group: inference.networking.k8s.io
    name: vllm-llama3-8b-instruct-ga
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: base-model
spec:
  priority: 2
  poolRef:
    group: inference.networking.k8s.io
    name: vllm-llama3-8b-instruct-ga
---
EOF
```
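To confirm both objectives were created, you can list them (a quick check; the group and plural follow the CRD manifest installed above):

```bash
# Expect to see food-review and base-model.
kubectl get inferenceobjectives.inference.networking.x-k8s.io
```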
***

### Stage 2: Traffic Shifting

With both stacks running, you can start shifting traffic from `v1alpha2` to `v1` by updating the `HTTPRoute` to split traffic. This example shows a 50/50 split.

**1. Update `HTTPRoute` for Traffic Splitting**

```yaml
kubectl apply -f - <<EOF
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct-preview
      weight: 50
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct-ga
      weight: 50
---
EOF
```
**2. Verify and Monitor**

After applying the changes, monitor the performance and stability of the new `v1` stack. Make sure the `inference-gateway` status `PROGRAMMED` is `True`.
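One way to script that check is a `jsonpath` query against the Gateway's `Programmed` condition; a minimal sketch, assuming your gateway is named `inference-gateway`:

```bash
# Prints "True" once the controller has programmed the gateway.
kubectl get gateway inference-gateway \
  -o jsonpath='{.status.conditions[?(@.type=="Programmed")].status}'
```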
***
### Stage 3: Finalization and Cleanup

Once you have verified that the `v1` `InferencePool` is stable, you can direct all traffic to it and decommission the old `v1alpha2` resources.

**1. Shift 100% of Traffic to the v1 `InferencePool`**

Update the `HTTPRoute` to send all traffic to the `v1` pool.
```yaml
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct-ga
      weight: 100
EOF
```
**2. Final Verification**

Send test requests to ensure your `v1` stack is handling all traffic as expected.

<img src="/images/ga-stage.png" alt="Inference Gateway GA Stage" class="center" />

You should have a **`PROGRAMMED`** gateway:

```bash
❯ kubectl get gateway -o wide
NAME                CLASS               ADDRESS        PROGRAMMED   AGE
inference-gateway   inference-gateway   <IP_ADDRESS>   True         10m
```

Curl the endpoint and verify a **200** response code:

```bash
IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=80

curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
  "model": "<your_model>",
  "prompt": "<your_prompt>",
  "max_tokens": 100,
  "temperature": 0
}'
```
**3. Clean Up v1alpha2 Resources**

After confirming the `v1` stack is fully operational, safely remove the old `v1alpha2` resources; a command sketch follows.
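A minimal sketch of the cleanup, assuming the preview stack was installed with Helm under the release name `vllm-llama3-8b-instruct-preview` (the names are assumptions; the manifest URL mirrors the simple-migration path above):

```bash
# Remove the preview InferencePool stack (pool, EPP Deployment, and Service).
helm uninstall vllm-llama3-8b-instruct-preview

# Delete any remaining v1alpha2 InferenceModel resources.
kubectl delete inferencemodels.inference.networking.x-k8s.io --all

# Finally, remove the v1alpha2 CRDs once no custom resources remain.
kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v0.3.0/manifests.yaml
```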

site-src/images/alpha-stage.png (293 KB)

site-src/images/ga-stage.png (310 KB)

site-src/images/migration-stage.png (428 KB)