1 change: 1 addition & 0 deletions mkdocs.yml
@@ -73,6 +73,7 @@ nav:
- Configuration Guide:
- Configuring the plugins via configuration files or text: guides/epp-configuration/config-text.md
- Prefix Cache Aware Plugin: guides/epp-configuration/prefix-aware.md
- Migration Guide: guides/ga-migration.md
- Troubleshooting Guide: guides/troubleshooting.md
- Implementer Guides:
- Getting started: guides/implementers.md
270 changes: 270 additions & 0 deletions site-src/guides/ga-migration.md
@@ -0,0 +1,270 @@
# Inference Gateway: Migrating from v1alpha2 to v1 API

## Introduction

This guide provides a comprehensive walkthrough for migrating your Inference Gateway setup from the alpha `v1alpha2` API to the generally available `v1` API.
It is intended for platform administrators and networking specialists
who currently run the `v1alpha2` Inference Gateway and
want to upgrade to `v1` to take advantage of the latest features and improvements.

Before you start the migration, ensure you are familiar with the concepts and deployment of the Inference Gateway.

***

## Before you begin

Before starting the migration, it's important to determine if this guide is necessary for your setup.

### Checking for Existing v1alpha2 APIs

To check if you are actively using the `v1alpha2` Inference Gateway APIs, run the following command:

```bash
kubectl get inferencepools.inference.networking.x-k8s.io --all-namespaces
```

* If this command returns one or more `InferencePool` resources (see the example output after this list), you are using the `v1alpha2` API and should proceed with this migration guide.
* If the command returns `No resources found`, you are not using the `v1alpha2` `InferencePool` and do not need to follow this migration guide. You can proceed with a fresh installation of the `v1` Inference Gateway.
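
If `v1alpha2` pools exist, the output will look roughly like the following (the namespace and pool name here are illustrative):

```bash
NAMESPACE   NAME                            AGE
default     vllm-llama3-8b-instruct-alpha   5d
```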

***

## Migration Paths

There are two paths for migrating from `v1alpha2` to `v1`:

1. **Simple Migration (with downtime):** This path is for users who can afford a short period of downtime. It involves deleting the old `v1alpha2` resources and CRDs before installing the new `v1` versions.
2. **Zero-Downtime Migration:** This path is for users who need to migrate without any service interruption. It involves running both `v1alpha2` and `v1` stacks side-by-side and gradually shifting traffic.

***

## Simple Migration (with downtime)

This approach is faster and simpler but will result in a brief period of downtime while the resources are being updated. It is the recommended path if you do not require a zero-downtime migration.

### 1. Delete Existing v1alpha2 Resources

**Option a: Uninstall using Helm.**

```bash
helm uninstall <helm_alpha_inferencepool_name>
```

**Option b: Manually delete alpha `InferencePool` resources.**

If you are not using Helm, you will need to manually delete all resources associated with your `v1alpha2` deployment. The key is to remove the `HTTPRoute`'s reference to the old `InferencePool` and then delete the `v1alpha2` resources themselves.

1. **Update or Delete the `HTTPRoute`**: Modify the `HTTPRoute` to remove the `backendRef` that points to the `v1alpha2` `InferencePool`.
2. **Delete the `InferencePool` and associated resources**: Delete the `v1alpha2` `InferencePool`, any `InferenceModel` resources that point to it, and the corresponding Endpoint Picker (EPP) Deployment and Service (see the sketch after this list).
3. **Delete the `v1alpha2` CRDs**: Once all `v1alpha2` custom resources are deleted, remove the CRD definitions from your cluster:

    ```bash
    kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v0.3.0/manifests.yaml
    ```
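
For steps 1 and 2, the cleanup might look like the following sketch; every name here is a hypothetical placeholder, so substitute the resource names from your own deployment.

```bash
# Remove the v1alpha2 backendRef from the route (hypothetical route name).
kubectl edit httproute llm-route

# Delete the alpha resources (hypothetical names; use your own).
kubectl delete inferencemodels.inference.networking.x-k8s.io food-review
kubectl delete inferencepools.inference.networking.x-k8s.io vllm-llama3-8b-instruct-alpha
kubectl delete deployment vllm-llama3-8b-instruct-alpha-epp
kubectl delete service vllm-llama3-8b-instruct-alpha-epp
```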

### 2. Install v1 Resources

After cleaning up the old resources, you can proceed with a fresh installation of the `v1` Inference Gateway. This involves installing the new `v1` CRDs, creating a new `v1` `InferencePool` and corresponding `InferenceObjective` resources, and creating a new `HTTPRoute` that directs traffic to your new `v1` `InferencePool`.
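
As a sketch, the installation can mirror the Helm flow used in the zero-downtime path below; the release name, model-server label, and provider here are placeholders:

```bash
RELEASE=v1.0.0
helm install vllm-llama3-8b-instruct \
  --set inferencePool.modelServers.matchLabels.app=<model_server_label> \
  --set provider.name=<YOUR_PROVIDER> \
  --version $RELEASE \
  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
```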


### 3. Verify the Deployment

After a few minutes, verify that your new `v1` stack is serving traffic correctly. The gateway's **`PROGRAMMED`** condition should be `True`.

```bash
❯ kubectl get gateway -o wide
NAME                CLASS               ADDRESS        PROGRAMMED   AGE
inference-gateway   inference-gateway   <IP_ADDRESS>   True         10m
```

Curl the endpoint and confirm you receive a successful response with a **200** response code.

```bash
IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=80

curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
  "model": "<your_model>",
  "prompt": "<your_prompt>",
  "max_tokens": 100,
  "temperature": 0
}'
```

***

## Zero-Downtime Migration

This migration path is designed for users who cannot afford any service interruption. It assumes you already have the `v1alpha2` stack shown in the following diagram:

<img src="/images/alpha-stage.png" alt="Inference Gateway Alpha Stage" class="center" />

### A Note on Interacting with Multiple API Versions

During the zero-downtime migration, both `v1alpha2` and `v1` CRDs will be installed on your cluster. This can create ambiguity when using `kubectl` to query for `InferencePool` resources. To ensure you are interacting with the correct version, you **must** use the full resource name:

* **For v1alpha2**: `kubectl get inferencepools.inference.networking.x-k8s.io`
* **For v1**: `kubectl get inferencepools.inference.networking.k8s.io`

The `v1` API also provides a convenient short name, `infpool`, which can be used to query `v1` resources specifically:

```bash
kubectl get infpool
```

This guide will use these full names or the short name for `v1` to avoid ambiguity.

***

### Stage 1: Side-by-side v1 Deployment

In this stage, you will deploy the new `v1` `InferencePool` stack alongside the existing `v1alpha2` stack. This allows for a safe, gradual migration.

After finishing all the steps in this stage, you will have the infrastructure shown in the following diagram:

<img src="/images/migration-stage.png" alt="Inference Gateway Migration Stage" class="center" />

**1. Install v1 CRDs**

```bash
RELEASE=v1.0.0
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/$RELEASE/config/crd/bases/inference.networking.x-k8s.io_inferenceobjectives.yaml
```

**2. Install the v1 `InferencePool`**

Use Helm to install a new `v1` `InferencePool` with a distinct release name (e.g., `vllm-llama3-8b-instruct-ga`).

```bash
helm install vllm-llama3-8b-instruct-ga \
  --set inferencePool.modelServers.matchLabels.app=<the_label_you_used_for_the_model_server_deployment> \
  --set provider.name=<YOUR_PROVIDER> \
  --version $RELEASE \
  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
```
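
To confirm that the new pool was created, query it with the `v1` short name (using the example release name from above):

```bash
kubectl get infpool vllm-llama3-8b-instruct-ga
```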

**3. Create the v1 `InferenceObjective`**

The `v1` API replaces `InferenceModel` with `InferenceObjective`. Create the new resources, referencing the new `v1` `InferencePool`.

```bash
kubectl apply -f - <<EOF
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: food-review
spec:
  priority: 1
  poolRef:
    group: inference.networking.k8s.io
    name: vllm-llama3-8b-instruct-ga
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: base-model
spec:
  priority: 2
  poolRef:
    group: inference.networking.k8s.io
    name: vllm-llama3-8b-instruct-ga
---
EOF
```
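
As a quick sanity check, list the new objectives; exact output columns may vary by release:

```bash
kubectl get inferenceobjectives.inference.networking.x-k8s.io
```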

***

### Stage 2: Traffic Shifting

With both stacks running, you can start shifting traffic from `v1alpha2` to `v1` by updating the `HTTPRoute` to split traffic. This example shows a 50/50 split.

**1. Update `HTTPRoute` for Traffic Splitting**

```bash
kubectl apply -f - <<EOF
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct-alpha
      weight: 50
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct-ga
      weight: 50
---
EOF
```

**2. Verify and Monitor**

After applying the changes, monitor the performance and stability of the new `v1` stack. Make sure the `inference-gateway` Gateway still reports `PROGRAMMED` as `True`.
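
One rough way to exercise the split is to send a batch of requests and confirm they all succeed. This sketch reuses the endpoint lookup from the verification steps; the model and prompt are placeholders:

```bash
IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=80

# Every request should return 200, whichever pool serves it.
for i in $(seq 1 20); do
  curl -s -o /dev/null -w "%{http_code}\n" ${IP}:${PORT}/v1/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "<your_model>", "prompt": "<your_prompt>", "max_tokens": 10, "temperature": 0}'
done
```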

***

### Stage 3: Finalization and Cleanup

Once you have verified that the `v1` `InferencePool` is stable, you can direct all traffic to it and decommission the old `v1alpha2` resources.

**1. Shift 100% of Traffic to the v1 `InferencePool`**

Update the `HTTPRoute` to send all traffic to the `v1` pool.

```bash
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct-ga
      weight: 100
EOF
```

**2. Final Verification**

Send test requests to ensure your `v1` stack is handling all traffic as expected.

<img src="/images/ga-stage.png" alt="Inference Gateway GA Stage" class="center" />

The gateway's **`PROGRAMMED`** condition should be `True`:
```bash
❯ kubectl get gateway -o wide
NAME                CLASS               ADDRESS        PROGRAMMED   AGE
inference-gateway   inference-gateway   <IP_ADDRESS>   True         10m
```

Curl the endpoint and verify a **200** response code:
```bash
IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=80

curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
  "model": "<your_model>",
  "prompt": "<your_prompt>",
  "max_tokens": 100,
  "temperature": 0
}'
```

**3. Clean Up v1alpha2 Resources**

After confirming the `v1` stack is fully operational, safely remove the old `v1alpha2` resources.
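
A minimal cleanup sketch, assuming the alpha stack was installed via Helm under a hypothetical release name and nothing else still depends on the `v1alpha2` CRDs:

```bash
# Hypothetical release name; substitute the one from your alpha installation.
helm uninstall vllm-llama3-8b-instruct-alpha

# Delete any remaining v1alpha2 custom resources, then the CRDs themselves.
kubectl delete inferencemodels.inference.networking.x-k8s.io --all
kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v0.3.0/manifests.yaml
```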
Binary file added site-src/images/alpha-stage.png
Binary file added site-src/images/ga-stage.png
Binary file added site-src/images/migration-stage.png