From 52c92e2d64b2df7d7c00efbbb3104faa0ab23c39 Mon Sep 17 00:00:00 2001
From: Murphy Chen
Date: Thu, 4 Sep 2025 18:19:25 +0800
Subject: [PATCH 1/2] update guides docs

---
 site-src/guides/index.md | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/site-src/guides/index.md b/site-src/guides/index.md
index 2c3abb49d..706604332 100644
--- a/site-src/guides/index.md
+++ b/site-src/guides/index.md
@@ -278,7 +278,7 @@ A cluster with:
     helm install vllm-llama3-8b-instruct \
     --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
     --set provider.name=$GATEWAY_PROVIDER \
-    --version v0.3.0 \
+    --version v0.5.1 \
     oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
     ```

@@ -297,12 +297,17 @@ A cluster with:
 
     Wait until the gateway is ready.
 
+    Depending on the type of model server you have deployed, you must update the model field in the request body accordingly:
+    - vLLM Simulator Model Server: `food-review-1`
+    - CPU-Based Model Server: `food-review-0` or `food-review-1`
+    - GPU-Based Model Server: TODO
+
     ```bash
     IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
     PORT=80
 
     curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
-    "model": "food-review",
+    "model": "food-review-1",
     "prompt": "Write as if you were a critic: San Francisco",
     "max_tokens": 100,
     "temperature": 0

From 24be9d50462087f52aaa2df19ed19294c11804dd Mon Sep 17 00:00:00 2001
From: Murphy Chen
Date: Thu, 4 Sep 2025 18:27:49 +0800
Subject: [PATCH 2/2] apply reviewer's suggestion

---
 site-src/guides/index.md | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/site-src/guides/index.md b/site-src/guides/index.md
index 706604332..4e40ebaf5 100644
--- a/site-src/guides/index.md
+++ b/site-src/guides/index.md
@@ -297,11 +297,6 @@ A cluster with:
 
     Wait until the gateway is ready.
 
-    Depending on the type of model server you have deployed, you must update the model field in the request body accordingly:
-    - vLLM Simulator Model Server: `food-review-1`
-    - CPU-Based Model Server: `food-review-0` or `food-review-1`
-    - GPU-Based Model Server: TODO
-
     ```bash
     IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
     PORT=80
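The model-name mapping that patch 1 introduces (and patch 2 then removes from this spot) can be sketched as a small shell helper for picking the `"model"` value to send in the `/v1/completions` request body. This is an illustrative sketch only: the `model_for_server` function name and the `sim`/`cpu`/`gpu` keys are assumptions, not anything defined by the patches or the chart.

```shell
# Sketch: map the deployed model-server type to the "model" value used in
# the completions request body, per the bullet list from patch 1.
# Function name and the sim/cpu/gpu keys are illustrative assumptions.
model_for_server() {
  case "$1" in
    sim) echo "food-review-1" ;;  # vLLM Simulator Model Server
    cpu) echo "food-review-0" ;;  # CPU-based server (food-review-1 also valid)
    gpu) echo "TODO" ;;           # left as TODO in the patch
    *)   return 1 ;;
  esac
}

model_for_server sim  # prints food-review-1
```

A helper like this would let the guide's curl example stay generic (`"model": "$(model_for_server sim)"`) instead of hard-coding `food-review-1`.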