Merged
Changes from 1 commit
9 changes: 7 additions & 2 deletions site-src/guides/index.md
@@ -278,7 +278,7 @@ A cluster with:
helm install vllm-llama3-8b-instruct \
--set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
--set provider.name=$GATEWAY_PROVIDER \
-  --version v0.3.0 \
+  --version v0.5.1 \
oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
```
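Before moving on, it can help to confirm the chart actually deployed. A minimal sketch, assuming the release name and the `app` label from the install command above:

```shell
# Sketch: verify the Helm release deployed and the labeled
# model-server pods exist (requires access to the cluster).
helm status vllm-llama3-8b-instruct
kubectl get pods -l app=vllm-llama3-8b-instruct
```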

@@ -297,12 +297,17 @@ A cluster with:

Wait until the gateway is ready.
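Readiness can be checked from the command line. A sketch, assuming the Gateway object is named `inference-gateway` (as in the curl example later in this guide):

```shell
# Sketch: block until the Gateway reports a Programmed condition,
# then print the address it was assigned (requires a live cluster).
kubectl wait gateway/inference-gateway --for=condition=Programmed --timeout=300s
kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}'
```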

Depending on the type of model server you have deployed, you must update the `model` field in the request body accordingly:
- vLLM Simulator Model Server: `food-review-1`
- CPU-Based Model Server: `food-review-0` or `food-review-1`
- GPU-Based Model Server: TODO
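For scripting the request, the mapping above can be captured in a small helper. This is hypothetical glue code, not part of the guide; `SERVER_TYPE` and `MODEL_NAME` are names invented here:

```shell
# Hypothetical helper: pick the model name for the request body based on
# which model server flavor was deployed.
SERVER_TYPE=${SERVER_TYPE:-sim}   # one of: sim, cpu, gpu

case "$SERVER_TYPE" in
  sim) MODEL_NAME="food-review-1" ;;
  cpu) MODEL_NAME="food-review-1" ;;  # food-review-0 also works here
  *)   MODEL_NAME="food-review-1" ;;  # per the review below, this works for all
esac
echo "$MODEL_NAME"
```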
Contributor Author

I can't find the model value for the GPU one. Please help me.

Contributor

Adding an answer so you'll learn more about how it's being set; you can find it here:

Contributor

This is not needed; `food-review-1` should work for all.

```bash
IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=80

curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
"model": "food-review",
"model": "food-review-1",
"prompt": "Write as if you were a critic: San Francisco",
"max_tokens": 100,
"temperature": 0
Expand Down