Commit 8f3b817

Merge pull request #21 from terrytangyuan/patch-2
Correct inference provider config for K8s deployment in 2025-01-27-intro-to-llama-stack-with-vllm.md
2 parents c8a5fd3 + 288580d commit 8f3b817

File tree

1 file changed: +7 additions, −7 deletions

_posts/2025-01-27-intro-to-llama-stack-with-vllm.md

Lines changed: 7 additions & 7 deletions
@@ -131,7 +131,7 @@ models:
   provider_model_id: null
 ```

-Then we can start the LlamaStack Server with the image we built via `llama stack run`:
+Then we can start the Llama Stack Server with the image we built via `llama stack run`:
 ```
 export INFERENCE_ADDR=host.containers.internal
 export INFERENCE_PORT=8000
@@ -309,12 +309,12 @@ providers:
   - provider_id: vllm
     provider_type: remote::vllm
     config:
-      url: ${env.VLLM_URL}
-      max_tokens: ${env.VLLM_MAX_TOKENS:4096}
-      api_token: ${env.VLLM_API_TOKEN:fake}
+      url: http://vllm-server.default.svc.cluster.local:8000/v1
+      max_tokens: 4096
+      api_token: fake
 ```

-Once we have defined the run configuration for Llama Stack, we can build an image with that configuration the server source code:
+Once we have defined the run configuration for Llama Stack, we can build an image with that configuration and the server source code:

 ```
 cat >/tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s <<EOF
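The new `url` value follows the standard Kubernetes in-cluster DNS form `<service>.<namespace>.svc.cluster.local`, so it only resolves if a Service named `vllm-server` exists in the `default` namespace. A minimal sketch of the kind of Service that would back this address, assuming the vLLM pods carry an `app: vllm` label (the label is an assumption, not taken from this diff):

```
# Hypothetical Service backing the hardcoded URL
# http://vllm-server.default.svc.cluster.local:8000/v1
apiVersion: v1
kind: Service
metadata:
  name: vllm-server       # first DNS label in the URL
  namespace: default      # second DNS label in the URL
spec:
  selector:
    app: vllm             # assumed pod label; adjust to match the vLLM deployment
  ports:
    - port: 8000          # port referenced by the URL
      targetPort: 8000
```

Note that the change also drops the `${env.VLLM_URL}` indirection, so the endpoint can no longer be overridden through an environment variable without editing the file.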
@@ -329,7 +329,7 @@ podman build -f /tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s -t
 ```


-We can then start the LlamaStack server by deploying a Kubernetes Pod and Service:
+We can then start the Llama Stack server by deploying a Kubernetes Pod and Service:
 ```
 cat <<EOF |kubectl apply -f -
 apiVersion: v1
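The manifest itself is elided from this hunk; for orientation, a sketch of the general shape of such a Pod plus Service pair (the names, image tag, and port below are assumptions, not taken from this diff):

```
# Sketch only: a Pod running the image built above, fronted by a Service
apiVersion: v1
kind: Pod
metadata:
  name: llama-stack-pod
  labels:
    app: llama-stack
spec:
  containers:
    - name: llama-stack
      image: localhost/llama-stack-run-k8s:latest  # image name assumed
      ports:
        - containerPort: 5000                      # server port assumed
---
apiVersion: v1
kind: Service
metadata:
  name: llama-stack-service
spec:
  selector:
    app: llama-stack        # must match the Pod's label
  ports:
    - port: 5000
      targetPort: 5000
```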
@@ -380,7 +380,7 @@ spec:
 EOF
 ```

-We can check that the LlamaStack server has started:
+We can check that the Llama Stack server has started:
 ```
 $ kubectl logs vllm-server
 ...
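Tailing logs is a manual check; a readinessProbe on the Llama Stack container would let Kubernetes gate the Service on the same condition. A sketch that would slot under the container entry in the Pod spec above, assuming the server exposes an HTTP health route (both the path and the port here are assumptions):

```
# Hypothetical readiness probe for the Llama Stack container
readinessProbe:
  httpGet:
    path: /v1/health        # assumed health endpoint
    port: 5000              # assumed server port
  initialDelaySeconds: 10   # give the server time to start up
  periodSeconds: 5
```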
