@@ -131,7 +131,7 @@ models:
   provider_model_id: null
 ```

-Then we can start the LlamaStack Server with the image we built via ` llama stack run ` :
+Then we can start the Llama Stack Server with the image we built via ` llama stack run ` :
 ```
 export INFERENCE_ADDR=host.containers.internal
 export INFERENCE_PORT=8000
@@ -309,12 +309,12 @@ providers:
   - provider_id: vllm
     provider_type: remote::vllm
     config:
-      url: ${env.VLLM_URL}
-      max_tokens: ${env.VLLM_MAX_TOKENS: 4096}
-      api_token: ${env.VLLM_API_TOKEN: fake}
+      url: http://vllm-server.default.svc.cluster.local:8000/v1
+      max_tokens: 4096
+      api_token: fake
 ```

-Once we have defined the run configuration for Llama Stack, we can build an image with that configuration the server source code:
+Once we have defined the run configuration for Llama Stack, we can build an image with that configuration and the server source code:

 ```
 cat >/tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s <<EOF
@@ -329,7 +329,7 @@ podman build -f /tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s -t
 ```


-We can then start the LlamaStack server by deploying a Kubernetes Pod and Service:
+We can then start the Llama Stack server by deploying a Kubernetes Pod and Service:
 ```
 cat <<EOF |kubectl apply -f -
 apiVersion: v1
@@ -380,7 +380,7 @@ spec:
 EOF
 ```

-We can check that the LlamaStack server has started:
+We can check that the Llama Stack server has started:
 ```
 $ kubectl logs vllm-server
 ...