@@ -131,7 +131,7 @@ models:
   provider_model_id: null
 ```

-Then we can start the LlamaStack Server with the image we built via `llama stack run`:
+Then we can start the Llama Stack Server with the image we built via `llama stack run`:
 ```
 export INFERENCE_ADDR=host.containers.internal
 export INFERENCE_PORT=8000
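The remainder of the `llama stack run` block falls outside this hunk. As a sketch only, assuming a hypothetical image tag `llama-stack-run-vllm` and an assumed server port of 5001, the container could be started along these lines:

```
# Sketch, not the document's actual command: image tag and port are assumptions.
export INFERENCE_ADDR=host.containers.internal
export INFERENCE_PORT=8000
export LLAMA_STACK_PORT=5001

# The image entrypoint is assumed to invoke `llama stack run` on the baked-in run config.
podman run -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  --env INFERENCE_ADDR=$INFERENCE_ADDR \
  --env INFERENCE_PORT=$INFERENCE_PORT \
  --env LLAMA_STACK_PORT=$LLAMA_STACK_PORT \
  llama-stack-run-vllm
```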
@@ -309,12 +309,12 @@ providers:
 - provider_id: vllm
   provider_type: remote::vllm
   config:
-    url: ${env.VLLM_URL}
-    max_tokens: ${env.VLLM_MAX_TOKENS: 4096}
-    api_token: ${env.VLLM_API_TOKEN: fake}
+    url: http://vllm-server.default.svc.cluster.local:8000/v1
+    max_tokens: 4096
+    api_token: fake
 ```

-Once we have defined the run configuration for Llama Stack, we can build an image with that configuration the server source code:
+Once we have defined the run configuration for Llama Stack, we can build an image with that configuration and the server source code:

 ```
 cat >/tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s <<EOF
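The hunk above pins the inference provider to the vLLM Service's in-cluster DNS name rather than reading it from `VLLM_URL`. Before building the image, that address can be sanity-checked from a throwaway pod; this is a sketch that assumes vLLM is serving its OpenAI-compatible API there, as the `/v1` suffix in the config implies:

```
# Throwaway curl pod; --rm deletes it once the command exits.
kubectl run curl-test --rm -it --restart=Never \
  --image=curlimages/curl -- \
  curl -s http://vllm-server.default.svc.cluster.local:8000/v1/models
```

A JSON model list in the response confirms that the Service name, namespace (`default`), and port match the running vLLM deployment.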
@@ -329,7 +329,7 @@ podman build -f /tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s -t
 ```


-We can then start the LlamaStack server by deploying a Kubernetes Pod and Service:
+We can then start the Llama Stack server by deploying a Kubernetes Pod and Service:
 ```
 cat <<EOF |kubectl apply -f -
 apiVersion: v1
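The manifest continues past the end of this hunk. For orientation only, a minimal Pod plus Service for the Llama Stack server might look like the following; the names, labels, image, and port here are assumptions, not the manifest this document actually applies:

```
# Sketch only: names, labels, image, and port are assumed.
apiVersion: v1
kind: Pod
metadata:
  name: llama-stack-pod
  labels:
    app: llama-stack
spec:
  containers:
  - name: llama-stack
    image: localhost/llama-stack-run-k8s:latest
    ports:
    - containerPort: 5001
---
apiVersion: v1
kind: Service
metadata:
  name: llama-stack-service
spec:
  selector:
    app: llama-stack
  ports:
  - protocol: TCP
    port: 5001
    targetPort: 5001
```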
@@ -380,7 +380,7 @@ spec:
 EOF
 ```

-We can check that the LlamaStack server has started:
+We can check that the Llama Stack server has started:
 ```
 $ kubectl logs vllm-server
 ...
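Beyond tailing logs, readiness can be verified with standard `kubectl` commands; this sketch assumes the hypothetical Pod and Service names from the manifest sketch above:

```
# Wait for the (assumed) pod to become Ready, then follow its logs.
kubectl wait --for=condition=Ready pod/llama-stack-pod --timeout=300s
kubectl logs -f llama-stack-pod

# Forward the Service locally to reach the Llama Stack API from the host.
kubectl port-forward service/llama-stack-service 5001:5001
```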