Commit c8f485f

v0.1.0 with fix #879
Signed-off-by: Yuan Tang <[email protected]>
1 parent 0121eb4 commit c8f485f

File tree

1 file changed: +5 -6 lines changed

_posts/2025-01-27-intro-to-llama-stack-with-vllm.md

Lines changed: 5 additions & 6 deletions
@@ -32,10 +32,9 @@ In this article, we will demonstrate the functionality through the remote vLLM i

 * Linux operating system
 * [Hugging Face CLI](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli) if you'd like to download the model via CLI.
-* OCI-compliant technologies like [Podman](https://podman.io/) or [Docker](https://www.docker.com/) (can be specified via the `CONTAINER_BINARY` environment variable when running `llama stack` CLI commands).
+* OCI-compliant container technologies like [Podman](https://podman.io/) or [Docker](https://www.docker.com/) (can be specified via the `CONTAINER_BINARY` environment variable when running `llama stack` CLI commands).
 * [Kind](https://kind.sigs.k8s.io/) for Kubernetes deployment.
 * [Conda](https://github.com/conda/conda) for managing Python environment.
-* Python >= 3.10 if you'd like to test the [Llama Stack Python SDK](https://github.com/meta-llama/llama-stack-client-python).


 ## Get Started via Containers
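
A note for readers following the prerequisites above: the sketch below shows roughly how the Hugging Face CLI and `CONTAINER_BINARY` prerequisites come together before building anything. The model name is the `INFERENCE_MODEL` used later in this post; the local directory is an arbitrary example, not something the post prescribes.

```bash
# Authenticate and pull the model used later in this post (INFERENCE_MODEL);
# the --local-dir path is an arbitrary example.
huggingface-cli login
huggingface-cli download meta-llama/Llama-3.2-1B-Instruct --local-dir /tmp/llama-3.2-1b-instruct

# Tell the llama stack CLI which OCI-compliant container binary to use.
export CONTAINER_BINARY=podman
```
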
@@ -119,7 +118,7 @@ image_type: container
 EOF

 export CONTAINER_BINARY=podman
-LLAMA_STACK_DIR=. PYTHONPATH=. python -m llama_stack.cli.llama stack build --config /tmp/test-vllm-llama-stack/vllm-llama-stack-build.yaml
+LLAMA_STACK_DIR=. PYTHONPATH=. python -m llama_stack.cli.llama stack build --config /tmp/test-vllm-llama-stack/vllm-llama-stack-build.yaml --image-name distribution-myenv
 ```

 Once the container image has been built successfully, we can then edit the generated `vllm-run.yaml` to be `/tmp/test-vllm-llama-stack/vllm-llama-stack-run.yaml` with the following change in the `models` field:
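
Since the build command now pins the image name with `--image-name distribution-myenv`, here is a quick sketch (assuming Podman, per `CONTAINER_BINARY=podman` above) of how one might confirm the image exists before proceeding; `localhost/distribution-myenv:dev` is the tag referenced later in this post.

```bash
# List the freshly built image; the run step later in the post refers to
# localhost/distribution-myenv:dev.
podman images localhost/distribution-myenv
```
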
@@ -137,14 +136,14 @@ Then we can start the LlamaStack Server with the image we built via `llama stack
 export INFERENCE_ADDR=host.containers.internal
 export INFERENCE_PORT=8000
 export INFERENCE_MODEL=meta-llama/Llama-3.2-1B-Instruct
-export LLAMASTACK_PORT=5000
+export LLAMA_STACK_PORT=5000

 LLAMA_STACK_DIR=. PYTHONPATH=. python -m llama_stack.cli.llama stack run \
 --env INFERENCE_MODEL=$INFERENCE_MODEL \
 --env VLLM_URL=http://$INFERENCE_ADDR:$INFERENCE_PORT/v1 \
 --env VLLM_MAX_TOKENS=8192 \
 --env VLLM_API_TOKEN=fake \
---env LLAMASTACK_PORT=$LLAMASTACK_PORT \
+--env LLAMA_STACK_PORT=$LLAMA_STACK_PORT \
 /tmp/test-vllm-llama-stack/vllm-llama-stack-run.yaml
 ```
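
With the variable renamed to `LLAMA_STACK_PORT`, a minimal, route-agnostic sketch for checking that the server is actually listening on that port; it deliberately avoids assuming any specific Llama Stack API path.

```bash
# Any HTTP status code in response (even 404) means the server is up and
# reachable on the renamed LLAMA_STACK_PORT; no specific API route is assumed.
curl -s -o /dev/null -w "%{http_code}\n" "http://localhost:${LLAMA_STACK_PORT}"
```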

@@ -155,7 +154,7 @@ podman run --security-opt label=disable -it --network host -v /tmp/test-vllm-lla
 --env VLLM_URL=http://$INFERENCE_ADDR:$INFERENCE_PORT/v1 \
 --env VLLM_MAX_TOKENS=8192 \
 --env VLLM_API_TOKEN=fake \
---env LLAMASTACK_PORT=$LLAMASTACK_PORT \
+--env LLAMA_STACK_PORT=$LLAMA_STACK_PORT \
 --entrypoint='["python", "-m", "llama_stack.distribution.server.server", "--yaml-config", "/app/config.yaml"]' \
 localhost/distribution-myenv:dev
 ```
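
Because this commit renames `LLAMASTACK_PORT` to `LLAMA_STACK_PORT` in several spots, a simple sketch for verifying that no stale references to the old name remain in the post:

```bash
# Should print nothing once the rename is complete.
grep -n "LLAMASTACK_PORT" _posts/2025-01-27-intro-to-llama-stack-with-vllm.md
```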
