@@ -131,7 +131,7 @@ models:
   provider_model_id: null
 ```

-Then we can start the LlamaStack Server with the image we built via `llama stack run`:
+Then we can start the Llama Stack Server with the image we built via `llama stack run`:
 ```
 export INFERENCE_ADDR=host.containers.internal
 export INFERENCE_PORT=8000
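The remainder of the `llama stack run` block falls outside this hunk. As a sketch only, assuming a hypothetical image tag `llama-stack-run-vllm` and an assumed server port of 5001, the container could be started along these lines:

```
# Sketch, not the document's actual command: image tag and port are assumptions.
export INFERENCE_ADDR=host.containers.internal
export INFERENCE_PORT=8000
export LLAMA_STACK_PORT=5001

# The image entrypoint is assumed to invoke `llama stack run` on the baked-in run config.
podman run -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  --env INFERENCE_ADDR=$INFERENCE_ADDR \
  --env INFERENCE_PORT=$INFERENCE_PORT \
  --env LLAMA_STACK_PORT=$LLAMA_STACK_PORT \
  llama-stack-run-vllm
```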
@@ -309,12 +309,12 @@ providers:
 - provider_id: vllm
   provider_type: remote::vllm
   config:
-    url: ${env.VLLM_URL}
-    max_tokens: ${env.VLLM_MAX_TOKENS: 4096}
-    api_token: ${env.VLLM_API_TOKEN: fake}
+    url: http://vllm-server.default.svc.cluster.local:8000/v1
+    max_tokens: 4096
+    api_token: fake
 ```

-Once we have defined the run configuration for Llama Stack, we can build an image with that configuration the server source code:
+Once we have defined the run configuration for Llama Stack, we can build an image with that configuration and the server source code:

 ```
 cat >/tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s <<EOF
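The hunk above pins the inference provider to the vLLM Service's in-cluster DNS name rather than reading it from `VLLM_URL`. Before building the image, that address can be sanity-checked from a throwaway pod; this is a sketch that assumes vLLM is serving its OpenAI-compatible API there, as the `/v1` suffix in the config implies:

```
# Throwaway curl pod; --rm deletes it once the command exits.
kubectl run curl-test --rm -it --restart=Never \
  --image=curlimages/curl -- \
  curl -s http://vllm-server.default.svc.cluster.local:8000/v1/models
```

A JSON model list in the response confirms that the Service name, namespace (`default`), and port match the running vLLM deployment.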
@@ -329,7 +329,7 @@ podman build -f /tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s -t
 ```


-We can then start the LlamaStack server by deploying a Kubernetes Pod and Service:
+We can then start the Llama Stack server by deploying a Kubernetes Pod and Service:
 ```
 cat <<EOF |kubectl apply -f -
 apiVersion: v1
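The manifest continues past the end of this hunk. For orientation only, a minimal Pod plus Service for the Llama Stack server might look like the following; the names, labels, image, and port here are assumptions, not the manifest this document actually applies:

```
# Sketch only: names, labels, image, and port are assumed.
apiVersion: v1
kind: Pod
metadata:
  name: llama-stack-pod
  labels:
    app: llama-stack
spec:
  containers:
  - name: llama-stack
    image: localhost/llama-stack-run-k8s:latest
    ports:
    - containerPort: 5001
---
apiVersion: v1
kind: Service
metadata:
  name: llama-stack-service
spec:
  selector:
    app: llama-stack
  ports:
  - protocol: TCP
    port: 5001
    targetPort: 5001
```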
@@ -380,7 +380,7 @@ spec:
 EOF
 ```

-We can check that the LlamaStack server has started:
+We can check that the Llama Stack server has started:
 ```
 $ kubectl logs vllm-server
 ...
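Beyond tailing logs, readiness can be verified with standard `kubectl` commands; this sketch assumes the hypothetical Pod and Service names from the manifest sketch above:

```
# Wait for the (assumed) pod to become Ready, then follow its logs.
kubectl wait --for=condition=Ready pod/llama-stack-pod --timeout=300s
kubectl logs -f llama-stack-pod

# Forward the Service locally to reach the Llama Stack API from the host.
kubectl port-forward service/llama-stack-service 5001:5001
```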