
Commit 45c5bf3

Deploy to separate namespace
1 parent 75b7b4f commit 45c5bf3

File tree: 1 file changed (+30 / -17 lines)


AI/vllm-deployment/README.md

Lines changed: 30 additions & 17 deletions
````diff
@@ -36,31 +36,40 @@ This example demonstrates how to deploy a server for AI inference using [vLLM](h
 
 ## Detailed Steps & Explanation
 
-1. Ensure Hugging Face permissions to retrieve model:
+1. Create a namespace. This example uses `vllm-example`, but you can choose any name:
+
+```bash
+kubectl create namespace vllm-example
+```
+
+2. Ensure Hugging Face permissions to retrieve model:
 
 ```bash
 # Env var HF_TOKEN contains hugging face account token
-kubectl create secret generic hf-secret \
+# Make sure to use the same namespace as in the previous step
+kubectl create secret generic hf-secret -n vllm-example \
 --from-literal=hf_token=$HF_TOKEN
 ```
 
-2. Apply vLLM server:
+
+3. Apply vLLM server:
 
 ```bash
-kubectl apply -f vllm-deployment.yaml
+# Make sure to use the same namespace as in the previous steps
+kubectl apply -f vllm-deployment.yaml -n vllm-example
 ```
 
 - Wait for deployment to reconcile, creating vLLM pod(s):
 
 ```bash
-kubectl wait --for=condition=Available --timeout=900s deployment/vllm-gemma-deployment
-kubectl get pods -l app=gemma-server -w
+kubectl wait --for=condition=Available --timeout=900s deployment/vllm-gemma-deployment -n vllm-example
+kubectl get pods -l app=gemma-server -w -n vllm-example
 ```
 
 - View vLLM pod logs:
 
 ```bash
-kubectl logs -f -l app=gemma-server
+kubectl logs -f -l app=gemma-server -n vllm-example
 ```
 
 Expected output:
````
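The added steps qualify every command with `-n vllm-example`. If repeating the flag gets tedious, the namespace can also be made the default for the current kubectl context; a minimal, optional sketch (not part of this commit):

```bash
# Optional: make vllm-example the default namespace for the current context,
# so -n vllm-example can be omitted from the commands that follow.
kubectl config set-context --current --namespace=vllm-example

# Show which namespace the current context now defaults to.
kubectl config view --minify --output 'jsonpath={..namespace}'
```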
````diff
@@ -77,11 +86,12 @@ Expected output:
 ...
 ```
 
-3. Create service:
+4. Create service:
 
 ```bash
 # ClusterIP service on port 8080 in front of vllm deployment
-kubectl apply -f vllm-service.yaml
+# Make sure to use the same namespace as in the previous steps
+kubectl apply -f vllm-service.yaml -n vllm-example
 ```
 
 ## Verification / Seeing it Work
````
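After the service step, a quick sanity check is to confirm the Service landed in the new namespace and is selecting ready pods; a small sketch, assuming the Service is named `vllm-service` as in the port-forward command below:

```bash
# The Service should appear in vllm-example, not in the default namespace.
kubectl get service vllm-service -n vllm-example

# An empty ENDPOINTS column usually means the selector matches no ready pods yet.
kubectl get endpoints vllm-service -n vllm-example
```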
````diff
@@ -90,18 +100,19 @@ kubectl apply -f vllm-service.yaml
 
 ```bash
 # Forward a local port (e.g., 8080) to the service port (e.g., 8080)
-kubectl port-forward service/vllm-service 8080:8080
+# Make sure to use the same namespace as in the previous steps
+kubectl port-forward service/vllm-service 8080:8080 -n vllm-example
 ```
 
 2. Send request to local forwarding port:
 
 ```bash
 curl -X POST http://localhost:8080/v1/chat/completions \
 -H "Content-Type: application/json" \
--d '{
-"model": "google/gemma-3-1b-it",
-"messages": [{"role": "user", "content": "Explain Quantum Computing in simple terms."}],
-"max_tokens": 100
+-d '{ \
+"model": "google/gemma-3-1b-it", \
+"messages": [{"role": "user", "content": "Explain Quantum Computing in simple terms." }], \
+"max_tokens": 100 \
 }'
 ```
 
````
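Beyond the chat completion request above, a lighter smoke test is to list the served models, assuming the deployment exposes vLLM's OpenAI-compatible API on the forwarded port (as the `/v1/chat/completions` route suggests):

```bash
# With the port-forward from the previous step still running,
# the response should list google/gemma-3-1b-it among the served models.
curl http://localhost:8080/v1/models
```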

````diff
@@ -151,9 +162,11 @@ Node selectors make sure vLLM pods land on Nodes with the correct GPU, and they
 ## Cleanup
 
 ```bash
-kubectl delete -f vllm-service.yaml
-kubectl delete -f vllm-deployment.yaml
-kubectl delete -f secret/hf_secret
+# Make sure to use the same namespace as in the previous steps
+kubectl delete -f vllm-service.yaml -n vllm-example
+kubectl delete -f vllm-deployment.yaml -n vllm-example
+kubectl delete secret hf-secret -n vllm-example
+kubectl delete namespace vllm-example
 ```
 
 ---
````
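To confirm the cleanup finished, a simple final check is to query the namespace itself; deletion is asynchronous, so it can sit in Terminating for a while before disappearing:

```bash
# Shows Terminating while resources are torn down;
# returns "Error from server (NotFound)" once deletion is complete.
kubectl get namespace vllm-example
```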
