
Commit 82e4a5b

Noa Limoy committed
feat(llm-katan): Add Kubernetes deployment support
- Add comprehensive Kustomize manifests (base + overlays for gpt35/claude)
- Implement an initContainer for efficient model caching using a PVC
- Fix config.py to read YLLM_SERVED_MODEL_NAME from environment variables
- Add deployment documentation with examples for Kind and Minikube clusters

This enables running multiple llm-katan instances in Kubernetes, each serving a different model alias while sharing the same underlying model. The overlays (gpt35, claude) demonstrate multi-instance deployments in which each instance exposes a different served model name via the API (e.g., gpt-3.5-turbo, claude-3-haiku-20240307).

Because the served model name can now be set through an environment variable, each Kubernetes deployment can expose a different model name via the API.

Signed-off-by: Noa Limoy <[email protected]>
1 parent cfc7657 commit 82e4a5b


13 files changed: +1627 / -3 lines


e2e-tests/llm-katan/README.md

Lines changed: 19 additions & 0 deletions
@@ -38,6 +38,25 @@ docker run -p 8000:8000 ghcr.io/vllm-project/semantic-router/llm-katan:latest \
llm-katan --served-model-name "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
```

#### Option 3: Kubernetes

```bash
# Quick start with make targets
make kube-deploy-llm-katan-gpt35 # Deploy GPT-3.5 simulation
make kube-deploy-llm-katan-claude # Deploy Claude simulation
make kube-deploy-llm-katan-multi # Deploy both models

# Or manually with kubectl
kubectl apply -k e2e-tests/llm-katan/deploy/kubernetes/overlays/gpt35
kubectl apply -k e2e-tests/llm-katan/deploy/kubernetes/overlays/claude

# Port forward and test
make kube-port-forward-llm-katan LLM_KATAN_OVERLAY=gpt35
curl http://localhost:8000/health
```

**📚 For a comprehensive Kubernetes deployment guide, see [docs/kubernetes.md](docs/kubernetes.md)**
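To go one step past the `/health` check and confirm that a port-forwarded instance actually serves the alias its overlay is meant to expose, a small helper file along these lines could be sourced. This is a sketch under assumptions, not part of the commit: it assumes llm-katan answers `GET /health` (as used above) and an OpenAI-compatible `GET /v1/models` endpoint that lists the served model name as an `id`. The function names `wait_for_health`, `served_model_ids`, `overlay_alias`, and `check_overlay` are hypothetical, and the overlay-to-alias mapping is inferred from the example names in the commit message.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking an llm-katan instance after port-forwarding.
# Assumes GET /health and an OpenAI-compatible GET /v1/models on the instance.

# Poll <base-url>/health once per second until it succeeds or retries run out.
# Returns 0 when curl gets a 2xx response, 1 otherwise.
wait_for_health() {
  local base_url=$1
  local retries=${2:-30}
  local i
  for ((i = 0; i < retries; i++)); do
    if curl -fsS "${base_url}/health" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Read a /v1/models JSON response on stdin and print each "id" value, one per line.
# This is a grep/sed approximation, not a full JSON parser (ids must not contain quotes).
served_model_ids() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# Expected alias per overlay (assumed from the example names in the commit message).
overlay_alias() {
  case "$1" in
    gpt35) echo "gpt-3.5-turbo" ;;
    claude) echo "claude-3-haiku-20240307" ;;
    *) return 1 ;;
  esac
}

# Succeed only if the instance at <base-url> lists the alias expected for <overlay>.
check_overlay() {
  local overlay=$1
  local base_url=${2:-http://localhost:8000}
  local expected
  expected=$(overlay_alias "$overlay") || return 1
  wait_for_health "$base_url" || return 1
  curl -fsS "${base_url}/v1/models" | served_model_ids | grep -qxF "$expected"
}
```

For example, with `make kube-port-forward-llm-katan LLM_KATAN_OVERLAY=gpt35` running in another terminal (assuming the target wraps `kubectl port-forward`, which stays in the foreground), `check_overlay gpt35 && echo ok` would report whether the instance lists `gpt-3.5-turbo`. Keeping the JSON handling to `grep`/`sed` avoids a `jq` dependency at the cost of robustness.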
### Setup

#### HuggingFace Token (Required)
