Commit 82e4a5b
Noa Limoy
feat(llm-katan): Add Kubernetes deployment support
- Add comprehensive Kustomize manifests (base + overlays for gpt35/claude)
- Implement initContainer for efficient model caching using PVC
- Fix config.py to read YLLM_SERVED_MODEL_NAME from environment variables
- Add deployment documentation with examples for Kind cluster / Minikube
This enables running multiple llm-katan instances in Kubernetes, each
serving different model aliases while sharing the same underlying model.
The overlays (gpt35, claude) demonstrate multi-instance deployments where
each instance exposes a different served model name (e.g., gpt-3.5-turbo,
claude-3-haiku-20240307) via the API.
The served model name is now read from environment variables, enabling
Kubernetes deployments to expose different model names via the API.
Signed-off-by: Noa Limoy <[email protected]>
1 parent cfc7657 commit 82e4a5b
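The environment-variable behavior described in the commit message could be sketched roughly as follows. This is not the actual `config.py` from the commit: the function name `resolve_served_model_name`, the `cli_value` parameter, and the precedence order (explicit value, then environment, then the underlying model name) are assumptions made for illustration. Only the variable name `YLLM_SERVED_MODEL_NAME` comes from the commit message.

```python
import os

ENV_VAR = "YLLM_SERVED_MODEL_NAME"  # variable name taken from the commit message


def resolve_served_model_name(model_name, cli_value=None, env=None):
    """Pick the model name exposed via the API (hypothetical helper).

    Assumed precedence: explicit value > environment variable > model_name.
    """
    # Allow a custom mapping to be passed in so the logic is easy to test.
    env = os.environ if env is None else env

    if cli_value:
        return cli_value

    # Treat an empty or whitespace-only variable as unset.
    env_value = env.get(ENV_VAR, "").strip()
    return env_value or model_name


if __name__ == "__main__":
    # Two instances sharing one underlying model but exposing different aliases,
    # as the gpt35 and claude overlays are described as doing.
    base_model = "example/base-model"  # placeholder, not from the commit
    for alias in ("gpt-3.5-turbo", "claude-3-haiku-20240307"):
        print(resolve_served_model_name(base_model, env={ENV_VAR: alias}))
```

Under these assumptions, each overlay only needs to set a different value for `YLLM_SERVED_MODEL_NAME` in its container environment; the underlying model stays the same.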
File tree
13 files changed (+1627, -3 lines)
- e2e-tests/llm-katan
  - deploy
    - docs
    - kubernetes
      - base
      - components/common
      - overlays
        - claude
        - gpt35
  - llm_katan
- tools/make