Commit ede160f

feat: support running vsr in kubernetes environment (#245)
Signed-off-by: bitliu <[email protected]>
1 parent ab0f8e7 commit ede160f

31 files changed, +1624 -403 lines changed

Makefile

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ _run:
 -f tools/make/milvus.mk \
 -f tools/make/models.mk \
 -f tools/make/pre-commit.mk \
+-f tools/make/kube.mk \
 $(MAKECMDGOALS)
 
 .PHONY: _run

config/envoy-docker.yaml

Lines changed: 2 additions & 2 deletions
@@ -31,7 +31,7 @@ static_resources:
 upstream_local_address: "%UPSTREAM_LOCAL_ADDRESS%"
 request_id: "%REQ(X-REQUEST-ID)%"
 selected_model: "%REQ(X-SELECTED-MODEL)%"
-selected_endpoint: "%REQ(X-SEMANTIC-DESTINATION-ENDPOINT)%"
+selected_endpoint: "%REQ(X-GATEWAY-DESTINATION-ENDPOINT)%"
 route_config:
 name: local_route
 virtual_hosts:
@@ -106,7 +106,7 @@ static_resources:
 lb_policy: CLUSTER_PROVIDED
 original_dst_lb_config:
 use_http_header: true
-http_header_name: "x-semantic-destination-endpoint"
+http_header_name: "x-gateway-destination-endpoint"
 typed_extension_protocol_options:
 envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
 "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions

config/envoy.yaml

Lines changed: 2 additions & 2 deletions
@@ -31,7 +31,7 @@ static_resources:
 upstream_local_address: "%UPSTREAM_LOCAL_ADDRESS%"
 request_id: "%REQ(X-REQUEST-ID)%"
 selected_model: "%REQ(X-SELECTED-MODEL)%"
-selected_endpoint: "%REQ(X-SEMANTIC-DESTINATION-ENDPOINT)%"
+selected_endpoint: "%REQ(X-GATEWAY-DESTINATION-ENDPOINT)%"
 route_config:
 name: local_route
 virtual_hosts:
@@ -106,7 +106,7 @@ static_resources:
 lb_policy: CLUSTER_PROVIDED
 original_dst_lb_config:
 use_http_header: true
-http_header_name: "x-semantic-destination-endpoint"
+http_header_name: "x-gateway-destination-endpoint"
 typed_extension_protocol_options:
 envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
 "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions

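Both Envoy configs rename the destination header in two places: the access-log field (upper-case `%REQ(...)%` form) and `original_dst_lb_config.http_header_name` (lower-case form). Anything that still sets the old header name would no longer be read by the `original_dst_lb_config` lookup. A minimal sketch for catching leftovers, assuming the configs live under `config/` as in this commit; `find_stale_header` is a hypothetical helper and is not part of the repository:

```shell
# Hypothetical helper: report any remaining references to the old header name.
# Returns 0 when the directory is clean, 1 when stale references are found
# (the matching lines are printed), and 2 when the directory does not exist.
find_stale_header() {
  dir="${1:-config}"
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 2
  fi
  # -r: recurse, -n: show line numbers, -i: match both the upper-case
  # access-log form and the lower-case http_header_name form.
  if grep -rni 'x-semantic-destination-endpoint' "$dir"; then
    return 1
  fi
  echo "no stale references under $dir"
}
```

Running `find_stale_header config` from the repository root covers both `config/envoy.yaml` and `config/envoy-docker.yaml`; other directories that set the header can be passed the same way.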
deploy/kubernetes/README.md

Lines changed: 293 additions & 5 deletions
@@ -7,20 +7,23 @@ This directory contains Kubernetes manifests for deploying the Semantic Router u
 The deployment consists of:
 
 - **ConfigMap**: Contains `config.yaml` and `tools_db.json` configuration files
-- **PersistentVolumeClaim**: 10Gi storage for model files
-- **Deployment**:
+- **PersistentVolumeClaim**: 10Gi storage for model files
+- **Deployment**:
 - **Init Container**: Downloads/copies model files to persistent volume
 - **Main Container**: Runs the semantic router service
-- **Services**:
-- Main service exposing gRPC port (50051) and metrics port (9190)
+- **Services**:
+- Main service exposing gRPC port (50051), Classification API (8080), and metrics port (9190)
 - Separate metrics service for monitoring
 
 ## Ports
 
 - **50051**: gRPC API (vLLM Semantic Router ExtProc)
+- **8080**: Classification API (HTTP REST API)
 - **9190**: Prometheus metrics
 
-## Deployment
+## Quick Start
+
+### Standard Kubernetes Deployment
 
 ```bash
 kubectl apply -k deploy/kubernetes/
@@ -32,3 +35,288 @@ kubectl get services -l app=semantic-router -n semantic-router
 # View logs
 kubectl logs -l app=semantic-router -n semantic-router -f
 ```
+
+### Kind (Kubernetes in Docker) Deployment
+
+For local development and testing, you can deploy to a kind cluster with optimized resource settings.
+
+#### Prerequisites
+
+- [Docker](https://docs.docker.com/get-docker/) installed and running
+- [kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installation) installed
+- [kubectl](https://kubernetes.io/docs/tasks/tools/) installed
+
+#### Automated Deployment
+
+Use the provided make targets for a complete automated setup:
+
+```bash
+# Complete setup: create cluster and deploy
+make setup
+
+# Or step by step:
+make create-cluster
+make deploy
+```
+
+The setup process will:
+
+1. Create a kind cluster with optimized configuration
+2. Deploy the semantic router with appropriate resource limits
+3. Wait for the deployment to be ready
+4. Show deployment status and access instructions
+
+#### Manual Kind Deployment
+
+If you prefer manual deployment:
+
+**Step 1: Create kind cluster with custom configuration**
+
+```bash
+# Create cluster with optimized resource settings
+kind create cluster --name semantic-router-cluster --config tools/kind/kind-config.yaml
+
+# Verify cluster is ready
+kubectl wait --for=condition=Ready nodes --all --timeout=300s
+```
+
+**Step 2: Deploy the application**
+
+```bash
+kubectl apply -k deploy/kubernetes/
+
+# Wait for deployment to be ready
+kubectl wait --for=condition=Available deployment/semantic-router -n semantic-router --timeout=600s
+```
+
+**Step 3: Check deployment status**
+
+```bash
+# Check pods
+kubectl get pods -n semantic-router -o wide
+
+# Check services
+kubectl get services -n semantic-router
+
+# View logs
+kubectl logs -l app=semantic-router -n semantic-router -f
+```
+
+#### Resource Requirements for Kind
+
+The deployment is optimized for kind clusters with the following resource allocation:
+
+- **Init Container**: 512Mi memory, 250m CPU (limits: 1Gi memory, 500m CPU)
+- **Main Container**: 3Gi memory, 1 CPU (limits: 6Gi memory, 2 CPU)
+- **Total Cluster**: Recommended minimum 8GB RAM, 4 CPU cores
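
The per-container figures above can be checked against the 8GB / 4-core recommendation with a little arithmetic. Kubernetes schedules a pod on its effective request, which is the larger of the highest init-container request and the sum of the main-container requests, because init containers finish before the main containers start. A rough sketch, assuming the pod has exactly one init container and one main container as the README describes; the `max` helper and variable names are illustrative only:

```shell
# Return the larger of two integers.
max() {
  if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Figures from the README, converted to MiB and millicores.
init_mem_req=512;  init_cpu_req=250
main_mem_req=3072; main_cpu_req=1000
init_mem_lim=1024; init_cpu_lim=500
main_mem_lim=6144; main_cpu_lim=2000

# With a single main container, the effective value is max(init, main),
# not the sum of the two.
echo "effective request: $(max "$init_mem_req" "$main_mem_req")Mi memory, $(max "$init_cpu_req" "$main_cpu_req")m CPU"
echo "effective limit:   $(max "$init_mem_lim" "$main_mem_lim")Mi memory, $(max "$init_cpu_lim" "$main_cpu_lim")m CPU"
```

Under these assumptions the pod requests 3072Mi and 1000m CPU and may grow to 6144Mi and 2000m CPU, which leaves headroom on an 8GB, 4-core host for the kind control plane and system pods.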
+
+#### Kind Cluster Configuration
+
+The `tools/kind/kind-config.yaml` provides:
+
+- Control plane node with system resource reservations
+- Worker node for application workloads
+- Optimized kubelet settings for resource management
+
+#### Accessing Services in Kind
+
+Using make commands (recommended):
+
+```bash
+# Access Classification API (HTTP REST)
+make port-forward-api
+
+# Access gRPC API
+make port-forward-grpc
+
+# Access metrics
+make port-forward-metrics
+```
+
+Or using kubectl directly:
+
+```bash
+# Access Classification API (HTTP REST)
+kubectl port-forward -n semantic-router svc/semantic-router 8080:8080
+
+# Access gRPC API
+kubectl port-forward -n semantic-router svc/semantic-router 50051:50051
+
+# Access metrics
+kubectl port-forward -n semantic-router svc/semantic-router-metrics 9190:9190
+```
+
+#### Testing the Deployment
+
+Use the provided make targets:
+
+```bash
+# Test overall deployment
+make test-deployment
+
+# Test Classification API specifically
+make test-api
+
+# Check deployment status
+make status
+
+# View logs
+make logs
+```
+
+The make targets provide comprehensive testing including:
+
+- Pod readiness checks
+- Service availability verification
+- PVC status validation
+- API health checks
+- Basic functionality testing
+
+#### Cleanup
+
+Using make commands (recommended):
+
+```bash
+# Complete cleanup: undeploy and delete cluster
+make cleanup
+
+# Or step by step:
+make undeploy
+make delete-cluster
+```
+
+Or using kubectl/kind directly:
+
+```bash
+# Remove deployment
+kubectl delete -k deploy/kubernetes/
+
+# Delete the kind cluster
+kind delete cluster --name semantic-router-cluster
+```
+
+## Make Commands Reference
+
+The project provides comprehensive make targets for managing kind clusters and deployments:
+
+### Cluster Management
+
+```bash
+make create-cluster # Create kind cluster with optimized configuration
+make delete-cluster # Delete kind cluster
+make cluster-info # Show cluster information and resource usage
+```
+
+### Deployment Management
+
+```bash
+make deploy # Deploy semantic-router to the cluster
+make undeploy # Remove semantic-router from the cluster
+make load-image # Load Docker image into kind cluster
+make status # Show deployment status
+```
+
+### Testing and Monitoring
+
+```bash
+make test-deployment # Test the deployment
+make test-api # Test the Classification API
+make logs # Show application logs
+```
+
+### Port Forwarding
+
+```bash
+make port-forward-api # Port forward Classification API (8080)
+make port-forward-grpc # Port forward gRPC API (50051)
+make port-forward-metrics # Port forward metrics (9190)
+```
+
+### Combined Operations
+
+```bash
+make setup # Complete setup (create-cluster + deploy)
+make cleanup # Complete cleanup (undeploy + delete-cluster)
+```
+
+### Configuration Variables
+
+You can customize the deployment using environment variables:
+
+```bash
+# Custom cluster name
+KIND_CLUSTER_NAME=my-cluster make create-cluster
+
+# Custom kind config file
+KIND_CONFIG_FILE=my-config.yaml make create-cluster
+
+# Custom namespace
+KUBE_NAMESPACE=my-namespace make deploy
+
+# Custom Docker image
+DOCKER_IMAGE=my-registry/semantic-router:latest make load-image
+```
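
The override pattern above can be mirrored in a wrapper script to see which values a run would use. The sketch below is an assumption, not the contents of `tools/make/kube.mk` (which this view does not show): the defaults are inferred from the manual commands in the README (`semantic-router-cluster`, `tools/kind/kind-config.yaml`, the `semantic-router` namespace), no default is assumed for `DOCKER_IMAGE`, and `print_kube_settings` is a hypothetical name.

```shell
# Hypothetical helper: show the effective settings, falling back to defaults
# inferred from the README's manual commands when a variable is unset or empty.
print_kube_settings() {
  echo "cluster=${KIND_CLUSTER_NAME:-semantic-router-cluster}"
  echo "config=${KIND_CONFIG_FILE:-tools/kind/kind-config.yaml}"
  echo "namespace=${KUBE_NAMESPACE:-semantic-router}"
  # The default image is not stated in the README, so none is guessed here.
  echo "image=${DOCKER_IMAGE:-<unset>}"
}

print_kube_settings
```

Prefixing the call the same way as the make invocations, for example `KUBE_NAMESPACE=my-namespace print_kube_settings`, shows the override taking effect for that call only.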
+
+### Help
+
+```bash
+make help-kube # Show all available Kubernetes targets
+```
+
+## Troubleshooting
+
+### Common Issues
+
+**Pod stuck in Pending state:**
+
+```bash
+# Check node resources
+kubectl describe nodes
+
+# Check pod events
+kubectl describe pod -n semantic-router -l app=semantic-router
+```
+
+**Init container fails:**
+
+```bash
+# Check init container logs
+kubectl logs -n semantic-router -l app=semantic-router -c model-downloader
+```
+
+**Out of memory errors:**
+
+```bash
+# Check resource usage
+kubectl top pods -n semantic-router
+
+# Adjust resource limits in deployment.yaml if needed
+```
+
+### Resource Optimization
+
+For different environments, you can adjust resource requirements:
+
+- **Development**: 2Gi memory, 0.5 CPU
+- **Testing**: 4Gi memory, 1 CPU
+- **Production**: 8Gi+ memory, 2+ CPU
+
+Edit the `resources` section in `deployment.yaml` accordingly.
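
As an alternative to editing `deployment.yaml` by hand, the three profiles can be expressed as a lookup that prints a matching `kubectl set resources` command. This is only a sketch: the README does not say whether its figures are requests or limits, so they are treated as requests here, the open-ended production values (8Gi+, 2+) are pinned to their lower bounds, and both function names are hypothetical. The command is printed rather than executed so it can be reviewed first.

```shell
# Map a profile name to a requests string (values taken from the README list;
# treating them as requests is an assumption).
resources_for() {
  case "$1" in
    development) echo "cpu=500m,memory=2Gi" ;;
    testing)     echo "cpu=1,memory=4Gi" ;;
    production)  echo "cpu=2,memory=8Gi" ;;
    *) echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

# Print (do not run) the kubectl command for the chosen profile.
print_set_resources() {
  requests="$(resources_for "$1")" || return 1
  echo "kubectl set resources deployment semantic-router -n semantic-router --requests=$requests"
}

print_set_resources development
```

Changes applied this way are overwritten by the next `kubectl apply -k deploy/kubernetes/`, so the manifest remains the place for values that should persist.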
+
+## Files Overview
+
+### Kubernetes Manifests (`deploy/kubernetes/`)
+
+- `deployment.yaml` - Main application deployment with optimized resource settings
+- `service.yaml` - Services for gRPC, HTTP API, and metrics
+- `pvc.yaml` - Persistent volume claim for model storage
+- `namespace.yaml` - Dedicated namespace for the application
+- `config.yaml` - Application configuration
+- `tools_db.json` - Tools database for semantic routing
+- `kustomization.yaml` - Kustomize configuration for easy deployment
+
+### Development Tools
+
+- `tools/kind/kind-config.yaml` - Kind cluster configuration for local development
+- `tools/make/kube.mk` - Make targets for Kubernetes operations
+- `Makefile` - Root makefile including all make targets
