@@ -7,20 +7,23 @@ This directory contains Kubernetes manifests for deploying the Semantic Router u
The deployment consists of:

- **ConfigMap**: Contains `config.yaml` and `tools_db.json` configuration files
- - **PersistentVolumeClaim**: 10Gi storage for model files
- - **Deployment**:
+ - **PersistentVolumeClaim**: 10Gi storage for model files
+ - **Deployment**:
  - **Init Container**: Downloads/copies model files to persistent volume
  - **Main Container**: Runs the semantic router service
- - **Services**:
-   - Main service exposing gRPC port (50051) and metrics port (9190)
+ - **Services**:
+   - Main service exposing gRPC port (50051), Classification API (8080), and metrics port (9190)
  - Separate metrics service for monitoring

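As a rough illustration of the main service in the list above, a Service manifest along these lines would expose the three ports. The service name, namespace, and `app=semantic-router` selector follow the `kubectl` commands used elsewhere in this README; the port names are assumptions and are not taken from `service.yaml`.

```yaml
# Hypothetical sketch; the actual service.yaml may differ.
apiVersion: v1
kind: Service
metadata:
  name: semantic-router
  namespace: semantic-router
spec:
  selector:
    app: semantic-router
  ports:
    - name: grpc            # assumed port name
      port: 50051
      targetPort: 50051
    - name: classify-api    # assumed port name
      port: 8080
      targetPort: 8080
    - name: metrics         # assumed port name
      port: 9190
      targetPort: 9190
```
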
## Ports

- **50051**: gRPC API (vLLM Semantic Router ExtProc)
+ - **8080**: Classification API (HTTP REST API)
- **9190**: Prometheus metrics

- ## Deployment
+ ## Quick Start
+
+ ### Standard Kubernetes Deployment

```bash
kubectl apply -k deploy/kubernetes/
@@ -32,3 +35,288 @@ kubectl get services -l app=semantic-router -n semantic-router
# View logs
kubectl logs -l app=semantic-router -n semantic-router -f
```
+
+ ### Kind (Kubernetes in Docker) Deployment
+
+ For local development and testing, you can deploy to a kind cluster with optimized resource settings.
+
+ #### Prerequisites
+
+ - [Docker](https://docs.docker.com/get-docker/) installed and running
+ - [kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installation) installed
+ - [kubectl](https://kubernetes.io/docs/tasks/tools/) installed
+
+ #### Automated Deployment
+
+ Use the provided make targets for a complete automated setup:
+
+ ```bash
+ # Complete setup: create cluster and deploy
+ make setup
+
+ # Or step by step:
+ make create-cluster
+ make deploy
+ ```
+
+ The setup process will:
+
+ 1. Create a kind cluster with optimized configuration
+ 2. Deploy the semantic router with appropriate resource limits
+ 3. Wait for the deployment to be ready
+ 4. Show deployment status and access instructions
+
+ #### Manual Kind Deployment
+
+ If you prefer manual deployment:
+
+ **Step 1: Create kind cluster with custom configuration**
+
+ ```bash
+ # Create cluster with optimized resource settings
+ kind create cluster --name semantic-router-cluster --config tools/kind/kind-config.yaml
+
+ # Verify cluster is ready
+ kubectl wait --for=condition=Ready nodes --all --timeout=300s
+ ```
+
+ **Step 2: Deploy the application**
+
+ ```bash
+ kubectl apply -k deploy/kubernetes/
+
+ # Wait for deployment to be ready
+ kubectl wait --for=condition=Available deployment/semantic-router -n semantic-router --timeout=600s
+ ```
+
+ **Step 3: Check deployment status**
+
+ ```bash
+ # Check pods
+ kubectl get pods -n semantic-router -o wide
+
+ # Check services
+ kubectl get services -n semantic-router
+
+ # View logs
+ kubectl logs -l app=semantic-router -n semantic-router -f
+ ```
+
+ #### Resource Requirements for Kind
+
+ The deployment is optimized for kind clusters with the following resource allocation:
+
+ - **Init Container**: 512Mi memory, 250m CPU (limits: 1Gi memory, 500m CPU)
+ - **Main Container**: 3Gi memory, 1 CPU (limits: 6Gi memory, 2 CPU)
+ - **Total Cluster**: Recommended minimum 8GB RAM, 4 CPU cores
+
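Expressed as a container `resources` block, the main container figures listed above would look roughly like this. It is a sketch that reads the first pair of numbers as requests and the parenthesized pair as limits; `deployment.yaml` remains the authoritative source.

```yaml
# Sketch of the main container's resources, using the figures from the list above.
# Assumption: "3Gi memory, 1 CPU" are requests; "6Gi memory, 2 CPU" are limits.
resources:
  requests:
    memory: "3Gi"
    cpu: "1"
  limits:
    memory: "6Gi"
    cpu: "2"
```
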
+ #### Kind Cluster Configuration
+
+ The `tools/kind/kind-config.yaml` provides:
+
+ - Control plane node with system resource reservations
+ - Worker node for application workloads
+ - Optimized kubelet settings for resource management
+
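For orientation, a kind configuration with that shape might look like the sketch below. It is not a copy of `tools/kind/kind-config.yaml`: the reservation values are illustrative, and passing them through `kubeletExtraArgs` in a kubeadm patch is one possible way to set kubelet options, assumed here rather than confirmed by the repository.

```yaml
# Hypothetical sketch; the real tools/kind/kind-config.yaml may differ.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  # Control plane node with a system resource reservation.
  # The reservation values below are illustrative only.
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            system-reserved: "cpu=500m,memory=512Mi"
  # Worker node for application workloads.
  - role: worker
```
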
+ #### Accessing Services in Kind
+
+ Using make commands (recommended):
+
+ ```bash
+ # Access Classification API (HTTP REST)
+ make port-forward-api
+
+ # Access gRPC API
+ make port-forward-grpc
+
+ # Access metrics
+ make port-forward-metrics
+ ```
+
+ Or using kubectl directly:
+
+ ```bash
+ # Access Classification API (HTTP REST)
+ kubectl port-forward -n semantic-router svc/semantic-router 8080:8080
+
+ # Access gRPC API
+ kubectl port-forward -n semantic-router svc/semantic-router 50051:50051
+
+ # Access metrics
+ kubectl port-forward -n semantic-router svc/semantic-router-metrics 9190:9190
+ ```
+
+ #### Testing the Deployment
+
+ Use the provided make targets:
+
+ ```bash
+ # Test overall deployment
+ make test-deployment
+
+ # Test Classification API specifically
+ make test-api
+
+ # Check deployment status
+ make status
+
+ # View logs
+ make logs
+ ```
+
+ The make targets provide comprehensive testing including:
+
+ - Pod readiness checks
+ - Service availability verification
+ - PVC status validation
+ - API health checks
+ - Basic functionality testing
+
+ #### Cleanup
+
+ Using make commands (recommended):
+
+ ```bash
+ # Complete cleanup: undeploy and delete cluster
+ make cleanup
+
+ # Or step by step:
+ make undeploy
+ make delete-cluster
+ ```
+
+ Or using kubectl/kind directly:
+
+ ```bash
+ # Remove deployment
+ kubectl delete -k deploy/kubernetes/
+
+ # Delete the kind cluster
+ kind delete cluster --name semantic-router-cluster
+ ```
+
+ ## Make Commands Reference
+
+ The project provides comprehensive make targets for managing kind clusters and deployments:
+
+ ### Cluster Management
+
+ ```bash
+ make create-cluster # Create kind cluster with optimized configuration
+ make delete-cluster # Delete kind cluster
+ make cluster-info # Show cluster information and resource usage
+ ```
+
+ ### Deployment Management
+
+ ```bash
+ make deploy # Deploy semantic-router to the cluster
+ make undeploy # Remove semantic-router from the cluster
+ make load-image # Load Docker image into kind cluster
+ make status # Show deployment status
+ ```
+
+ ### Testing and Monitoring
+
+ ```bash
+ make test-deployment # Test the deployment
+ make test-api # Test the Classification API
+ make logs # Show application logs
+ ```
+
+ ### Port Forwarding
+
+ ```bash
+ make port-forward-api # Port forward Classification API (8080)
+ make port-forward-grpc # Port forward gRPC API (50051)
+ make port-forward-metrics # Port forward metrics (9190)
+ ```
+
+ ### Combined Operations
+
+ ```bash
+ make setup # Complete setup (create-cluster + deploy)
+ make cleanup # Complete cleanup (undeploy + delete-cluster)
+ ```
+
+ ### Configuration Variables
+
+ You can customize the deployment using environment variables:
+
+ ```bash
+ # Custom cluster name
+ KIND_CLUSTER_NAME=my-cluster make create-cluster
+
+ # Custom kind config file
+ KIND_CONFIG_FILE=my-config.yaml make create-cluster
+
+ # Custom namespace
+ KUBE_NAMESPACE=my-namespace make deploy
+
+ # Custom Docker image
+ DOCKER_IMAGE=my-registry/semantic-router:latest make load-image
+ ```
+
+ ### Help
+
+ ```bash
+ make help-kube # Show all available Kubernetes targets
+ ```
+
+ ## Troubleshooting
+
+ ### Common Issues
+
+ **Pod stuck in Pending state:**
+
+ ```bash
+ # Check node resources
+ kubectl describe nodes
+
+ # Check pod events
+ kubectl describe pod -n semantic-router -l app=semantic-router
+ ```
+
+ **Init container fails:**
+
+ ```bash
+ # Check init container logs
+ kubectl logs -n semantic-router -l app=semantic-router -c model-downloader
+ ```
+
+ **Out of memory errors:**
+
+ ```bash
+ # Check resource usage
+ kubectl top pods -n semantic-router
+
+ # Adjust resource limits in deployment.yaml if needed
+ ```
+
+ ### Resource Optimization
+
+ For different environments, you can adjust resource requirements:
+
+ - **Development**: 2Gi memory, 0.5 CPU
+ - **Testing**: 4Gi memory, 1 CPU
+ - **Production**: 8Gi+ memory, 2+ CPU
+
+ Edit the `resources` section in `deployment.yaml` accordingly.
+
+ ## Files Overview
+
+ ### Kubernetes Manifests (`deploy/kubernetes/`)
+
+ - `deployment.yaml` - Main application deployment with optimized resource settings
+ - `service.yaml` - Services for gRPC, HTTP API, and metrics
+ - `pvc.yaml` - Persistent volume claim for model storage
+ - `namespace.yaml` - Dedicated namespace for the application
+ - `config.yaml` - Application configuration
+ - `tools_db.json` - Tools database for semantic routing
+ - `kustomization.yaml` - Kustomize configuration for easy deployment
+
+ ### Development Tools
+
+ - `tools/kind/kind-config.yaml` - Kind cluster configuration for local development
+ - `tools/make/kube.mk` - Make targets for Kubernetes operations
+ - `Makefile` - Root makefile including all make targets
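
To show how the manifest files listed above could fit together under `kubectl apply -k`, here is a possible `kustomization.yaml`. It is a sketch only: the ConfigMap name and the use of `configMapGenerator` to bundle `config.yaml` and `tools_db.json` are assumptions, and the repository's actual file may be organized differently.

```yaml
# Hypothetical sketch; the real kustomization.yaml may differ.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: semantic-router
resources:
  - namespace.yaml
  - pvc.yaml
  - deployment.yaml
  - service.yaml
configMapGenerator:
  - name: semantic-router-config   # assumed ConfigMap name
    files:
      - config.yaml
      - tools_db.json
generatorOptions:
  # Keep a stable ConfigMap name so the Deployment can reference it directly.
  disableNameSuffixHash: true
```
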