
Commit ffe9f69

yossiovadia and claude committed
feat: add OpenShift deployment infrastructure with GPU support
This commit adds comprehensive OpenShift deployment support with GPU-enabled specialist model containers, providing a complete automation solution for deploying the semantic router to OpenShift clusters.

## New Files (deploy/openshift/):

**Core Deployment:**

- deployment.yaml: Kubernetes deployment manifest with GPU support
  * 4-container pod: semantic-router, model-a, model-b, envoy-proxy
  * CDI annotations for GPU device injection (gpu=0, gpu=1)
  * GPU node selection and tolerations
  * PVC mounts for models and cache
  * Production log levels (INFO for containers, info for Envoy)
- deploy-to-openshift.sh: Main deployment automation script (826 lines)
  * Auto-detection of OpenShift server and existing login
  * Enhanced deployment method with llm-katan specialists
  * Alternative methods: kustomize, template
  * Configurable resources, storage, logging
  * Automatic namespace creation
  * Inline Dockerfile build for llm-katan image
  * Service and route creation
  * Optional port forwarding (disabled by default)
  * Displays OpenWebUI endpoint at completion
- cleanup-openshift.sh: Cleanup automation script (494 lines)
  * Auto-detection of cluster and namespace
  * Graceful cleanup with confirmation
  * Port forwarding cleanup
  * Comprehensive resource deletion

**Configuration:**

- config-openshift.yaml: Semantic router config for OpenShift
  * Math-specialist and coding-specialist endpoints
  * Category-to-specialist routing
  * PII and jailbreak detection configuration
- envoy-openshift.yaml: Envoy proxy configuration
  * HTTP listener on port 8801
  * External processing filter
  * Specialist model routing
  * /v1/models aggregation

**Container Image:**

- Dockerfile.llm-katan: GPU-enabled specialist container image
  * Python 3.10-slim base
  * PyTorch with CUDA 12.1 support
  * llm-katan, transformers, accelerate packages
  * HuggingFace caching configuration
  * Health check endpoint

**Alternative Deployment Methods:**

- kustomization.yaml: Kustomize deployment option
- template.yaml: OpenShift template with parameters

**Documentation & Validation:**

- README.md: Comprehensive deployment documentation
- validate-deployment.sh: 12-test validation script
  * Namespace, deployment, container readiness
  * GPU detection in both specialist containers
  * Model loading verification
  * PVC, service, route checks
  * GPU node scheduling confirmation

## Integration Files:

- Makefile: Add include for tools/make/openshift.mk
- tools/make/openshift.mk: Optional make targets for OpenShift operations
  * openshift-deploy, openshift-cleanup, openshift-status
  * openshift-logs, openshift-routes, openshift-test
  * Port forwarding helpers

## Key Features:

1. **GPU Support**: Full NVIDIA GPU support via CDI device injection
2. **Specialist Models**: Real llm-katan containers for math/coding tasks
3. **Zero-Touch Deployment**: Auto-detection of cluster, automatic builds
4. **Production Ready**: Production log levels, proper health checks
5. **Validation**: Comprehensive 12-test validation suite
6. **UX Enhancements**: OpenWebUI endpoint display, optional port forwarding
7. **Clean Separation**: Only touches deploy/openshift/ (plus minimal Makefile)

## Architecture:

```
Pod: semantic-router
├── semantic-router (main ExtProc service, port 50051)
├── model-a (llm-katan math specialist, port 8000, GPU 0)
├── model-b (llm-katan coding specialist, port 8001, GPU 1)
└── envoy-proxy (gateway, port 8801)
```

## Testing:

Validated on OpenShift with NVIDIA L4 GPUs:

- All 4 containers running
- GPUs detected in both specialist containers
- Models loaded on CUDA
- PVCs bound
- Services and routes accessible
- Streaming functionality working

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
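For context on the CDI-based GPU injection described above, here is a minimal, purely illustrative sketch of how per-container device annotations and GPU scheduling constraints might be expressed. The annotation keys, node label, and toleration are assumptions, not excerpts from the actual deploy/openshift/deployment.yaml:

```yaml
# Hypothetical excerpt -- names, annotation keys, and labels are assumptions,
# not copied from deploy/openshift/deployment.yaml. Images, ports, and volume
# mounts are omitted for brevity.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: semantic-router
spec:
  template:
    metadata:
      annotations:
        # CDI device injection: one NVIDIA GPU per specialist container
        cdi.k8s.io/model-a: "nvidia.com/gpu=0"
        cdi.k8s.io/model-b: "nvidia.com/gpu=1"
    spec:
      nodeSelector:
        nvidia.com/gpu.present: "true"    # assumed GPU node label
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      containers:
        - name: semantic-router   # ExtProc service, port 50051
        - name: model-a           # llm-katan math specialist, port 8000, GPU 0
        - name: model-b           # llm-katan coding specialist, port 8001, GPU 1
        - name: envoy-proxy       # gateway, port 8801
```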
1 parent 959e649 commit ffe9f69

12 files changed: +3115 −0 lines changed

Makefile

Lines changed: 1 addition & 0 deletions

```diff
@@ -16,6 +16,7 @@ _run:
 		-f tools/make/pre-commit.mk \
 		-f tools/make/docker.mk \
 		-f tools/make/kube.mk \
+		-f tools/make/openshift.mk \
 		$(MAKECMDGOALS)
 
 .PHONY: _run
```
deploy/openshift/Dockerfile.llm-katan

Lines changed: 45 additions & 0 deletions

```dockerfile
# Optimized Dockerfile for llm-katan - OpenShift compatible
FROM python:3.10-slim

# Install minimal system dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    git \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /app

# Install PyTorch with CUDA 12.1 support (compatible with CUDA 12.x drivers)
RUN pip install --no-cache-dir \
    torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Install llm-katan and its dependencies
RUN pip install --no-cache-dir \
    llm-katan \
    transformers \
    accelerate \
    fastapi \
    uvicorn \
    click \
    pydantic \
    numpy

# Set environment variables for caching
ENV HF_HUB_CACHE=/tmp/hf_cache
ENV TRANSFORMERS_CACHE=/tmp/transformers_cache
ENV HF_HOME=/tmp/hf_cache

# Create cache directories
RUN mkdir -p /tmp/hf_cache /tmp/transformers_cache

# Expose ports
EXPOSE 8000 8001

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Default command - this will be overridden by deployment args
CMD ["llm-katan", "--help"]
```
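The default `CMD` above is only a placeholder; the deployment supplies the real command per container. A rough sketch of such an override, assuming hypothetical llm-katan flags (the actual flags and model names come from deployment.yaml, not from here):

```yaml
# Illustrative container spec overriding the image's default CMD.
# Flag names and the model identifier are assumptions.
containers:
  - name: model-a
    image: llm-katan:latest              # assumed local image tag
    command: ["llm-katan"]               # replaces the Dockerfile CMD
    args: ["--model", "<hf-model-or-path>", "--port", "8000"]
    ports:
      - containerPort: 8000
```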

deploy/openshift/README.md

Lines changed: 180 additions & 0 deletions

# OpenShift Deployment for Semantic Router

This directory contains OpenShift-specific deployment manifests for the vLLM Semantic Router.

## Quick Deployment

### Prerequisites

- OpenShift cluster access
- `oc` CLI tool configured and logged in
- Cluster admin privileges (or permissions to create namespaces and routes)

### One-Command Deployment

```bash
oc apply -k deploy/openshift/
```
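The `-k` flag applies the `kustomization.yaml` in this directory. As a sketch of what that file could contain, assuming it simply aggregates the manifests listed in the Files Overview below:

```yaml
# Hypothetical kustomization.yaml; the resource list is an assumption
# based on the files described in this README.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: vllm-semantic-router-system
resources:
  - namespace.yaml
  - pvc.yaml
  - deployment.yaml
  - service.yaml
  - routes.yaml
```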
### Step-by-Step Deployment

1. **Create namespace:**

   ```bash
   oc apply -f deploy/openshift/namespace.yaml
   ```

2. **Deploy core resources:**

   ```bash
   oc apply -f deploy/openshift/pvc.yaml
   oc apply -f deploy/openshift/deployment.yaml
   oc apply -f deploy/openshift/service.yaml
   ```

3. **Create external routes:**

   ```bash
   oc apply -f deploy/openshift/routes.yaml
   ```

## Accessing Services

After deployment, the services will be accessible via OpenShift Routes:

### Get Route URLs

```bash
# Classification API (HTTP REST)
oc get route semantic-router-api -n vllm-semantic-router-system -o jsonpath='{.spec.host}'

# gRPC API
oc get route semantic-router-grpc -n vllm-semantic-router-system -o jsonpath='{.spec.host}'

# Metrics
oc get route semantic-router-metrics -n vllm-semantic-router-system -o jsonpath='{.spec.host}'
```

### Example Usage

```bash
# Get the API route
API_ROUTE=$(oc get route semantic-router-api -n vllm-semantic-router-system -o jsonpath='{.spec.host}')

# Test health endpoint
curl https://$API_ROUTE/health

# Test classification
curl -X POST https://$API_ROUTE/api/v1/classify/intent \
  -H "Content-Type: application/json" \
  -d '{"text": "What is machine learning?"}'
```

## Architecture Differences from Kubernetes

### Security Context

- Removed `runAsNonRoot: false` for OpenShift compatibility
- Enhanced security context with `capabilities.drop: ALL` and `seccompProfile`
- OpenShift automatically enforces non-root containers
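A minimal sketch of a container security context along those lines (values are illustrative, not copied from `deployment.yaml`):

```yaml
# Illustrative only; the real settings live in deployment.yaml.
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  seccompProfile:
    type: RuntimeDefault
  # runAsUser is intentionally omitted: OpenShift's restricted SCC assigns
  # a non-root UID from the namespace's allowed range.
```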
### Networking

- Uses OpenShift Routes instead of port-forwarding for external access
- TLS termination handled by the OpenShift router
- Automatic HTTPS certificates via OpenShift
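As a sketch, one of these routes might look like the following (the target service and port names are assumptions based on the route names above):

```yaml
# Hypothetical Route; the actual definitions are in routes.yaml.
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: semantic-router-api
  namespace: vllm-semantic-router-system
spec:
  to:
    kind: Service
    name: semantic-router          # assumed service name
  port:
    targetPort: http-api           # assumed named port on the service
  tls:
    termination: edge              # TLS terminated at the OpenShift router
    insecureEdgeTerminationPolicy: Redirect
```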
### Storage

- Uses OpenShift's default storage class
- PVC automatically bound to available storage
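A minimal sketch of such a claim, using the 10Gi size listed under Resource Requirements (the access mode is an assumption; omitting `storageClassName` selects the cluster default):

```yaml
# Illustrative PVC; the actual definition is in pvc.yaml.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: semantic-router-models
  namespace: vllm-semantic-router-system
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  # storageClassName omitted -> the default storage class is used
```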
## Monitoring

### Check Deployment Status

```bash
# Check pods
oc get pods -n vllm-semantic-router-system

# Check services
oc get services -n vllm-semantic-router-system

# Check routes
oc get routes -n vllm-semantic-router-system

# Check logs
oc logs -f deployment/semantic-router -n vllm-semantic-router-system
```

### Metrics

Access Prometheus metrics via the metrics route:

```bash
METRICS_ROUTE=$(oc get route semantic-router-metrics -n vllm-semantic-router-system -o jsonpath='{.spec.host}')
curl https://$METRICS_ROUTE/metrics
```

## Cleanup

Remove all resources:

```bash
oc delete -k deploy/openshift/
```

Or remove individual components:

```bash
oc delete -f deploy/openshift/routes.yaml
oc delete -f deploy/openshift/service.yaml
oc delete -f deploy/openshift/deployment.yaml
oc delete -f deploy/openshift/pvc.yaml
oc delete -f deploy/openshift/namespace.yaml
```

## Troubleshooting

### Common Issues

**1. Pod fails to start due to security context:**

```bash
oc describe pod -l app=semantic-router -n vllm-semantic-router-system
```

**2. Storage issues:**

```bash
oc get pvc -n vllm-semantic-router-system
oc describe pvc semantic-router-models -n vllm-semantic-router-system
```

**3. Route not accessible:**

```bash
oc get routes -n vllm-semantic-router-system
oc describe route semantic-router-api -n vllm-semantic-router-system
```

### Resource Requirements

The deployment requires:

- **Memory**: 3Gi request, 6Gi limit
- **CPU**: 1 core request, 2 cores limit
- **Storage**: 10Gi for model storage

Adjust resource limits in `deployment.yaml` if needed for your cluster capacity.
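Those figures correspond to a container `resources` stanza roughly like this (a sketch matching the numbers above, not the exact block from `deployment.yaml`):

```yaml
resources:
  requests:
    cpu: "1"        # 1 core request
    memory: 3Gi
  limits:
    cpu: "2"        # 2 cores limit
    memory: 6Gi
```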
## Files Overview

- `namespace.yaml` - Namespace with OpenShift-specific annotations
- `pvc.yaml` - Persistent volume claim for model storage
- `deployment.yaml` - Main application deployment with OpenShift security contexts
- `service.yaml` - Services for gRPC, HTTP API, and metrics
- `routes.yaml` - OpenShift routes for external access
- `config.yaml` - Application configuration
- `tools_db.json` - Tools database for semantic routing
- `kustomization.yaml` - Kustomize configuration for easy deployment
