
Conversation

yossiovadia (Collaborator)

Overview

Adds complete OpenShift deployment infrastructure with NVIDIA GPU support for running the semantic router with specialized LLM containers. Provides zero-touch deployment automation with comprehensive validation.

Architecture

```
Pod: semantic-router (4 containers)
├── semantic-router - Main ExtProc service (ports 50051, 8080)
├── model-a - Math specialist (port 8000, GPU 0)
├── model-b - Coding specialist (port 8001, GPU 1)
└── envoy-proxy - HTTP gateway (port 8801)
```
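
The same layout, expressed as a deployment manifest sketch; this is illustrative only, with placeholder images, labels, and resource values rather than the shipped deployment.yaml:

```yaml
# Illustrative 4-container pod sketch; images and values are placeholders
apiVersion: apps/v1
kind: Deployment
metadata:
  name: semantic-router
spec:
  replicas: 1
  selector:
    matchLabels: {app: semantic-router}
  template:
    metadata:
      labels: {app: semantic-router}
    spec:
      containers:
        - name: semantic-router               # ExtProc gRPC + HTTP API
          image: semantic-router:latest       # placeholder
          ports:
            - containerPort: 50051            # ExtProc gRPC
            - containerPort: 8080             # HTTP API
        - name: model-a                       # math specialist
          image: llm-katan:latest             # placeholder
          ports:
            - containerPort: 8000
          resources:
            limits: {nvidia.com/gpu: "1"}     # GPU 0 via CDI annotation
        - name: model-b                       # coding specialist
          image: llm-katan:latest             # placeholder
          ports:
            - containerPort: 8001
          resources:
            limits: {nvidia.com/gpu: "1"}     # GPU 1 via CDI annotation
        - name: envoy-proxy                   # HTTP gateway
          image: envoyproxy/envoy:v1.31.0     # placeholder tag
          ports:
            - containerPort: 8801
```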

Quick Start

```bash
# Login to OpenShift
oc login https://your-cluster:6443

# Deploy (auto-detects OpenShift server if logged in)
./deploy/openshift/deploy-to-openshift.sh

# Validate deployment
./deploy/openshift/validate-deployment.sh

# Cleanup
./deploy/openshift/cleanup-openshift.sh
```

OpenWebUI Integration

The deployment provides an OpenWebUI-compatible endpoint:

http://envoy-http-./v1

Configure this URL in OpenWebUI settings to use the semantic router as your LLM backend with automatic category-based routing and security guardrails.
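
For a quick check outside OpenWebUI, the same endpoint accepts OpenAI-style requests. A hedged example, with a placeholder route hostname and a model name that depends on your router config:

```bash
# Substitute the actual route hostname printed by the deploy script
curl -s http://<envoy-route-host>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "auto",
        "messages": [{"role": "user", "content": "What is 12 * 34?"}]
      }'
# The router classifies the prompt (math here) and forwards it to the
# matching specialist; "auto" is a placeholder model name.
```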


netlify bot commented Oct 8, 2025

Deploy Preview for vllm-semantic-router ready!

- 🔨 Latest commit: f938960
- 🔍 Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/68e6cda91573c00008746460
- 😎 Deploy Preview: https://deploy-preview-376--vllm-semantic-router.netlify.app

yossiovadia and others added 2 commits October 8, 2025 13:46
This commit adds comprehensive OpenShift deployment support with GPU-enabled
specialist model containers, providing a complete automation solution for
deploying the semantic router to OpenShift clusters.

**Core Deployment:**
- deployment.yaml: Kubernetes deployment manifest with GPU support
  * 4-container pod: semantic-router, model-a, model-b, envoy-proxy
  * CDI annotations for GPU device injection (gpu=0, gpu=1)
  * GPU node selection and tolerations
  * PVC mounts for models and cache
  * Production log levels (INFO for containers, info for Envoy)
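
The GPU-related pieces of that manifest reduce to roughly the fragment below; this is a sketch, and the exact CDI annotation keys, node labels, and PVC names depend on the cluster's NVIDIA setup, so treat them as assumptions:

```yaml
# Sketch of the GPU/storage fragment; keys and names are assumptions
spec:
  template:
    metadata:
      annotations:
        cdi.k8s.io/model-a: nvidia.com/gpu=0   # inject GPU 0 into model-a
        cdi.k8s.io/model-b: nvidia.com/gpu=1   # inject GPU 1 into model-b
    spec:
      nodeSelector:
        nvidia.com/gpu.present: "true"         # assumed GPU node label
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: semantic-router-models  # placeholder PVC name
        - name: cache
          persistentVolumeClaim:
            claimName: semantic-router-cache   # placeholder PVC name
```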

- deploy-to-openshift.sh: Main deployment automation script (826 lines)
  * Auto-detection of OpenShift server and existing login
  * Enhanced deployment method with llm-katan specialists
  * Alternative methods: kustomize, template
  * Configurable resources, storage, logging
  * Automatic namespace creation
  * Inline Dockerfile build for llm-katan image
  * Service and route creation
  * Optional port forwarding (disabled by default)
  * Displays OpenWebUI endpoint at completion

- cleanup-openshift.sh: Cleanup automation script (494 lines)
  * Auto-detection of cluster and namespace
  * Graceful cleanup with confirmation
  * Port forwarding cleanup
  * Comprehensive resource deletion

**Configuration:**
- config-openshift.yaml: Semantic router config for OpenShift
  * Math-specialist and coding-specialist endpoints
  * Category-to-specialist routing
  * PII and jailbreak detection configuration
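
The category-to-specialist mapping follows the router's config schema; a trimmed sketch, with category names and scores as illustrative values rather than the shipped config-openshift.yaml:

```yaml
# Illustrative routing sketch; category names and scores are examples
categories:
  - name: math
    model_scores:
      - model: "Model-A"   # math specialist on 127.0.0.1:8000
        score: 1.0
  - name: coding
    model_scores:
      - model: "Model-B"   # coding specialist on 127.0.0.1:8001
        score: 1.0
default_model: "Model-A"
```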

- envoy-openshift.yaml: Envoy proxy configuration
  * HTTP listener on port 8801
  * External processing filter
  * Specialist model routing
  * /v1/models aggregation
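
Structurally, that Envoy config is an HTTP listener whose connection manager runs the ext_proc filter (pointed at the semantic-router gRPC service) ahead of the router filter. A trimmed sketch, with cluster names and the default route as illustrative choices:

```yaml
# Trimmed Envoy sketch; cluster names and defaults are illustrative
static_resources:
  listeners:
    - name: http_8801
      address:
        socket_address: {address: 0.0.0.0, port_value: 8801}
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                http_filters:
                  # semantic-router inspects and mutates requests via ExtProc
                  - name: envoy.filters.http.ext_proc
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
                      grpc_service:
                        envoy_grpc: {cluster_name: ext_proc}
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
                route_config:
                  virtual_hosts:
                    - name: all
                      domains: ["*"]
                      routes:
                        - match: {prefix: "/"}
                          route: {cluster: model_a}  # default; ExtProc re-routes per category
  clusters:
    - name: ext_proc   # semantic-router ExtProc, gRPC over HTTP/2
      type: STATIC
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config: {http2_protocol_options: {}}
      load_assignment:
        cluster_name: ext_proc
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: {address: 127.0.0.1, port_value: 50051}
    - name: model_a    # one static cluster per specialist (model_b analogous)
      type: STATIC
      load_assignment:
        cluster_name: model_a
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: {address: 127.0.0.1, port_value: 8000}
```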

**Container Image:**
- Dockerfile.llm-katan: GPU-enabled specialist container image
  * Python 3.10-slim base
  * PyTorch with CUDA 12.1 support
  * llm-katan, transformers, accelerate packages
  * HuggingFace caching configuration
  * Health check endpoint
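
Put together, the image builds along these lines; a sketch under stated assumptions, since the pinned versions and the exact llm-katan launch flags are not shown here:

```dockerfile
# Sketch of Dockerfile.llm-katan; versions and launch flags are assumptions
FROM python:3.10-slim

# PyTorch with CUDA 12.1 wheels, then the serving stack
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cu121 \
    && pip install --no-cache-dir llm-katan transformers accelerate

# Point HuggingFace downloads at a writable cache (PVC-mounted in OpenShift)
ENV HF_HOME=/cache/huggingface

EXPOSE 8000
# Placeholder flags; the real entrypoint and healthcheck live in the actual file
CMD ["llm-katan", "--port", "8000"]
```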

**Alternative Deployment Methods:**
- kustomization.yaml: Kustomize deployment option
- template.yaml: OpenShift template with parameters

**Documentation & Validation:**
- README.md: Comprehensive deployment documentation
- validate-deployment.sh: 12-test validation script
  * Namespace, deployment, container readiness
  * GPU detection in both specialist containers
  * Model loading verification
  * PVC, service, route checks
  * GPU node scheduling confirmation

- Makefile: Add include for tools/make/openshift.mk
- tools/make/openshift.mk: Optional make targets for OpenShift operations
  * openshift-deploy, openshift-cleanup, openshift-status
  * openshift-logs, openshift-routes, openshift-test
  * Port forwarding helpers
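
With the include in place, day-to-day operations can go through make; the target names are the ones listed above, and the comments are assumptions about what each wraps:

```bash
make openshift-deploy    # presumably wraps deploy-to-openshift.sh
make openshift-status    # pods, services, routes at a glance
make openshift-logs      # tail router/container logs
make openshift-cleanup   # presumably wraps cleanup-openshift.sh
```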

**Key Features:**

1. **GPU Support**: Full NVIDIA GPU support via CDI device injection
2. **Specialist Models**: Real llm-katan containers for math/coding tasks
3. **Zero-Touch Deployment**: Auto-detection of cluster, automatic builds
4. **Production Ready**: Production log levels, proper health checks
5. **Validation**: Comprehensive 12-test validation suite
6. **UX Enhancements**: OpenWebUI endpoint display, optional port forwarding
7. **Clean Separation**: Only touches deploy/openshift/ (plus minimal Makefile)

```
Pod: semantic-router
├── semantic-router (main ExtProc service, port 50051)
├── model-a (llm-katan math specialist, port 8000, GPU 0)
├── model-b (llm-katan coding specialist, port 8001, GPU 1)
└── envoy-proxy (gateway, port 8801)
```

Validated on OpenShift with NVIDIA L4 GPUs:
- All 4 containers running
- GPUs detected in both specialist containers
- Models loaded on CUDA
- PVCs bound
- Services and routes accessible
- Streaming functionality working

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
Routes are created without TLS termination by default, so URLs should
use http:// not https://. This fixes the quick test commands shown at
deployment completion.

Tested and verified:
- curl http://semantic-router-api.../health works
- curl -X POST http://semantic-router-api.../api/v1/classify/intent works

Signed-off-by: Yossi Ovadia <[email protected]>

github-actions bot commented Oct 8, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 deploy

Owners: @rootfs, @Xunzhuo
Files changed:

  • deploy/openshift/Dockerfile.llm-katan
  • deploy/openshift/README.md
  • deploy/openshift/cleanup-openshift.sh
  • deploy/openshift/config-openshift.yaml
  • deploy/openshift/deploy-to-openshift.sh
  • deploy/openshift/deployment.yaml
  • deploy/openshift/envoy-openshift.yaml
  • deploy/openshift/kustomization.yaml
  • deploy/openshift/template.yaml
  • deploy/openshift/validate-deployment.sh

📁 tools

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

  • tools/make/openshift.mk

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • Makefile


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

rootfs requested a review from Copilot, October 8, 2025 20:50

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds comprehensive OpenShift deployment infrastructure for the semantic router with NVIDIA GPU support and specialized LLM containers. It provides zero-touch deployment automation with validation scripts and supports a 4-container pod architecture: the semantic-router service, two GPU-enabled model specialists (model-a and model-b), and an envoy-proxy HTTP gateway.

  • Complete OpenShift deployment manifests with GPU scheduling and security contexts
  • Automated deployment and validation scripts with error handling and port forwarding
  • OpenShift-specific configurations for Envoy proxy and model routing

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 9 comments.

Summary per file:

| File | Description |
| --- | --- |
| tools/make/openshift.mk | Makefile targets for OpenShift operations including login, deploy, status, and debugging |
| deploy/openshift/validate-deployment.sh | Comprehensive validation script testing all 4 containers, GPU detection, and model loading |
| deploy/openshift/template.yaml | OpenShift template for parameterized deployment with security contexts |
| deploy/openshift/kustomization.yaml | Kustomize configuration for OpenShift deployment with labels and annotations |
| deploy/openshift/envoy-openshift.yaml | OpenShift-specific Envoy configuration using static clusters for pod networking |
| deploy/openshift/deployment.yaml | Main deployment manifest with 4-container pod, GPU scheduling, and init container |
| deploy/openshift/deploy-to-openshift.sh | Automated deployment script with login detection, build management, and port forwarding |
| deploy/openshift/config-openshift.yaml | OpenShift-specific router configuration with localhost endpoints and model policies |
| deploy/openshift/cleanup-openshift.sh | Comprehensive cleanup script with multiple cleanup levels and safety confirmations |
| deploy/openshift/README.md | Documentation for OpenShift deployment with troubleshooting and monitoring guides |
| deploy/openshift/Dockerfile.llm-katan | Dockerfile for building llm-katan specialist containers with CUDA support |
| Makefile | Include openshift.mk in the main Makefile |
Comments suppressed due to low confidence (2, both identical)

deploy/openshift/deployment.yaml:1

  • Hardcoded storage class 'gp3-csi' may not be available on all OpenShift clusters. Consider making this configurable or using the cluster's default storage class.
apiVersion: apps/v1



echo "$(RED)[ERROR]$(NC) OPENSHIFT_PASSWORD is required"; \
exit 1; \
fi
@oc login -u $(OPENSHIFT_USER) -p $(OPENSHIFT_PASSWORD) $(OPENSHIFT_SERVER) --insecure-skip-tls-verify
Copilot AI commented Oct 8, 2025:

Using --insecure-skip-tls-verify bypasses SSL certificate validation, which poses a security risk. Consider making this configurable or documenting the security implications.


fi

log "INFO" "Logging into OpenShift at $OPENSHIFT_SERVER as $OPENSHIFT_USER"
if ! oc login -u "$OPENSHIFT_USER" -p "$OPENSHIFT_PASSWORD" "$OPENSHIFT_SERVER" --insecure-skip-tls-verify; then
Copilot AI commented Oct 8, 2025:

Using --insecure-skip-tls-verify bypasses SSL certificate validation. This should be configurable or at least documented as a security consideration for production deployments.


return 0
fi

if ! oc login -u "$OPENSHIFT_USER" -p "$OPENSHIFT_PASSWORD" "$OPENSHIFT_SERVER" --insecure-skip-tls-verify; then
Copilot AI commented Oct 8, 2025:

Using --insecure-skip-tls-verify bypasses SSL certificate validation. Consider making this configurable for production environments.


Comment on lines +378 to +385
# Wait for python imagestream to be ready
log "INFO" "Waiting for python imagestream to be ready..."
sleep 5
while ! oc get istag python:3.10-slim -n "$NAMESPACE" &> /dev/null; do
sleep 2
done
log "SUCCESS" "Python imagestream ready"

Copilot AI commented Oct 8, 2025:

The script waits for 'python:3.10-slim' imagestream tag but the Dockerfile uses 'python:3.10-slim' as a base image. OpenShift may not automatically create an imagestream for external images, potentially causing an infinite loop.

Suggested change (removing the unbounded wait):

```diff
-# Wait for python imagestream to be ready
-log "INFO" "Waiting for python imagestream to be ready..."
-sleep 5
-while ! oc get istag python:3.10-slim -n "$NAMESPACE" &> /dev/null; do
-    sleep 2
-done
-log "SUCCESS" "Python imagestream ready"
```


# Test 4: Check GPU detection in model-a
log "INFO" "Test 4: Checking GPU detection in model-a container..."
if [[ -n "$POD_NAME" ]]; then
GPU_CHECK=$(oc exec -n "$NAMESPACE" "$POD_NAME" -c model-a -- python3 -c "import torch; print('CUDA' if torch.cuda.is_available() else 'CPU')" 2>/dev/null || echo "ERROR")
Copilot AI commented Oct 8, 2025:

The GPU detection logic is duplicated for model-a and model-b containers. Consider extracting this into a reusable function to reduce code duplication.


jailbreak_mapping_path: "models/jailbreak_classifier_modernbert-base_model/jailbreak_type_mapping.json"

# vLLM Endpoints Configuration
# IMPORTANT: Using localhost since containers are in same pod
Copilot AI commented Oct 8, 2025:

[nitpick] While the comments explain the localhost usage, consider adding more context about why this differs from the standard Kubernetes configuration and the pod networking implications.

Suggested change:

```diff
-# IMPORTANT: Using localhost since containers are in same pod
+# IMPORTANT: Using localhost (127.0.0.1) here because all vLLM containers and the orchestrator are running within the same Kubernetes pod.
+# In standard Kubernetes configurations, containers in different pods cannot communicate via localhost; they must use service names or pod IPs over the pod network.
+# If you move any of these containers to separate pods, you must update the address fields below to use the appropriate service name or pod IP instead of 127.0.0.1.
```

Comment on lines +30 to +39
# IMPORTANT: Using localhost since containers are in same pod
vllm_endpoints:
  - name: "model-a-endpoint"
    address: "127.0.0.1" # localhost in same pod
    port: 8000
    models:
      - "Model-A"
    weight: 1
  - name: "model-b-endpoint"
    address: "127.0.0.1" # localhost in same pod
Copilot AI commented Oct 8, 2025:

[nitpick] While the comments explain the localhost usage, consider adding more context about why this differs from the standard Kubernetes configuration and the pod networking implications.

Suggested change:

```diff
-# IMPORTANT: Using localhost since containers are in same pod
+# IMPORTANT: Using localhost (127.0.0.1) here because all containers are running in the same pod and thus share the same network namespace.
+# This differs from standard Kubernetes practice, where services typically communicate over the pod network using service names or pod IPs.
+# If these containers are ever split into separate pods, using localhost will break communication; in that case, update the address to use the appropriate service name or pod IP.
 vllm_endpoints:
   - name: "model-a-endpoint"
-    address: "127.0.0.1" # localhost in same pod
+    address: "127.0.0.1" # localhost; only works because containers share the pod's network namespace
     port: 8000
     models:
       - "Model-A"
     weight: 1
   - name: "model-b-endpoint"
-    address: "127.0.0.1" # localhost in same pod
+    address: "127.0.0.1" # localhost; only works because containers share the pod's network namespace
```

cpu: "2"
# Real LLM specialist containers using llm-katan
- name: model-a
image: image-registry.openshift-image-registry.svc:5000/vllm-semantic-router-system/llm-katan:latest
Copilot AI commented Oct 8, 2025:

The image reference is hardcoded with the namespace. This creates a dependency between the image registry path and the deployment namespace, making it less portable.

Suggested change:

```diff
-image: image-registry.openshift-image-registry.svc:5000/vllm-semantic-router-system/llm-katan:latest
+image: image-registry.openshift-image-registry.svc:5000/$(NAMESPACE)/llm-katan:latest
```

cpu: "1"
nvidia.com/gpu: "1"
- name: model-b
image: image-registry.openshift-image-registry.svc:5000/vllm-semantic-router-system/llm-katan:latest
Copilot AI commented Oct 8, 2025:

The image reference is hardcoded with the namespace. This creates a dependency between the image registry path and the deployment namespace, making it less portable.

Suggested change:

```diff
-image: image-registry.openshift-image-registry.svc:5000/vllm-semantic-router-system/llm-katan:latest
+image: image-registry.openshift-image-registry.svc:5000/${NAMESPACE}/llm-katan:latest
```

yossiovadia (Collaborator, Author) commented Oct 8, 2025 via email

rootfs merged commit ab92fac into vllm-project:main on Oct 8, 2025
9 checks passed
JaredforReal (Contributor) commented:

@yossiovadia I can help you set up the o11y stack (Grafana + Prometheus) in OpenShift once you have it working.

yossiovadia (Collaborator, Author) commented:

@JaredforReal - really appreciate it! I have already started (actually, finished) working on it. It will likely require some enhancement and testing.
