From 79ccb8a05cd4f920f0133b4d1c36b8cbacb49b83 Mon Sep 17 00:00:00 2001 From: Yossi Ovadia Date: Wed, 8 Oct 2025 13:20:01 -0700 Subject: [PATCH 1/2] feat: add OpenShift deployment infrastructure with GPU support MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This commit adds comprehensive OpenShift deployment support with GPU-enabled specialist model containers, providing a complete automation solution for deploying the semantic router to OpenShift clusters. **Core Deployment:** - deployment.yaml: Kubernetes deployment manifest with GPU support * 4-container pod: semantic-router, model-a, model-b, envoy-proxy * CDI annotations for GPU device injection (gpu=0, gpu=1) * GPU node selection and tolerations * PVC mounts for models and cache * Production log levels (INFO for containers, info for Envoy) - deploy-to-openshift.sh: Main deployment automation script (826 lines) * Auto-detection of OpenShift server and existing login * Enhanced deployment method with llm-katan specialists * Alternative methods: kustomize, template * Configurable resources, storage, logging * Automatic namespace creation * Inline Dockerfile build for llm-katan image * Service and route creation * Optional port forwarding (disabled by default) * Displays OpenWebUI endpoint at completion - cleanup-openshift.sh: Cleanup automation script (494 lines) * Auto-detection of cluster and namespace * Graceful cleanup with confirmation * Port forwarding cleanup * Comprehensive resource deletion **Configuration:** - config-openshift.yaml: Semantic router config for OpenShift * Math-specialist and coding-specialist endpoints * Category-to-specialist routing * PII and jailbreak detection configuration - envoy-openshift.yaml: Envoy proxy configuration * HTTP listener on port 8801 * External processing filter * Specialist model routing * /v1/models aggregation **Container Image:** - Dockerfile.llm-katan: GPU-enabled specialist container image * Python 3.10-slim base * PyTorch with CUDA 12.1 support * llm-katan, transformers, accelerate packages * HuggingFace caching configuration * Health check endpoint **Alternative Deployment Methods:** - kustomization.yaml: Kustomize deployment option - template.yaml: OpenShift template with parameters **Documentation & Validation:** - README.md: Comprehensive deployment documentation - validate-deployment.sh: 12-test validation script * Namespace, deployment, container readiness * GPU detection in both specialist containers * Model loading verification * PVC, service, route checks * GPU node scheduling confirmation - Makefile: Add include for tools/make/openshift.mk - tools/make/openshift.mk: Optional make targets for OpenShift operations * openshift-deploy, openshift-cleanup, openshift-status * openshift-logs, openshift-routes, openshift-test * Port forwarding helpers 1. **GPU Support**: Full NVIDIA GPU support via CDI device injection 2. **Specialist Models**: Real llm-katan containers for math/coding tasks 3. **Zero-Touch Deployment**: Auto-detection of cluster, automatic builds 4. **Production Ready**: Production log levels, proper health checks 5. **Validation**: Comprehensive 12-test validation suite 6. **UX Enhancements**: OpenWebUI endpoint display, optional port forwarding 7. 
**Clean Separation**: Only touches deploy/openshift/ (plus minimal Makefile) ``` Pod: semantic-router ├── semantic-router (main ExtProc service, port 50051) ├── model-a (llm-katan math specialist, port 8000, GPU 0) ├── model-b (llm-katan coding specialist, port 8001, GPU 1) └── envoy-proxy (gateway, port 8801) ``` Validated on OpenShift with NVIDIA L4 GPUs: - All 4 containers running - GPUs detected in both specialist containers - Models loaded on CUDA - PVCs bound - Services and routes accessible - Streaming functionality working 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Signed-off-by: Yossi Ovadia --- Makefile | 1 + deploy/openshift/Dockerfile.llm-katan | 45 ++ deploy/openshift/README.md | 180 ++++++ deploy/openshift/cleanup-openshift.sh | 494 ++++++++++++++ deploy/openshift/config-openshift.yaml | 212 ++++++ deploy/openshift/deploy-to-openshift.sh | 826 ++++++++++++++++++++++++ deploy/openshift/deployment.yaml | 351 ++++++++++ deploy/openshift/envoy-openshift.yaml | 196 ++++++ deploy/openshift/kustomization.yaml | 36 ++ deploy/openshift/template.yaml | 361 +++++++++++ deploy/openshift/validate-deployment.sh | 183 ++++++ tools/make/openshift.mk | 230 +++++++ 12 files changed, 3115 insertions(+) create mode 100644 deploy/openshift/Dockerfile.llm-katan create mode 100644 deploy/openshift/README.md create mode 100755 deploy/openshift/cleanup-openshift.sh create mode 100644 deploy/openshift/config-openshift.yaml create mode 100755 deploy/openshift/deploy-to-openshift.sh create mode 100644 deploy/openshift/deployment.yaml create mode 100644 deploy/openshift/envoy-openshift.yaml create mode 100644 deploy/openshift/kustomization.yaml create mode 100644 deploy/openshift/template.yaml create mode 100755 deploy/openshift/validate-deployment.sh create mode 100644 tools/make/openshift.mk diff --git a/Makefile b/Makefile index 2b1efa57..7002da4b 100644 --- a/Makefile +++ b/Makefile @@ -17,6 +17,7 @@ _run: -f tools/make/docker.mk \ -f tools/make/kube.mk \ -f tools/make/observability.mk \ + -f tools/make/openshift.mk \ $(MAKECMDGOALS) .PHONY: _run diff --git a/deploy/openshift/Dockerfile.llm-katan b/deploy/openshift/Dockerfile.llm-katan new file mode 100644 index 00000000..0ca53782 --- /dev/null +++ b/deploy/openshift/Dockerfile.llm-katan @@ -0,0 +1,45 @@ +# Optimized Dockerfile for llm-katan - OpenShift compatible +FROM python:3.10-slim + +# Install minimal system dependencies +RUN apt-get update && \ + apt-get install -y --no-install-recommends \ + git \ + curl \ + && rm -rf /var/lib/apt/lists/* + +# Set working directory +WORKDIR /app + +# Install PyTorch with CUDA 12.1 support (compatible with CUDA 12.x drivers) +RUN pip install --no-cache-dir \ + torch torchvision --index-url https://download.pytorch.org/whl/cu121 + +# Install llm-katan and its dependencies +RUN pip install --no-cache-dir \ + llm-katan \ + transformers \ + accelerate \ + fastapi \ + uvicorn \ + click \ + pydantic \ + numpy + +# Set environment variables for caching +ENV HF_HUB_CACHE=/tmp/hf_cache +ENV TRANSFORMERS_CACHE=/tmp/transformers_cache +ENV HF_HOME=/tmp/hf_cache + +# Create cache directories +RUN mkdir -p /tmp/hf_cache /tmp/transformers_cache + +# Expose ports +EXPOSE 8000 8001 + +# Health check +HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \ + CMD curl -f http://localhost:8000/health || exit 1 + +# Default command - this will be overridden by deployment args +CMD ["llm-katan", "--help"] \ No newline at end of file diff --git 
a/deploy/openshift/README.md b/deploy/openshift/README.md new file mode 100644 index 00000000..352369ff --- /dev/null +++ b/deploy/openshift/README.md @@ -0,0 +1,180 @@ +# OpenShift Deployment for Semantic Router + +This directory contains OpenShift-specific deployment manifests for the vLLM Semantic Router. + +## Quick Deployment + +### Prerequisites + +- OpenShift cluster access +- `oc` CLI tool configured and logged in +- Cluster admin privileges (or permissions to create namespaces and routes) + +### One-Command Deployment + +```bash +oc apply -k deploy/openshift/ +``` + +### Step-by-Step Deployment + +1. **Create namespace:** + + ```bash + oc apply -f deploy/openshift/namespace.yaml + ``` + +2. **Deploy core resources:** + + ```bash + oc apply -f deploy/openshift/pvc.yaml + oc apply -f deploy/openshift/deployment.yaml + oc apply -f deploy/openshift/service.yaml + ``` + +3. **Create external routes:** + + ```bash + oc apply -f deploy/openshift/routes.yaml + ``` + +## Accessing Services + +After deployment, the services will be accessible via OpenShift Routes: + +### Get Route URLs + +```bash +# Classification API (HTTP REST) +oc get route semantic-router-api -n vllm-semantic-router-system -o jsonpath='{.spec.host}' + +# gRPC API +oc get route semantic-router-grpc -n vllm-semantic-router-system -o jsonpath='{.spec.host}' + +# Metrics +oc get route semantic-router-metrics -n vllm-semantic-router-system -o jsonpath='{.spec.host}' +``` + +### Example Usage + +```bash +# Get the API route +API_ROUTE=$(oc get route semantic-router-api -n vllm-semantic-router-system -o jsonpath='{.spec.host}') + +# Test health endpoint +curl https://$API_ROUTE/health + +# Test classification +curl -X POST https://$API_ROUTE/api/v1/classify/intent \ + -H "Content-Type: application/json" \ + -d '{"text": "What is machine learning?"}' +``` + +## Architecture Differences from Kubernetes + +### Security Context + +- Removed `runAsNonRoot: false` for OpenShift compatibility +- Enhanced security context with `capabilities.drop: ALL` and `seccompProfile` +- OpenShift automatically enforces non-root containers + +### Networking + +- Uses OpenShift Routes instead of port-forwarding for external access +- TLS termination handled by OpenShift router +- Automatic HTTPS certificates via OpenShift + +### Storage + +- Uses OpenShift's default storage class +- PVC automatically bound to available storage + +## Monitoring + +### Check Deployment Status + +```bash +# Check pods +oc get pods -n vllm-semantic-router-system + +# Check services +oc get services -n vllm-semantic-router-system + +# Check routes +oc get routes -n vllm-semantic-router-system + +# Check logs +oc logs -f deployment/semantic-router -n vllm-semantic-router-system +``` + +### Metrics + +Access Prometheus metrics via the metrics route: + +```bash +METRICS_ROUTE=$(oc get route semantic-router-metrics -n vllm-semantic-router-system -o jsonpath='{.spec.host}') +curl https://$METRICS_ROUTE/metrics +``` + +## Cleanup + +Remove all resources: + +```bash +oc delete -k deploy/openshift/ +``` + +Or remove individual components: + +```bash +oc delete -f deploy/openshift/routes.yaml +oc delete -f deploy/openshift/service.yaml +oc delete -f deploy/openshift/deployment.yaml +oc delete -f deploy/openshift/pvc.yaml +oc delete -f deploy/openshift/namespace.yaml +``` + +## Troubleshooting + +### Common Issues + +**1. Pod fails to start due to security context:** + +```bash +oc describe pod -l app=semantic-router -n vllm-semantic-router-system +``` + +**2. 
Storage issues:**
+
+```bash
+oc get pvc -n vllm-semantic-router-system
+oc describe pvc semantic-router-models -n vllm-semantic-router-system
+```
+
+**3. Route not accessible:**
+
+```bash
+oc get routes -n vllm-semantic-router-system
+oc describe route semantic-router-api -n vllm-semantic-router-system
+```
+
+### Resource Requirements
+
+The deployment requires:
+
+- **Memory**: 3Gi request, 6Gi limit
+- **CPU**: 1 core request, 2 cores limit
+- **Storage**: 10Gi for model storage
+
+Adjust resource limits in `deployment.yaml` if needed for your cluster capacity.
+
+## Files Overview
+
+- `namespace.yaml` - Namespace with OpenShift-specific annotations
+- `pvc.yaml` - Persistent volume claim for model storage
+- `deployment.yaml` - Main application deployment with OpenShift security contexts
+- `service.yaml` - Services for gRPC, HTTP API, and metrics
+- `routes.yaml` - OpenShift routes for external access
+- `config.yaml` - Application configuration
+- `tools_db.json` - Tools database for semantic routing
+- `kustomization.yaml` - Kustomize configuration for easy deployment
diff --git a/deploy/openshift/cleanup-openshift.sh b/deploy/openshift/cleanup-openshift.sh
new file mode 100755
index 00000000..809fc94c
--- /dev/null
+++ b/deploy/openshift/cleanup-openshift.sh
@@ -0,0 +1,494 @@
+#!/bin/bash
+
+# cleanup-openshift.sh
+# Comprehensive cleanup script for vLLM Semantic Router OpenShift deployment
+#
+# Usage: ./cleanup-openshift.sh [OPTIONS]
+#
+# This script provides complete cleanup capabilities for semantic router
+# deployments on OpenShift, with support for different cleanup levels.
+
+set -e
+
+# Colors for output
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+BLUE='\033[0;34m'
+NC='\033[0m' # No Color
+
+# Default configuration (exported environment variables override these defaults)
+OPENSHIFT_SERVER="${OPENSHIFT_SERVER:-}"
+OPENSHIFT_USER="${OPENSHIFT_USER:-admin}"
+OPENSHIFT_PASSWORD="${OPENSHIFT_PASSWORD:-}"
+NAMESPACE="${SEMANTIC_ROUTER_NAMESPACE:-vllm-semantic-router-system}"
+CLEANUP_LEVEL="namespace" # namespace, deployment, or all
+DRY_RUN="false"
+FORCE="false"
+WAIT_FOR_COMPLETION="true"
+
+# Function to print colored output
+log() {
+    local level=$1
+    shift
+    local message="$*"
+    local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
+
+    case $level in
+        "INFO")    echo -e "${timestamp} ${BLUE}[INFO]${NC} $message" ;;
+        "WARN")    echo -e "${timestamp} ${YELLOW}[WARN]${NC} $message" ;;
+        "ERROR")   echo -e "${timestamp} ${RED}[ERROR]${NC} $message" ;;
+        "SUCCESS") echo -e "${timestamp} ${GREEN}[SUCCESS]${NC} $message" ;;
+    esac
+}
+
+# Function to show usage
+usage() {
+    cat << EOF
+Usage: $0 [OPTIONS]
+
+Cleanup script for vLLM Semantic Router OpenShift deployment
+
+OPTIONS:
+    -s, --server URL        OpenShift API server URL
+    -u, --user USER         OpenShift username (default: admin)
+    -p, --password PASS     OpenShift password
+    -n, --namespace NS      Deployment namespace (default: vllm-semantic-router-system)
+    -l, --level LEVEL       Cleanup level: deployment|namespace|all (default: namespace)
+    -f, --force             Force cleanup without confirmation
+    --no-wait               Don't wait for cleanup completion
+    --dry-run               Show what would be cleaned up without executing
+    -h, --help              Show this help message
+
+CLEANUP LEVELS:
+    deployment - Remove deployment, services, routes, configmap (keep namespace and PVC)
+    namespace  - Remove entire namespace and all resources (default)
+    all        - Remove namespace and any cluster-wide resources
+
+EXAMPLES:
+    # Clean up entire namespace (default)
+    $0 -s https://api.cluster.example.com:6443 -p mypassword
+
+    # Clean up only deployment resources, keep namespace
+    $0 -s https://api.cluster.example.com:6443 -p mypassword --level deployment
+
+    # Dry run to see what would be cleaned up
+    $0 -s https://api.cluster.example.com:6443 -p mypassword --dry-run
+
+    # Force cleanup without confirmation
+    $0 -s https://api.cluster.example.com:6443 -p mypassword --force
+
+ENVIRONMENT VARIABLES:
+    OPENSHIFT_SERVER           OpenShift API server URL
+    OPENSHIFT_USER             OpenShift username
+    OPENSHIFT_PASSWORD         OpenShift password
+    SEMANTIC_ROUTER_NAMESPACE  Deployment namespace
+
+EOF
+}
+
+# Function to parse command line arguments
+parse_args() {
+    while [[ $# -gt 0 ]]; do
+        case $1 in
+            -s|--server)
+                OPENSHIFT_SERVER="$2"
+                shift 2
+                ;;
+            -u|--user)
+                OPENSHIFT_USER="$2"
+                shift 2
+                ;;
+            -p|--password)
+                OPENSHIFT_PASSWORD="$2"
+                shift 2
+                ;;
+            -n|--namespace)
+                NAMESPACE="$2"
+                shift 2
+                ;;
+            -l|--level)
+                CLEANUP_LEVEL="$2"
+                shift 2
+                ;;
+            -f|--force)
+                FORCE="true"
+                shift
+                ;;
+            --no-wait)
+                WAIT_FOR_COMPLETION="false"
+                shift
+                ;;
+            --dry-run)
+                DRY_RUN="true"
+                shift
+                ;;
+            -h|--help)
+                usage
+                exit 0
+                ;;
+            *)
+                log "ERROR" "Unknown option: $1"
+                usage
+                exit 1
+                ;;
+        esac
+    done
+}
+
+# Function to validate prerequisites
+validate_prerequisites() {
+    log "INFO" "Validating prerequisites..."
+
+    # Check if oc is installed
+    if ! command -v oc &> /dev/null; then
+        log "ERROR" "OpenShift CLI (oc) is not installed or not in PATH"
+        exit 1
+    fi
+
+    # Check required parameters - server is only required if not already logged in
+    if [[ -z "$OPENSHIFT_SERVER" ]]; then
+        if oc whoami >/dev/null 2>&1; then
+            OPENSHIFT_SERVER=$(oc whoami --show-server 2>/dev/null || echo "")
+            log "INFO" "Auto-detected OpenShift server: $OPENSHIFT_SERVER"
+        else
+            log "ERROR" "Not logged in to OpenShift. Please login first using:"
+            log "INFO" "  oc login <server-url>"
+            log "INFO" ""
+            log "INFO" "Example:"
+            log "INFO" "  oc login https://api.cluster.example.com:6443"
+            log "INFO" ""
+            log "INFO" "After logging in, simply run this script again without any arguments:"
+            log "INFO" "  $0"
+            exit 1
+        fi
+    fi
+
+    # Password is only required if we need to login (not already logged in)
+    if [[ -z "$OPENSHIFT_PASSWORD" ]]; then
+        if oc whoami >/dev/null 2>&1; then
+            log "INFO" "No password specified, but already logged in as $(oc whoami)"
+        else
+            log "ERROR" "OpenShift password is required when not logged in. Use -p option or OPENSHIFT_PASSWORD env var"
+            log "ERROR" "Or login manually first with: oc login"
+            exit 1
+        fi
+    fi
+
+    # Validate cleanup level
+    if [[ "$CLEANUP_LEVEL" != "deployment" && "$CLEANUP_LEVEL" != "namespace" && "$CLEANUP_LEVEL" != "all" ]]; then
+        log "ERROR" "Invalid cleanup level: $CLEANUP_LEVEL. Must be 'deployment', 'namespace', or 'all'"
+        exit 1
+    fi
+
+    log "SUCCESS" "Prerequisites validated"
+}
+
+# Function to login to OpenShift
+login_openshift() {
+    log "INFO" "Logging into OpenShift at $OPENSHIFT_SERVER"
+
+    if [[ "$DRY_RUN" == "true" ]]; then
+        log "INFO" "[DRY RUN] Would login with: oc login -u $OPENSHIFT_USER -p [REDACTED] $OPENSHIFT_SERVER --insecure-skip-tls-verify"
+        return 0
+    fi
+
+    if !
oc login -u "$OPENSHIFT_USER" -p "$OPENSHIFT_PASSWORD" "$OPENSHIFT_SERVER" --insecure-skip-tls-verify; then + log "ERROR" "Failed to login to OpenShift" + exit 1 + fi + + log "SUCCESS" "Successfully logged into OpenShift" +} + +# Function to check if namespace exists +check_namespace_exists() { + if [[ "$DRY_RUN" == "true" ]]; then + return 0 + fi + + if ! oc get namespace "$NAMESPACE" &> /dev/null; then + log "WARN" "Namespace $NAMESPACE does not exist" + return 1 + fi + return 0 +} + +# Function to show current resources +show_current_resources() { + log "INFO" "Current resources in namespace $NAMESPACE:" + + if [[ "$DRY_RUN" == "true" ]]; then + log "INFO" "[DRY RUN] Would show current resources" + return 0 + fi + + if ! check_namespace_exists; then + log "INFO" "No resources to show (namespace doesn't exist)" + return 0 + fi + + echo "" + echo "=== Pods ===" + oc get pods -n "$NAMESPACE" 2>/dev/null || echo "No pods found" + + echo "" + echo "=== Services ===" + oc get services -n "$NAMESPACE" 2>/dev/null || echo "No services found" + + echo "" + echo "=== Routes ===" + oc get routes -n "$NAMESPACE" 2>/dev/null || echo "No routes found" + + echo "" + echo "=== PVCs ===" + oc get pvc -n "$NAMESPACE" 2>/dev/null || echo "No PVCs found" + + echo "" + echo "=== ConfigMaps ===" + oc get configmaps -n "$NAMESPACE" 2>/dev/null || echo "No configmaps found" + + echo "" +} + +# Function to confirm cleanup +confirm_cleanup() { + if [[ "$FORCE" == "true" || "$DRY_RUN" == "true" ]]; then + return 0 + fi + + echo "" + log "WARN" "This will permanently delete resources!" + log "WARN" "Cleanup level: $CLEANUP_LEVEL" + log "WARN" "Namespace: $NAMESPACE" + + case "$CLEANUP_LEVEL" in + "deployment") + log "WARN" "Will delete: deployment, services, routes, configmaps (keeping namespace and PVCs)" + ;; + "namespace") + log "WARN" "Will delete: entire namespace and all resources including PVCs" + ;; + "all") + log "WARN" "Will delete: namespace and any cluster-wide resources" + ;; + esac + + echo "" + read -p "Are you sure you want to proceed? (yes/no): " confirm + if [[ "$confirm" != "yes" && "$confirm" != "y" ]]; then + log "INFO" "Cleanup cancelled by user" + exit 0 + fi +} + +# Function to cleanup deployment level resources +cleanup_deployment() { + log "INFO" "Cleaning up deployment-level resources..." + + if [[ "$DRY_RUN" == "true" ]]; then + log "INFO" "[DRY RUN] Would delete deployment resources in namespace $NAMESPACE" + return 0 + fi + + if ! check_namespace_exists; then + log "INFO" "Nothing to clean up (namespace doesn't exist)" + return 0 + fi + + # Delete specific resources but keep namespace and PVCs + local resources=( + "deployment/semantic-router" + "service/semantic-router" + "service/semantic-router-metrics" + "route/semantic-router-api" + "route/semantic-router-grpc" + "route/semantic-router-metrics" + "configmap/semantic-router-config" + ) + + for resource in "${resources[@]}"; do + if oc get "$resource" -n "$NAMESPACE" &> /dev/null; then + log "INFO" "Deleting $resource..." + oc delete "$resource" -n "$NAMESPACE" --ignore-not-found=true + else + log "INFO" "Resource $resource not found, skipping..." + fi + done + + log "SUCCESS" "Deployment-level cleanup completed" +} + +# Function to cleanup namespace +cleanup_namespace() { + log "INFO" "Cleaning up namespace: $NAMESPACE" + + if [[ "$DRY_RUN" == "true" ]]; then + log "INFO" "[DRY RUN] Would delete namespace: $NAMESPACE" + return 0 + fi + + if ! 
check_namespace_exists; then + log "INFO" "Nothing to clean up (namespace doesn't exist)" + return 0 + fi + + oc delete namespace "$NAMESPACE" --ignore-not-found=true + + if [[ "$WAIT_FOR_COMPLETION" == "true" ]]; then + log "INFO" "Waiting for namespace deletion to complete..." + local timeout=300 # 5 minutes + local count=0 + while oc get namespace "$NAMESPACE" &> /dev/null && [ $count -lt $timeout ]; do + sleep 2 + count=$((count + 2)) + if [ $((count % 30)) -eq 0 ]; then + log "INFO" "Still waiting for namespace deletion... (${count}s elapsed)" + fi + done + + if oc get namespace "$NAMESPACE" &> /dev/null; then + log "WARN" "Namespace deletion is taking longer than expected" + log "INFO" "You can check the status manually with: oc get namespace $NAMESPACE" + else + log "SUCCESS" "Namespace deleted successfully" + fi + fi +} + +# Function to cleanup cluster-wide resources (if any) +cleanup_cluster_wide() { + log "INFO" "Checking for cluster-wide resources to clean up..." + + if [[ "$DRY_RUN" == "true" ]]; then + log "INFO" "[DRY RUN] Would check for cluster-wide resources" + return 0 + fi + + # For semantic router, there typically aren't cluster-wide resources + # But this is where you would clean up CRDs, ClusterRoles, etc. if they existed + + log "INFO" "No cluster-wide resources to clean up for semantic router" +} + +# Function to cleanup port forwarding +cleanup_port_forwarding() { + log "INFO" "Cleaning up port forwarding processes..." + + if [[ "$DRY_RUN" == "true" ]]; then + log "INFO" "[DRY RUN] Would kill port forwarding processes for namespace: $NAMESPACE" + return 0 + fi + + # Kill any port forwarding processes for this namespace + local pf_pids=$(pgrep -f "oc port-forward.*$NAMESPACE" 2>/dev/null || true) + if [[ -n "$pf_pids" ]]; then + log "INFO" "Found port forwarding processes: $pf_pids" + pkill -f "oc port-forward.*$NAMESPACE" || true + sleep 2 + + # Verify they're gone + local remaining_pids=$(pgrep -f "oc port-forward.*$NAMESPACE" 2>/dev/null || true) + if [[ -z "$remaining_pids" ]]; then + log "SUCCESS" "Port forwarding processes terminated" + else + log "WARN" "Some port forwarding processes may still be running: $remaining_pids" + fi + else + log "INFO" "No port forwarding processes found for namespace $NAMESPACE" + fi + + # Clean up PID file if it exists + if [[ -f "/tmp/semantic-router-port-forward.pid" ]]; then + local saved_pid=$(cat /tmp/semantic-router-port-forward.pid 2>/dev/null | grep -o '^[0-9]*' || true) + if [[ -n "$saved_pid" ]]; then + log "INFO" "Cleaning up saved PID file (PID: $saved_pid)" + kill "$saved_pid" 2>/dev/null || true + fi + rm -f /tmp/semantic-router-port-forward.pid + log "INFO" "Removed PID file" + fi + + log "SUCCESS" "Port forwarding cleanup completed" +} + +# Function to verify cleanup completion +verify_cleanup() { + if [[ "$DRY_RUN" == "true" ]]; then + log "INFO" "[DRY RUN] Would verify cleanup completion" + return 0 + fi + + log "INFO" "Verifying cleanup completion..." 
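+    # Success criteria depend on the cleanup level: deployment-level cleanup
+    # intentionally leaves the namespace (and PVCs) behind, while the
+    # namespace/all levels expect the namespace itself to be gone.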
+ + case "$CLEANUP_LEVEL" in + "deployment") + if check_namespace_exists; then + log "INFO" "Namespace $NAMESPACE still exists (as expected for deployment-level cleanup)" + local remaining_resources=$(oc get all -n "$NAMESPACE" --no-headers 2>/dev/null | wc -l) + if [[ "$remaining_resources" -eq 0 ]]; then + log "SUCCESS" "All deployment resources have been removed" + else + log "INFO" "Some resources remain in namespace (may include PVCs or other preserved resources)" + fi + else + log "WARN" "Namespace was also deleted (unexpected for deployment-level cleanup)" + fi + ;; + "namespace"|"all") + if check_namespace_exists; then + log "WARN" "Namespace $NAMESPACE still exists (deletion may still be in progress)" + else + log "SUCCESS" "Namespace $NAMESPACE has been completely removed" + fi + ;; + esac +} + +# Main function +main() { + log "INFO" "Starting vLLM Semantic Router OpenShift cleanup" + + parse_args "$@" + validate_prerequisites + login_openshift + + show_current_resources + confirm_cleanup + + # Clean up port forwarding first (before deleting resources) + cleanup_port_forwarding + + case "$CLEANUP_LEVEL" in + "deployment") + cleanup_deployment + ;; + "namespace") + cleanup_namespace + ;; + "all") + cleanup_namespace + cleanup_cluster_wide + ;; + esac + + verify_cleanup + log "SUCCESS" "Cleanup completed successfully!" + + if [[ "$CLEANUP_LEVEL" == "namespace" || "$CLEANUP_LEVEL" == "all" ]]; then + echo "" + log "INFO" "To redeploy the semantic router, simply run:" + log "INFO" " ./deploy-to-openshift.sh" + log "INFO" "" + log "INFO" "The deploy script will auto-detect your OpenShift server and use your existing login." + fi +} + +# Run main function with all arguments +main "$@" \ No newline at end of file diff --git a/deploy/openshift/config-openshift.yaml b/deploy/openshift/config-openshift.yaml new file mode 100644 index 00000000..df6ede53 --- /dev/null +++ b/deploy/openshift/config-openshift.yaml @@ -0,0 +1,212 @@ +bert_model: + model_id: sentence-transformers/all-MiniLM-L12-v2 + threshold: 0.6 + use_cpu: true + +semantic_cache: + enabled: true + backend_type: "memory" # Options: "memory" or "milvus" + similarity_threshold: 0.8 + max_entries: 1000 # Only applies to memory backend + ttl_seconds: 3600 + eviction_policy: "fifo" + +tools: + enabled: true + top_k: 3 + similarity_threshold: 0.2 + tools_db_path: "config/tools_db.json" + fallback_to_empty: true + +prompt_guard: + enabled: true + use_modernbert: true + model_id: "models/jailbreak_classifier_modernbert-base_model" + threshold: 0.7 + use_cpu: true + jailbreak_mapping_path: "models/jailbreak_classifier_modernbert-base_model/jailbreak_type_mapping.json" + +# vLLM Endpoints Configuration +# IMPORTANT: Using localhost since containers are in same pod +vllm_endpoints: + - name: "model-a-endpoint" + address: "127.0.0.1" # localhost in same pod + port: 8000 + models: + - "Model-A" + weight: 1 + - name: "model-b-endpoint" + address: "127.0.0.1" # localhost in same pod + port: 8001 + models: + - "Model-B" + weight: 1 + +model_config: + "Model-A": + reasoning_family: "qwen3" # This model uses Qwen reasoning syntax + preferred_endpoints: ["model-a-endpoint"] + pii_policy: + allow_by_default: false # Strict PII blocking model + pii_types_allowed: ["EMAIL_ADDRESS"] # Only allow emails + "Model-B": + reasoning_family: "qwen3" # This model uses Qwen reasoning syntax + preferred_endpoints: ["model-b-endpoint"] + pii_policy: + allow_by_default: true # Permissive PII model for safe routing + pii_types_allowed: ["EMAIL_ADDRESS", "PERSON", 
"GPE", "PHONE_NUMBER", "US_SSN", "CREDIT_CARD"] + +# Classifier configuration +classifier: + category_model: + model_id: "models/category_classifier_modernbert-base_model" + use_modernbert: true + threshold: 0.6 + use_cpu: true + category_mapping_path: "models/category_classifier_modernbert-base_model/category_mapping.json" + pii_model: + model_id: "models/pii_classifier_modernbert-base_presidio_token_model" + use_modernbert: true + threshold: 0.7 + use_cpu: true + pii_mapping_path: "models/pii_classifier_modernbert-base_presidio_token_model/pii_type_mapping.json" + +# Categories with new use_reasoning field structure +categories: + - name: business + system_prompt: "You are a senior business consultant and strategic advisor with expertise in corporate strategy, operations management, financial analysis, marketing, and organizational development. Provide practical, actionable business advice backed by proven methodologies and industry best practices. Consider market dynamics, competitive landscape, and stakeholder interests in your recommendations." + model_scores: + - model: Model-B + score: 0.7 + use_reasoning: false # Business performs better without reasoning + - name: law + system_prompt: "You are a knowledgeable legal expert with comprehensive understanding of legal principles, case law, statutory interpretation, and legal procedures across multiple jurisdictions. Provide accurate legal information and analysis while clearly stating that your responses are for informational purposes only and do not constitute legal advice. Always recommend consulting with qualified legal professionals for specific legal matters." + model_scores: + - model: Model-B + score: 0.4 + use_reasoning: false + - name: psychology + system_prompt: "You are a psychology expert with deep knowledge of cognitive processes, behavioral patterns, mental health, developmental psychology, social psychology, and therapeutic approaches. Provide evidence-based insights grounded in psychological research and theory. When discussing mental health topics, emphasize the importance of professional consultation and avoid providing diagnostic or therapeutic advice." + model_scores: + - model: Model-B + score: 0.6 + use_reasoning: false + - name: biology + system_prompt: "You are a biology expert with comprehensive knowledge spanning molecular biology, genetics, cell biology, ecology, evolution, anatomy, physiology, and biotechnology. Explain biological concepts with scientific accuracy, use appropriate terminology, and provide examples from current research. Connect biological principles to real-world applications and emphasize the interconnectedness of biological systems." + model_scores: + - model: Model-A + score: 0.9 + use_reasoning: false + - name: chemistry + system_prompt: "You are a chemistry expert specializing in chemical reactions, molecular structures, and laboratory techniques. Provide detailed, step-by-step explanations." + model_scores: + - model: Model-A + score: 0.6 + use_reasoning: true # Enable reasoning for complex chemistry + - name: history + system_prompt: "You are a historian with expertise across different time periods and cultures. Provide accurate historical context and analysis." + model_scores: + - model: Model-A + score: 0.7 + use_reasoning: false + - name: other + system_prompt: "You are a helpful and knowledgeable assistant. Provide accurate, helpful responses across a wide range of topics." 
+ model_scores: + - model: Model-A + score: 0.7 + use_reasoning: false + - name: health + system_prompt: "You are a health and medical information expert with knowledge of anatomy, physiology, diseases, treatments, preventive care, nutrition, and wellness. Provide accurate, evidence-based health information while emphasizing that your responses are for educational purposes only and should never replace professional medical advice, diagnosis, or treatment. Always encourage users to consult healthcare professionals for medical concerns and emergencies." + model_scores: + - model: Model-B + score: 0.5 + use_reasoning: false + - name: economics + system_prompt: "You are an economics expert with deep understanding of microeconomics, macroeconomics, econometrics, financial markets, monetary policy, fiscal policy, international trade, and economic theory. Analyze economic phenomena using established economic principles, provide data-driven insights, and explain complex economic concepts in accessible terms. Consider both theoretical frameworks and real-world applications in your responses." + model_scores: + - model: Model-A + score: 1.0 + use_reasoning: false + - name: math + system_prompt: "You are a mathematics expert. Provide step-by-step solutions, show your work clearly, and explain mathematical concepts in an understandable way." + model_scores: + - model: Model-A + score: 1.0 + use_reasoning: true # Enable reasoning for complex math + - name: physics + system_prompt: "You are a physics expert with deep understanding of physical laws and phenomena. Provide clear explanations with mathematical derivations when appropriate." + model_scores: + - model: Model-A + score: 0.7 + use_reasoning: true # Enable reasoning for physics + - name: computer science + system_prompt: "You are a computer science expert with knowledge of algorithms, data structures, programming languages, and software engineering. Provide clear, practical solutions with code examples when helpful." + model_scores: + - model: Model-A + score: 0.6 + use_reasoning: false + - name: philosophy + system_prompt: "You are a philosophy expert with comprehensive knowledge of philosophical traditions, ethical theories, logic, metaphysics, epistemology, political philosophy, and the history of philosophical thought. Engage with complex philosophical questions by presenting multiple perspectives, analyzing arguments rigorously, and encouraging critical thinking. Draw connections between philosophical concepts and contemporary issues while maintaining intellectual honesty about the complexity and ongoing nature of philosophical debates." + model_scores: + - model: Model-B + score: 0.5 + use_reasoning: false + - name: engineering + system_prompt: "You are an engineering expert with knowledge across multiple engineering disciplines including mechanical, electrical, civil, chemical, software, and systems engineering. Apply engineering principles, design methodologies, and problem-solving approaches to provide practical solutions. Consider safety, efficiency, sustainability, and cost-effectiveness in your recommendations. Use technical precision while explaining concepts clearly, and emphasize the importance of proper engineering practices and standards." 
+ model_scores: + - model: Model-A + score: 0.7 + use_reasoning: false + +default_model: Model-A + +# Reasoning family configurations +reasoning_families: + deepseek: + type: "chat_template_kwargs" + parameter: "thinking" + + qwen3: + type: "chat_template_kwargs" + parameter: "enable_thinking" + + gpt-oss: + type: "reasoning_effort" + parameter: "reasoning_effort" + gpt: + type: "reasoning_effort" + parameter: "reasoning_effort" + +# Global default reasoning effort level +default_reasoning_effort: high + +# API Configuration +api: + batch_classification: + max_batch_size: 100 + concurrency_threshold: 5 + max_concurrency: 8 + metrics: + enabled: true + detailed_goroutine_tracking: true + high_resolution_timing: false + sample_rate: 1.0 + duration_buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30] + size_buckets: [1, 2, 5, 10, 20, 50, 100, 200] + +# Observability Configuration +observability: + tracing: + enabled: false # Enable distributed tracing (default: false) + provider: "opentelemetry" # Provider: opentelemetry, openinference, openllmetry + exporter: + type: "stdout" # Exporter: otlp, jaeger, zipkin, stdout + endpoint: "localhost:4317" # OTLP endpoint (when type: otlp) + insecure: true # Use insecure connection (no TLS) + sampling: + type: "always_on" # Sampling: always_on, always_off, probabilistic + rate: 1.0 # Sampling rate for probabilistic (0.0-1.0) + resource: + service_name: "vllm-semantic-router" + service_version: "v0.1.0" + deployment_environment: "development" \ No newline at end of file diff --git a/deploy/openshift/deploy-to-openshift.sh b/deploy/openshift/deploy-to-openshift.sh new file mode 100755 index 00000000..cf8d8972 --- /dev/null +++ b/deploy/openshift/deploy-to-openshift.sh @@ -0,0 +1,826 @@ +#!/bin/bash + +# deploy-to-openshift.sh +# Automated deployment script for vLLM Semantic Router on OpenShift +# +# Usage: ./deploy-to-openshift.sh [OPTIONS] +# +# This script provides a complete automation solution for deploying +# the semantic router to OpenShift with support for different environments +# and configuration options. 
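+#
+# Typical invocations (assuming an existing 'oc login' session):
+#   ./deploy-to-openshift.sh                   # enhanced method with defaults
+#   ./deploy-to-openshift.sh --dry-run         # preview what would be deployed
+#   ./deploy-to-openshift.sh --method kustomize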
+
+set -e
+
+# Colors for output
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+BLUE='\033[0;34m'
+NC='\033[0m' # No Color
+
+# Default configuration (exported environment variables override these defaults)
+OPENSHIFT_SERVER="${OPENSHIFT_SERVER:-}"
+OPENSHIFT_USER="${OPENSHIFT_USER:-admin}"
+OPENSHIFT_PASSWORD="${OPENSHIFT_PASSWORD:-}"
+NAMESPACE="${SEMANTIC_ROUTER_NAMESPACE:-vllm-semantic-router-system}"
+DEPLOYMENT_METHOD="enhanced" # Use enhanced deployment with llm-katan specialists
+CONTAINER_IMAGE="ghcr.io/vllm-project/semantic-router/extproc"
+CONTAINER_TAG="latest"
+STORAGE_SIZE="10Gi"
+MEMORY_REQUEST="3Gi"
+MEMORY_LIMIT="6Gi"
+CPU_REQUEST="1"
+CPU_LIMIT="2"
+LOG_LEVEL="info"
+SKIP_MODEL_DOWNLOAD="false"
+WAIT_FOR_READY="true"
+CLEANUP_FIRST="false"
+DRY_RUN="false"
+PORT_FORWARD="false"
+PORT_FORWARD_PORTS="8080:8080 8000:8000 8001:8001 50051:50051 8801:8801 19000:19000"
+
+# Function to print colored output
+log() {
+    local level=$1
+    shift
+    local message="$*"
+    local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
+
+    case $level in
+        "INFO")    echo -e "${timestamp} ${BLUE}[INFO]${NC} $message" ;;
+        "WARN")    echo -e "${timestamp} ${YELLOW}[WARN]${NC} $message" ;;
+        "ERROR")   echo -e "${timestamp} ${RED}[ERROR]${NC} $message" ;;
+        "SUCCESS") echo -e "${timestamp} ${GREEN}[SUCCESS]${NC} $message" ;;
+    esac
+}
+
+# Function to show usage
+usage() {
+    cat << EOF
+Usage: $0 [OPTIONS]
+
+Automated deployment script for vLLM Semantic Router on OpenShift
+
+OPTIONS:
+    -s, --server URL        OpenShift API server URL
+    -u, --user USER         OpenShift username (default: admin)
+    -p, --password PASS     OpenShift password
+    -n, --namespace NS      Deployment namespace (default: vllm-semantic-router-system)
+    -m, --method METHOD     Deployment method: kustomize|template|enhanced (default: enhanced)
+    -i, --image IMAGE       Container image (default: ghcr.io/vllm-project/semantic-router/extproc)
+    -t, --tag TAG           Container tag (default: latest)
+    --storage SIZE          Storage size (default: 10Gi)
+    --memory-request SIZE   Memory request (default: 3Gi)
+    --memory-limit SIZE     Memory limit (default: 6Gi)
+    --cpu-request SIZE      CPU request (default: 1)
+    --cpu-limit SIZE        CPU limit (default: 2)
+    --log-level LEVEL       Log level: debug|info|warn|error (default: info)
+    --skip-models           Skip model download (for demo/testing)
+    --no-wait               Don't wait for deployment to be ready
+    --cleanup               Clean up existing deployment first
+    --dry-run               Show what would be deployed without executing
+    --port-forward          Set up port forwarding after successful deployment (default: disabled)
+    --no-port-forward       Disable automatic port forwarding
+    --port-forward-ports PORTS  Custom port mappings (default: "8080:8080 8000:8000 8001:8001 50051:50051 8801:8801 19000:19000")
+    -h, --help              Show this help message
+
+EXAMPLES:
+    # Simple deployment (if already logged in with 'oc login')
+    $0
+
+    # Deploy with manual server specification
+    $0 -s https://api.cluster.example.com:6443 -p mypassword
+
+    # Deploy with custom namespace and resources
+    $0 -n my-semantic-router --memory-limit 8Gi --cpu-limit 4
+
+    # Deploy using basic method instead of enhanced
+    $0 --method kustomize
+
+    # Dry run to see what would be deployed
+    $0 --dry-run
+
+    # Deploy without automatic port forwarding
+    $0 --no-port-forward
+
+ENVIRONMENT VARIABLES:
+    OPENSHIFT_SERVER           OpenShift API server URL
+    OPENSHIFT_USER             OpenShift username
+    OPENSHIFT_PASSWORD         OpenShift password
+    SEMANTIC_ROUTER_NAMESPACE  Deployment namespace
+
+EOF
+}
+
+# Function to parse command line arguments
+parse_args() {
+    while [[ $# -gt 0 ]]; do
+        case $1 in
+            -s|--server)
+                OPENSHIFT_SERVER="$2"
+                shift 2
+                ;;
+            -u|--user)
+                OPENSHIFT_USER="$2"
+                shift 2
+                ;;
+            -p|--password)
+                OPENSHIFT_PASSWORD="$2"
+                shift 2
+                ;;
+            -n|--namespace)
+                NAMESPACE="$2"
+                shift 2
+                ;;
+            -m|--method)
+                DEPLOYMENT_METHOD="$2"
+                shift 2
+                ;;
+            -i|--image)
+                CONTAINER_IMAGE="$2"
+                shift 2
+                ;;
+            -t|--tag)
+                CONTAINER_TAG="$2"
+                shift 2
+                ;;
+            --storage)
+                STORAGE_SIZE="$2"
+                shift 2
+                ;;
+            --memory-request)
+                MEMORY_REQUEST="$2"
+                shift 2
+                ;;
+            --memory-limit)
+                MEMORY_LIMIT="$2"
+                shift 2
+                ;;
+            --cpu-request)
+                CPU_REQUEST="$2"
+                shift 2
+                ;;
+            --cpu-limit)
+                CPU_LIMIT="$2"
+                shift 2
+                ;;
+            --log-level)
+                LOG_LEVEL="$2"
+                shift 2
+                ;;
+            --skip-models)
+                SKIP_MODEL_DOWNLOAD="true"
+                shift
+                ;;
+            --no-wait)
+                WAIT_FOR_READY="false"
+                shift
+                ;;
+            --cleanup)
+                CLEANUP_FIRST="true"
+                shift
+                ;;
+            --dry-run)
+                DRY_RUN="true"
+                shift
+                ;;
+            --port-forward)
+                PORT_FORWARD="true"
+                shift
+                ;;
+            --no-port-forward)
+                PORT_FORWARD="false"
+                shift
+                ;;
+            --port-forward-ports)
+                PORT_FORWARD_PORTS="$2"
+                shift 2
+                ;;
+            -h|--help)
+                usage
+                exit 0
+                ;;
+            *)
+                log "ERROR" "Unknown option: $1"
+                usage
+                exit 1
+                ;;
+        esac
+    done
+}
+
+# Function to validate prerequisites
+validate_prerequisites() {
+    log "INFO" "Validating prerequisites..."
+
+    # Check if oc is installed
+    if ! command -v oc &> /dev/null; then
+        log "ERROR" "OpenShift CLI (oc) is not installed or not in PATH"
+        log "INFO" "Install from: https://docs.openshift.com/container-platform/latest/cli_reference/openshift_cli/getting-started-cli.html"
+        exit 1
+    fi
+
+    # Check required parameters - server is only required if not already logged in
+    if [[ -z "$OPENSHIFT_SERVER" ]]; then
+        if oc whoami >/dev/null 2>&1; then
+            OPENSHIFT_SERVER=$(oc whoami --show-server 2>/dev/null || echo "")
+            log "INFO" "Auto-detected OpenShift server: $OPENSHIFT_SERVER"
+        else
+            log "ERROR" "Not logged in to OpenShift. Please login first using:"
+            log "INFO" "  oc login <server-url>"
+            log "INFO" ""
+            log "INFO" "Example:"
+            log "INFO" "  oc login https://api.cluster.example.com:6443"
+            log "INFO" ""
+            log "INFO" "After logging in, simply run this script again without any arguments:"
+            log "INFO" "  $0"
+            exit 1
+        fi
+    fi
+
+    # Password is only required if we need to login (not already logged in)
+    if [[ -z "$OPENSHIFT_PASSWORD" ]]; then
+        if oc whoami >/dev/null 2>&1; then
+            log "INFO" "No password specified, but already logged in as $(oc whoami)"
+        else
+            log "ERROR" "OpenShift password is required when not logged in. Use -p option or OPENSHIFT_PASSWORD env var"
+            log "ERROR" "Or login manually first with: oc login"
+            exit 1
+        fi
+    fi
+
+    # Validate deployment method
+    if [[ "$DEPLOYMENT_METHOD" != "kustomize" && "$DEPLOYMENT_METHOD" != "template" && "$DEPLOYMENT_METHOD" != "enhanced" ]]; then
+        log "ERROR" "Invalid deployment method: $DEPLOYMENT_METHOD. Must be 'kustomize', 'template', or 'enhanced'"
+        exit 1
+    fi
+
+    log "SUCCESS" "Prerequisites validated"
+}
+
+# Function to login to OpenShift
+login_openshift() {
+    log "INFO" "Checking OpenShift login status..."
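+    # Prefer the existing 'oc' session; an explicit 'oc login' is attempted only
+    # when a different target server (plus credentials) was supplied.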
+ + # Check if already logged in + if oc whoami >/dev/null 2>&1; then + local current_user=$(oc whoami) + local current_server=$(oc whoami --show-server 2>/dev/null || echo "unknown") + log "SUCCESS" "Already logged in as '$current_user' to '$current_server'" + + # If server matches what we want, we're good + if [[ -n "$OPENSHIFT_SERVER" && "$current_server" == "$OPENSHIFT_SERVER" ]]; then + log "INFO" "Current session matches target server, continuing..." + return 0 + elif [[ -z "$OPENSHIFT_SERVER" ]]; then + log "INFO" "No server specified, using current session..." + return 0 + else + log "WARN" "Current server '$current_server' differs from target '$OPENSHIFT_SERVER'" + log "INFO" "Will login to target server..." + fi + else + log "INFO" "Not currently logged in to OpenShift" + fi + + # Need to login + if [[ -z "$OPENSHIFT_SERVER" ]]; then + log "ERROR" "No OpenShift server specified and not currently logged in" + log "ERROR" "Please specify server with -s option or login manually with:" + log "ERROR" " oc login https://your-openshift-server:6443" + exit 1 + fi + + if [[ -z "$OPENSHIFT_PASSWORD" ]]; then + log "ERROR" "No OpenShift password specified" + log "ERROR" "Please specify password with -p option or login manually with:" + log "ERROR" " oc login -u $OPENSHIFT_USER $OPENSHIFT_SERVER" + exit 1 + fi + + if [[ "$DRY_RUN" == "true" ]]; then + log "INFO" "[DRY RUN] Would login with: oc login -u $OPENSHIFT_USER -p [REDACTED] $OPENSHIFT_SERVER --insecure-skip-tls-verify" + return 0 + fi + + log "INFO" "Logging into OpenShift at $OPENSHIFT_SERVER as $OPENSHIFT_USER" + if ! oc login -u "$OPENSHIFT_USER" -p "$OPENSHIFT_PASSWORD" "$OPENSHIFT_SERVER" --insecure-skip-tls-verify; then + log "ERROR" "Failed to login to OpenShift" + log "ERROR" "Please check your credentials and try again, or login manually with:" + log "ERROR" " oc login -u $OPENSHIFT_USER $OPENSHIFT_SERVER" + exit 1 + fi + + log "SUCCESS" "Successfully logged into OpenShift as $(oc whoami)" +} + +# Function to cleanup existing deployment +cleanup_deployment() { + if [[ "$CLEANUP_FIRST" == "true" ]]; then + log "INFO" "Cleaning up existing deployment in namespace $NAMESPACE" + + if [[ "$DRY_RUN" == "true" ]]; then + log "INFO" "[DRY RUN] Would delete namespace: $NAMESPACE" + return 0 + fi + + if oc get namespace "$NAMESPACE" &> /dev/null; then + oc delete namespace "$NAMESPACE" --ignore-not-found=true + log "INFO" "Waiting for namespace deletion to complete..." 
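+            # 'oc delete namespace' returns once deletion is accepted; finalizers
+            # then remove the namespace's resources asynchronously, so poll until
+            # the namespace object is actually gone.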
+ while oc get namespace "$NAMESPACE" &> /dev/null; do + sleep 2 + done + log "SUCCESS" "Namespace $NAMESPACE deleted" + else + log "INFO" "Namespace $NAMESPACE does not exist, skipping cleanup" + fi + fi +} + +# Function to deploy using Enhanced OpenShift deployment (with llm-katan specialists) +deploy_enhanced() { + log "INFO" "Deploying using Enhanced OpenShift method (with llm-katan specialists)" + + if [[ "$DRY_RUN" == "true" ]]; then + log "INFO" "[DRY RUN] Would deploy enhanced deployment with 4-container pod:" + log "INFO" "[DRY RUN] - semantic-router (main ExtProc service on port 50051)" + log "INFO" "[DRY RUN] - math-specialist (llm-katan on port 8000)" + log "INFO" "[DRY RUN] - coding-specialist (llm-katan on port 8001)" + log "INFO" "[DRY RUN] - envoy-proxy (gateway on port 8801)" + return 0 + fi + + local script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + + # Create namespace first + oc create namespace "$NAMESPACE" --dry-run=client -o yaml | oc apply -f - + log "SUCCESS" "Namespace $NAMESPACE created/verified" + + # Build llm-katan image if it doesn't exist + log "INFO" "Checking for llm-katan image..." + if ! oc get imagestream llm-katan -n "$NAMESPACE" &> /dev/null; then + log "INFO" "Building llm-katan image from Dockerfile..." + + # Create build config and start build + if [[ -f "$script_dir/Dockerfile.llm-katan" ]]; then + oc new-build --dockerfile - --name llm-katan -n "$NAMESPACE" < "$script_dir/Dockerfile.llm-katan" + else + log "ERROR" "Dockerfile.llm-katan not found. Expected at: $script_dir/Dockerfile.llm-katan" + exit 1 + fi + + # Wait for python imagestream to be ready + log "INFO" "Waiting for python imagestream to be ready..." + sleep 5 + while ! oc get istag python:3.10-slim -n "$NAMESPACE" &> /dev/null; do + sleep 2 + done + log "SUCCESS" "Python imagestream ready" + + # Start the build and wait for completion + log "INFO" "Starting llm-katan build..." + oc start-build llm-katan -n "$NAMESPACE" + + # Wait for build to complete + log "INFO" "Waiting for llm-katan build to complete..." + if ! oc wait --for=condition=Complete build/llm-katan-1 -n "$NAMESPACE" --timeout=600s; then + log "ERROR" "llm-katan build failed or timed out" + oc logs build/llm-katan-1 -n "$NAMESPACE" --tail=50 + exit 1 + fi + + log "SUCCESS" "llm-katan image built successfully" + else + log "INFO" "llm-katan image already exists, skipping build" + fi + + # Create PVC for models + log "INFO" "Creating PVC for models..." + cat < /dev/null + rm -rf "$temp_dir" + + log "SUCCESS" "Kustomize deployment applied successfully" +} + +# Function to deploy using template +deploy_template() { + log "INFO" "Deploying using OpenShift Template method" + + local script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + local template_file="$script_dir/template.yaml" + + if [[ ! 
-f "$template_file" ]]; then + log "ERROR" "Template file not found: $template_file" + exit 1 + fi + + local template_params=( + "NAMESPACE=$NAMESPACE" + "CONTAINER_IMAGE=$CONTAINER_IMAGE" + "CONTAINER_TAG=$CONTAINER_TAG" + "STORAGE_SIZE=$STORAGE_SIZE" + "MEMORY_REQUEST=$MEMORY_REQUEST" + "MEMORY_LIMIT=$MEMORY_LIMIT" + "CPU_REQUEST=$CPU_REQUEST" + "CPU_LIMIT=$CPU_LIMIT" + "LOG_LEVEL=$LOG_LEVEL" + ) + + if [[ "$DRY_RUN" == "true" ]]; then + log "INFO" "[DRY RUN] Would process template with parameters:" + for param in "${template_params[@]}"; do + log "INFO" " $param" + done + return 0 + fi + + # Process and apply template + local param_args="" + for param in "${template_params[@]}"; do + param_args="$param_args -p $param" + done + + if ! oc process -f "$template_file" $param_args | oc apply -f -; then + log "ERROR" "Failed to process and apply template" + exit 1 + fi + + log "SUCCESS" "Template deployment applied successfully" +} + +# Function to wait for deployment to be ready +wait_for_ready() { + if [[ "$WAIT_FOR_READY" == "false" || "$DRY_RUN" == "true" ]]; then + return 0 + fi + + log "INFO" "Waiting for deployment to be ready..." + + # Wait for namespace to be active + log "INFO" "Waiting for namespace $NAMESPACE to be active..." + if ! oc wait --for=condition=Active namespace/"$NAMESPACE" --timeout=60s; then + log "WARN" "Namespace did not become active within timeout, but continuing..." + fi + + # Wait for deployment to be available + log "INFO" "Waiting for deployment to be available..." + if ! oc wait --for=condition=Available deployment/semantic-router -n "$NAMESPACE" --timeout=300s; then + log "WARN" "Deployment did not become available within timeout" + log "INFO" "Checking deployment status..." + oc get pods -n "$NAMESPACE" + oc describe deployment/semantic-router -n "$NAMESPACE" | tail -20 + return 1 + fi + + log "SUCCESS" "Deployment is ready!" +} + +# Function to setup port forwarding +setup_port_forwarding() { + if [[ "$DRY_RUN" == "true" ]]; then + log "INFO" "[DRY RUN] Would setup port forwarding with ports: $PORT_FORWARD_PORTS" + return 0 + fi + + if [[ "$PORT_FORWARD" != "true" ]]; then + return 0 + fi + + log "INFO" "Setting up port forwarding..." + + # Get the pod name + local pod_name=$(oc get pods -n "$NAMESPACE" -l app=semantic-router -o jsonpath='{.items[0].metadata.name}') + if [[ -z "$pod_name" ]]; then + log "ERROR" "Could not find semantic-router pod for port forwarding" + return 1 + fi + + log "INFO" "Setting up port forwarding to pod: $pod_name" + + # Kill any existing port-forward processes for this namespace + pkill -f "oc port-forward.*$NAMESPACE" || true + sleep 2 + + # Set up port forwarding in background + log "INFO" "Port forwarding: $PORT_FORWARD_PORTS" + oc port-forward "$pod_name" $PORT_FORWARD_PORTS -n "$NAMESPACE" & + local pf_pid=$! 
+
+    # Give it a moment to establish
+    sleep 3
+
+    if kill -0 $pf_pid 2>/dev/null; then
+        log "SUCCESS" "Port forwarding established (PID: $pf_pid)"
+        log "INFO" "Access endpoints at:"
+        for port_mapping in $PORT_FORWARD_PORTS; do
+            local local_port=$(echo $port_mapping | cut -d: -f1)
+            log "INFO" "  - localhost:$local_port"
+        done
+        log "INFO" "To stop port forwarding: kill $pf_pid"
+        echo "$pf_pid" > /tmp/semantic-router-port-forward.pid
+    else
+        log "WARN" "Port forwarding may have failed to establish"
+    fi
+}
+
+# Function to show deployment info
+show_deployment_info() {
+    if [[ "$DRY_RUN" == "true" ]]; then
+        log "INFO" "[DRY RUN] Would show deployment information"
+        return 0
+    fi
+
+    log "INFO" "Deployment information:"
+
+    echo ""
+    echo "=== Pods ==="
+    oc get pods -n "$NAMESPACE" -o wide
+
+    echo ""
+    echo "=== Services ==="
+    oc get services -n "$NAMESPACE"
+
+    echo ""
+    echo "=== Routes ==="
+    oc get routes -n "$NAMESPACE"
+
+    echo ""
+    echo "=== External URLs ==="
+    local api_route=$(oc get route semantic-router-api -n "$NAMESPACE" -o jsonpath='{.spec.host}' 2>/dev/null)
+    local grpc_route=$(oc get route semantic-router-grpc -n "$NAMESPACE" -o jsonpath='{.spec.host}' 2>/dev/null)
+    local metrics_route=$(oc get route semantic-router-metrics -n "$NAMESPACE" -o jsonpath='{.spec.host}' 2>/dev/null)
+    local envoy_route=$(oc get route envoy-http -n "$NAMESPACE" -o jsonpath='{.spec.host}' 2>/dev/null)
+
+    if [[ -n "$envoy_route" ]]; then
+        echo ""
+        log "SUCCESS" "OpenWebUI Endpoint (use this in OpenWebUI settings):"
+        echo "  http://$envoy_route/v1"
+        echo ""
+    fi
+
+    if [[ -n "$api_route" ]]; then
+        echo "Classification API: https://$api_route"
+        echo "Health Check: https://$api_route/health"
+    fi
+    if [[ -n "$grpc_route" ]]; then
+        echo "gRPC API: https://$grpc_route"
+    fi
+    if [[ -n "$metrics_route" ]]; then
+        echo "Metrics: https://$metrics_route/metrics"
+    fi
+
+    echo ""
+    echo "=== Quick Test Commands ==="
+    if [[ -n "$api_route" ]]; then
+        echo "curl -k https://$api_route/health"
+        echo "curl -k -X POST https://$api_route/api/v1/classify/intent -H 'Content-Type: application/json' -d '{\"text\": \"Hello world\"}'"
+    fi
+}
+
+# Main function
+main() {
+    log "INFO" "Starting vLLM Semantic Router OpenShift deployment"
+
+    parse_args "$@"
+    validate_prerequisites
+    login_openshift
+    cleanup_deployment
+
+    case "$DEPLOYMENT_METHOD" in
+        "kustomize")
+            deploy_kustomize
+            ;;
+        "template")
+            deploy_template
+            ;;
+        "enhanced")
+            deploy_enhanced
+            ;;
+    esac
+
+    if wait_for_ready; then
+        show_deployment_info
+        setup_port_forwarding
+        log "SUCCESS" "Deployment completed successfully!"
+    else
+        log "ERROR" "Deployment may have issues. Check the logs and status above."
+ exit 1 + fi +} + +# Run main function with all arguments +main "$@" \ No newline at end of file diff --git a/deploy/openshift/deployment.yaml b/deploy/openshift/deployment.yaml new file mode 100644 index 00000000..4c1da75b --- /dev/null +++ b/deploy/openshift/deployment.yaml @@ -0,0 +1,351 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: semantic-router + namespace: vllm-semantic-router-system + labels: + app: semantic-router +spec: + replicas: 1 + selector: + matchLabels: + app: semantic-router + template: + metadata: + labels: + app: semantic-router + annotations: + cdi.k8s.io/model-a: k8s.device-plugin.nvidia.com/gpu=0 + cdi.k8s.io/model-b: k8s.device-plugin.nvidia.com/gpu=1 + spec: + # GPU node selection and toleration + nodeSelector: + nvidia.com/gpu.present: "true" + tolerations: + - key: nvidia.com/gpu + operator: Equal + value: "True" + effect: NoSchedule + initContainers: + - name: model-downloader + image: python:3.11-slim + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + seccompProfile: + type: RuntimeDefault + command: ["/bin/bash", "-c"] + args: + - | + set -e + echo "Installing Hugging Face CLI..." + pip install --no-cache-dir huggingface_hub[cli] + + echo "Downloading models to persistent volume..." + cd /app/models + + # Download category classifier model with ALL files to see what's available + if [ ! -d "category_classifier_modernbert-base_model" ] || [ -z "$(find category_classifier_modernbert-base_model -name '*.safetensors' -o -name '*.bin' -o -name 'pytorch_model.*' 2>/dev/null)" ]; then + echo "Downloading category classifier model (all files)..." + huggingface-cli download LLM-Semantic-Router/category_classifier_modernbert-base_model --local-dir category_classifier_modernbert-base_model --cache-dir /app/cache/hf + else + echo "Category classifier model already exists, skipping..." + fi + + # Download PII classifier model with ALL files + if [ ! -d "pii_classifier_modernbert-base_model" ] || [ -z "$(find pii_classifier_modernbert-base_model -name '*.safetensors' -o -name '*.bin' -o -name 'pytorch_model.*' 2>/dev/null)" ]; then + echo "Downloading PII classifier model (all files)..." + huggingface-cli download LLM-Semantic-Router/pii_classifier_modernbert-base_model --local-dir pii_classifier_modernbert-base_model --cache-dir /app/cache/hf + else + echo "PII classifier model already exists, skipping..." + fi + + # Download jailbreak classifier model with ALL files + if [ ! -d "jailbreak_classifier_modernbert-base_model" ] || [ -z "$(find jailbreak_classifier_modernbert-base_model -name '*.safetensors' -o -name '*.bin' -o -name 'pytorch_model.*' 2>/dev/null)" ]; then + echo "Downloading jailbreak classifier model (all files)..." + huggingface-cli download LLM-Semantic-Router/jailbreak_classifier_modernbert-base_model --local-dir jailbreak_classifier_modernbert-base_model --cache-dir /app/cache/hf + else + echo "Jailbreak classifier model already exists, skipping..." + fi + + # Download PII token classifier model with ALL files + if [ ! -d "pii_classifier_modernbert-base_presidio_token_model" ] || [ -z "$(find pii_classifier_modernbert-base_presidio_token_model -name '*.safetensors' -o -name '*.bin' -o -name 'pytorch_model.*' 2>/dev/null)" ]; then + echo "Downloading PII token classifier model (all files)..." 
+ huggingface-cli download LLM-Semantic-Router/pii_classifier_modernbert-base_presidio_token_model --local-dir pii_classifier_modernbert-base_presidio_token_model --cache-dir /app/cache/hf + else + echo "PII token classifier model already exists, skipping..." + fi + + echo "All models downloaded successfully!" + ls -la /app/models/ + + echo "Setting proper permissions for models directory..." + # Make model files readable by group (OpenShift containers share the same group) + find /app/models -type f -exec chmod 644 {} \; || echo "Warning: Could not change model file permissions" + find /app/models -type d -exec chmod 755 {} \; || echo "Warning: Could not change model directory permissions" + + echo "Creating cache directories with proper permissions..." + mkdir -p /app/cache/hf /app/cache/transformers /app/cache/sentence_transformers /app/cache/xdg /app/cache/bert + chmod -R 777 /app/cache/ || echo "Warning: Could not change cache directory permissions (will rely on OpenShift defaults)" + + echo "Model download complete. Verifying directory structure..." + ls -la /app/models/ + ls -la /app/cache/ + env: + - name: HF_HUB_CACHE + value: /app/cache/hf + - name: HF_HOME + value: /app/cache/hf + - name: TRANSFORMERS_CACHE + value: /app/cache/transformers + - name: PIP_CACHE_DIR + value: /tmp/pip_cache + - name: PYTHONUSERBASE + value: /tmp/python_user + - name: PATH + value: /tmp/python_user/bin:/usr/local/bin:/usr/bin:/bin + # Reduced resource requirements for init container + resources: + requests: + memory: "512Mi" + cpu: "250m" + limits: + memory: "1Gi" + cpu: "500m" + volumeMounts: + - name: models-volume + mountPath: /app/models + - name: cache-volume + mountPath: /app/cache + containers: + - name: semantic-router + image: ghcr.io/vllm-project/semantic-router/extproc:latest + # No args - use insecure gRPC for localhost communication within pod + # Removed runAsNonRoot: false for OpenShift compatibility + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + seccompProfile: + type: RuntimeDefault + ports: + - containerPort: 50051 + name: grpc + protocol: TCP + - containerPort: 9190 + name: metrics + protocol: TCP + - containerPort: 8080 + name: classify-api + protocol: TCP + env: + - name: LD_LIBRARY_PATH + value: "/app/lib" + - name: HF_HOME + value: "/app/cache/hf" + - name: TRANSFORMERS_CACHE + value: "/app/cache/transformers" + - name: SENTENCE_TRANSFORMERS_HOME + value: "/app/cache/sentence_transformers" + - name: XDG_CACHE_HOME + value: "/app/cache/xdg" + - name: HOME + value: "/tmp/home" + volumeMounts: + - name: config-volume + mountPath: /app/config + readOnly: true + - name: models-volume + mountPath: /app/models + - name: cache-volume + mountPath: /app/cache + livenessProbe: + tcpSocket: + port: 50051 + initialDelaySeconds: 60 + periodSeconds: 30 + timeoutSeconds: 10 + failureThreshold: 3 + readinessProbe: + tcpSocket: + port: 50051 + initialDelaySeconds: 90 + periodSeconds: 30 + timeoutSeconds: 10 + failureThreshold: 3 + # Resource requirements optimized for OpenShift + resources: + requests: + memory: "3Gi" + cpu: "1" + limits: + memory: "6Gi" + cpu: "2" + # Real LLM specialist containers using llm-katan + - name: model-a + image: image-registry.openshift-image-registry.svc:5000/vllm-semantic-router-system/llm-katan:latest + ports: + - containerPort: 8000 + name: http + protocol: TCP + command: ["llm-katan"] + args: ["--model", "Qwen/Qwen3-0.6B", "--port", "8000", "--served-model-name", "Model-A", "--max-tokens", "512", "--temperature", "0.7", 
"--log-level", "INFO", "--device", "auto"] + env: + - name: HF_HUB_CACHE + value: "/app/cache/hf" + - name: TRANSFORMERS_CACHE + value: "/app/cache/transformers" + - name: HF_HOME + value: "/app/cache/hf" + - name: HOME + value: "/tmp/home" + volumeMounts: + - name: cache-volume + mountPath: /app/cache + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + seccompProfile: + type: RuntimeDefault + livenessProbe: + httpGet: + path: / + port: 8000 + initialDelaySeconds: 90 + periodSeconds: 30 + timeoutSeconds: 10 + failureThreshold: 5 + readinessProbe: + httpGet: + path: / + port: 8000 + initialDelaySeconds: 60 + periodSeconds: 15 + timeoutSeconds: 10 + failureThreshold: 5 + resources: + requests: + memory: "2Gi" + cpu: "500m" + nvidia.com/gpu: "1" + limits: + memory: "4Gi" + cpu: "1" + nvidia.com/gpu: "1" + - name: model-b + image: image-registry.openshift-image-registry.svc:5000/vllm-semantic-router-system/llm-katan:latest + ports: + - containerPort: 8001 + name: http + protocol: TCP + command: ["llm-katan"] + args: ["--model", "Qwen/Qwen3-0.6B", "--port", "8001", "--served-model-name", "Model-B", "--max-tokens", "512", "--temperature", "0.7", "--log-level", "INFO", "--device", "auto"] + env: + - name: HF_HUB_CACHE + value: "/app/cache/hf" + - name: TRANSFORMERS_CACHE + value: "/app/cache/transformers" + - name: HF_HOME + value: "/app/cache/hf" + - name: HOME + value: "/tmp/home" + volumeMounts: + - name: cache-volume + mountPath: /app/cache + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + seccompProfile: + type: RuntimeDefault + livenessProbe: + httpGet: + path: / + port: 8001 + initialDelaySeconds: 90 + periodSeconds: 30 + timeoutSeconds: 10 + failureThreshold: 5 + readinessProbe: + httpGet: + path: / + port: 8001 + initialDelaySeconds: 60 + periodSeconds: 15 + timeoutSeconds: 10 + failureThreshold: 5 + resources: + requests: + memory: "2Gi" + cpu: "500m" + nvidia.com/gpu: "1" + limits: + memory: "4Gi" + cpu: "1" + nvidia.com/gpu: "1" + # Envoy proxy container + - name: envoy-proxy + image: envoyproxy/envoy:v1.35.3 + ports: + - containerPort: 8801 + name: envoy-http + protocol: TCP + - containerPort: 19000 + name: envoy-admin + protocol: TCP + command: ["/usr/local/bin/envoy"] + args: ["-c", "/etc/envoy/envoy.yaml", "--component-log-level", "ext_proc:info,router:info,http:info"] + env: + - name: loglevel + value: "info" + volumeMounts: + - name: envoy-config-volume + mountPath: /etc/envoy + readOnly: true + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + seccompProfile: + type: RuntimeDefault + livenessProbe: + tcpSocket: + port: 8801 + initialDelaySeconds: 30 + periodSeconds: 30 + timeoutSeconds: 10 + failureThreshold: 3 + readinessProbe: + tcpSocket: + port: 8801 + initialDelaySeconds: 10 + periodSeconds: 15 + timeoutSeconds: 10 + failureThreshold: 3 + resources: + requests: + memory: "256Mi" + cpu: "250m" + limits: + memory: "512Mi" + cpu: "500m" + volumes: + - name: config-volume + configMap: + name: semantic-router-config + - name: envoy-config-volume + configMap: + name: envoy-config + - name: models-volume + persistentVolumeClaim: + claimName: semantic-router-models + - name: cache-volume + persistentVolumeClaim: + claimName: semantic-router-cache \ No newline at end of file diff --git a/deploy/openshift/envoy-openshift.yaml b/deploy/openshift/envoy-openshift.yaml new file mode 100644 index 00000000..d393c37a --- /dev/null +++ b/deploy/openshift/envoy-openshift.yaml @@ -0,0 
+1,196 @@
+# OpenShift-specific Envoy configuration
+# This config uses static clusters instead of ORIGINAL_DST to work with Kubernetes networking
+# The main difference from config/envoy.yaml is the use of static clusters (model_a_cluster
+# and model_b_cluster) that point to localhost ports 8000 and 8001 respectively.
+static_resources:
+  listeners:
+    - name: listener_0
+      address:
+        socket_address:
+          address: 0.0.0.0
+          port_value: 8801
+      filter_chains:
+        - filters:
+            - name: envoy.filters.network.http_connection_manager
+              typed_config:
+                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
+                stat_prefix: ingress_http
+                access_log:
+                  - name: envoy.access_loggers.stdout
+                    typed_config:
+                      "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
+                      log_format:
+                        json_format:
+                          time: "%START_TIME%"
+                          protocol: "%PROTOCOL%"
+                          request_method: "%REQ(:METHOD)%"
+                          request_path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
+                          response_code: "%RESPONSE_CODE%"
+                          response_flags: "%RESPONSE_FLAGS%"
+                          bytes_received: "%BYTES_RECEIVED%"
+                          bytes_sent: "%BYTES_SENT%"
+                          duration: "%DURATION%"
+                          upstream_host: "%UPSTREAM_HOST%"
+                          upstream_cluster: "%UPSTREAM_CLUSTER%"
+                          upstream_local_address: "%UPSTREAM_LOCAL_ADDRESS%"
+                          request_id: "%REQ(X-REQUEST-ID)%"
+                          selected_model: "%REQ(X-SELECTED-MODEL)%"
+                          selected_endpoint: "%REQ(X-GATEWAY-DESTINATION-ENDPOINT)%"
+                route_config:
+                  name: local_route
+                  virtual_hosts:
+                    - name: local_service
+                      domains: ["*"]
+                      routes:
+                        # Route /v1/models to semantic router for model aggregation
+                        - match:
+                            path: "/v1/models"
+                          route:
+                            cluster: semantic_router_cluster
+                            timeout: 300s
+                        # Route to Model-A when the router pins the request to its endpoint
+                        - match:
+                            prefix: "/"
+                            headers:
+                              - name: "x-gateway-destination-endpoint"
+                                exact_match: "127.0.0.1:8000"
+                          route:
+                            cluster: model_a_cluster
+                            timeout: 300s
+                        # Route to Model-B when the router pins the request to its endpoint
+                        - match:
+                            prefix: "/"
+                            headers:
+                              - name: "x-gateway-destination-endpoint"
+                                exact_match: "127.0.0.1:8001"
+                          route:
+                            cluster: model_b_cluster
+                            timeout: 300s
+                        # Default route to Model-B (fallback)
+                        - match:
+                            prefix: "/"
+                          route:
+                            cluster: model_b_cluster
+                            timeout: 300s
+                http_filters:
+                  - name: envoy.filters.http.ext_proc
+                    typed_config:
+                      "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
+                      grpc_service:
+                        envoy_grpc:
+                          cluster_name: extproc_service
+                      allow_mode_override: true
+                      processing_mode:
+                        request_header_mode: "SEND"
+                        response_header_mode: "SEND"
+                        request_body_mode: "BUFFERED"
+                        response_body_mode: "BUFFERED"
+                        request_trailer_mode: "SKIP"
+                        response_trailer_mode: "SKIP"
+                      failure_mode_allow: true
+                      message_timeout: 300s
+                  - name: envoy.filters.http.router
+                    typed_config:
+                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
+                      suppress_envoy_headers: true
+                http2_protocol_options:
+                  max_concurrent_streams: 100
+                  initial_stream_window_size: 65536
+                  initial_connection_window_size: 1048576
+                stream_idle_timeout: "300s"
+                request_timeout: "300s"
+                common_http_protocol_options:
+                  idle_timeout: "300s"
+
+  clusters:
+    - name: extproc_service
+      connect_timeout: 300s
+      per_connection_buffer_limit_bytes: 52428800
+      type: STATIC
+      lb_policy: ROUND_ROBIN
+      typed_extension_protocol_options:
+        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
+          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
+          explicit_http_config:
+            http2_protocol_options:
+
connection_keepalive: + interval: 300s + timeout: 300s + load_assignment: + cluster_name: extproc_service + endpoints: + - lb_endpoints: + - endpoint: + address: + socket_address: + address: 127.0.0.1 + port_value: 50051 + + # Static cluster for semantic router (OpenShift-specific) + - name: semantic_router_cluster + connect_timeout: 300s + per_connection_buffer_limit_bytes: 52428800 + type: STATIC + lb_policy: ROUND_ROBIN + load_assignment: + cluster_name: semantic_router_cluster + endpoints: + - lb_endpoints: + - endpoint: + address: + socket_address: + address: 127.0.0.1 + port_value: 8080 + typed_extension_protocol_options: + envoy.extensions.upstreams.http.v3.HttpProtocolOptions: + "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions + explicit_http_config: + http_protocol_options: {} + + # Static cluster for Model-A (OpenShift-specific) + - name: model_a_cluster + connect_timeout: 300s + per_connection_buffer_limit_bytes: 52428800 + type: STATIC + lb_policy: ROUND_ROBIN + load_assignment: + cluster_name: model_a_cluster + endpoints: + - lb_endpoints: + - endpoint: + address: + socket_address: + address: 127.0.0.1 + port_value: 8000 + typed_extension_protocol_options: + envoy.extensions.upstreams.http.v3.HttpProtocolOptions: + "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions + explicit_http_config: + http_protocol_options: {} + + # Static cluster for Model-B (OpenShift-specific) + - name: model_b_cluster + connect_timeout: 300s + per_connection_buffer_limit_bytes: 52428800 + type: STATIC + lb_policy: ROUND_ROBIN + load_assignment: + cluster_name: model_b_cluster + endpoints: + - lb_endpoints: + - endpoint: + address: + socket_address: + address: 127.0.0.1 + port_value: 8001 + typed_extension_protocol_options: + envoy.extensions.upstreams.http.v3.HttpProtocolOptions: + "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions + explicit_http_config: + http_protocol_options: {} + +admin: + address: + socket_address: + address: "127.0.0.1" + port_value: 19000 \ No newline at end of file diff --git a/deploy/openshift/kustomization.yaml b/deploy/openshift/kustomization.yaml new file mode 100644 index 00000000..686dff69 --- /dev/null +++ b/deploy/openshift/kustomization.yaml @@ -0,0 +1,36 @@ +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization + +metadata: + name: semantic-router-openshift + +resources: +- namespace.yaml +- pvc.yaml +- deployment.yaml +- service.yaml +- routes.yaml + +# Generate ConfigMap +configMapGenerator: +- name: semantic-router-config + files: + - config.yaml + - tools_db.json + +# Namespace for all resources +namespace: vllm-semantic-router-system + +images: +- name: ghcr.io/vllm-project/semantic-router/extproc + newTag: latest + +# Add OpenShift-specific labels +commonLabels: + app.kubernetes.io/name: semantic-router + app.kubernetes.io/instance: semantic-router-openshift + app.kubernetes.io/part-of: vllm-semantic-router + +# Add OpenShift-specific annotations +commonAnnotations: + deployment.openshift.io/type: "vllm-semantic-router" \ No newline at end of file diff --git a/deploy/openshift/template.yaml b/deploy/openshift/template.yaml new file mode 100644 index 00000000..c7f5212f --- /dev/null +++ b/deploy/openshift/template.yaml @@ -0,0 +1,361 @@ +apiVersion: template.openshift.io/v1 +kind: Template +metadata: + name: semantic-router + namespace: openshift + annotations: + description: "vLLM Semantic Router deployment template for OpenShift" + iconClass: 
"icon-load-balancer" + openshift.io/display-name: "vLLM Semantic Router" + openshift.io/documentation-url: "https://github.com/vllm-project/semantic-router" + openshift.io/support-url: "https://github.com/vllm-project/semantic-router/issues" + tags: "semantic-router,vllm,ai,ml,routing" + template.openshift.io/bindable: "false" +labels: + app: semantic-router + template: semantic-router +objects: +- apiVersion: v1 + kind: Namespace + metadata: + name: ${NAMESPACE} + labels: + name: ${NAMESPACE} + app.kubernetes.io/name: semantic-router + app.kubernetes.io/component: system + annotations: + openshift.io/description: "Namespace for vLLM Semantic Router deployment on OpenShift" + openshift.io/display-name: "vLLM Semantic Router System" +- apiVersion: v1 + kind: PersistentVolumeClaim + metadata: + name: semantic-router-models + namespace: ${NAMESPACE} + labels: + app: semantic-router + spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: ${STORAGE_SIZE} +- apiVersion: apps/v1 + kind: Deployment + metadata: + name: semantic-router + namespace: ${NAMESPACE} + labels: + app: semantic-router + spec: + replicas: ${{REPLICAS}} + selector: + matchLabels: + app: semantic-router + template: + metadata: + labels: + app: semantic-router + spec: + containers: + - name: semantic-router + image: ${CONTAINER_IMAGE}:${CONTAINER_TAG} + args: ["--secure=${SECURE_MODE}"] + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + seccompProfile: + type: RuntimeDefault + ports: + - containerPort: 50051 + name: grpc + protocol: TCP + - containerPort: 9190 + name: metrics + protocol: TCP + - containerPort: 8080 + name: classify-api + protocol: TCP + env: + - name: LD_LIBRARY_PATH + value: "/app/lib" + volumeMounts: + - name: config-volume + mountPath: /app/config + readOnly: true + - name: models-volume + mountPath: /app/models + livenessProbe: + tcpSocket: + port: 50051 + initialDelaySeconds: ${LIVENESS_INITIAL_DELAY} + periodSeconds: 30 + timeoutSeconds: 10 + failureThreshold: 3 + readinessProbe: + tcpSocket: + port: 50051 + initialDelaySeconds: ${READINESS_INITIAL_DELAY} + periodSeconds: 30 + timeoutSeconds: 10 + failureThreshold: 3 + resources: + requests: + memory: ${MEMORY_REQUEST} + cpu: ${CPU_REQUEST} + limits: + memory: ${MEMORY_LIMIT} + cpu: ${CPU_LIMIT} + volumes: + - name: config-volume + configMap: + name: semantic-router-config + - name: models-volume + persistentVolumeClaim: + claimName: semantic-router-models +- apiVersion: v1 + kind: Service + metadata: + name: semantic-router + namespace: ${NAMESPACE} + labels: + app: semantic-router + annotations: + service.alpha.openshift.io/serving-cert-secret-name: semantic-router-tls + spec: + type: ClusterIP + ports: + - port: 50051 + targetPort: grpc + protocol: TCP + name: grpc + - port: 8080 + targetPort: 8080 + protocol: TCP + name: classify-api + selector: + app: semantic-router +- apiVersion: v1 + kind: Service + metadata: + name: semantic-router-metrics + namespace: ${NAMESPACE} + labels: + app: semantic-router + service: metrics + spec: + type: ClusterIP + ports: + - port: 9190 + targetPort: metrics + protocol: TCP + name: metrics + selector: + app: semantic-router +- apiVersion: route.openshift.io/v1 + kind: Route + metadata: + name: semantic-router-api + namespace: ${NAMESPACE} + labels: + app: semantic-router + service: api + annotations: + description: "Route for Semantic Router Classification API" + spec: + host: ${API_HOSTNAME} + to: + kind: Service + name: semantic-router + port: + targetPort: 
classify-api + tls: + termination: edge + insecureEdgeTerminationPolicy: Redirect + wildcardPolicy: None +- apiVersion: route.openshift.io/v1 + kind: Route + metadata: + name: semantic-router-grpc + namespace: ${NAMESPACE} + labels: + app: semantic-router + service: grpc + annotations: + description: "Route for Semantic Router gRPC API" + haproxy.router.openshift.io/balance: roundrobin + haproxy.router.openshift.io/disable_cookies: "true" + spec: + host: ${GRPC_HOSTNAME} + to: + kind: Service + name: semantic-router + port: + targetPort: grpc + tls: + termination: passthrough + wildcardPolicy: None +- apiVersion: route.openshift.io/v1 + kind: Route + metadata: + name: semantic-router-metrics + namespace: ${NAMESPACE} + labels: + app: semantic-router + service: metrics + annotations: + description: "Route for Semantic Router Prometheus metrics" + spec: + host: ${METRICS_HOSTNAME} + to: + kind: Service + name: semantic-router-metrics + port: + targetPort: metrics + tls: + termination: edge + insecureEdgeTerminationPolicy: Redirect + wildcardPolicy: None +- apiVersion: v1 + kind: ConfigMap + metadata: + name: semantic-router-config + namespace: ${NAMESPACE} + labels: + app: semantic-router + data: + config.yaml: | + server: + port: 50051 + secure: ${SECURE_MODE} + + metrics: + enabled: true + port: 9190 + + classification: + api: + enabled: true + port: 8080 + + router: + category: + enabled: true + model_path: "models/category_classifier_modernbert-base_model" + + pii: + enabled: true + model_path: "models/pii_classifier_modernbert-base_model" + + jailbreak: + enabled: true + model_path: "models/jailbreak_classifier_modernbert-base_model" + + pii_token: + enabled: true + model_path: "models/pii_classifier_modernbert-base_presidio_token_model" + + tools: + enabled: true + database_path: "/app/config/tools_db.json" + + logging: + level: ${LOG_LEVEL} + format: "json" + + cache: + enabled: true + ttl: "5m" + tools_db.json: | + { + "tools": [ + { + "name": "search", + "description": "Search for information on the web", + "category": "search", + "parameters": { + "query": { + "type": "string", + "description": "The search query" + } + } + }, + { + "name": "calculator", + "description": "Perform mathematical calculations", + "category": "math", + "parameters": { + "expression": { + "type": "string", + "description": "The mathematical expression to evaluate" + } + } + } + ] + } +parameters: +- name: NAMESPACE + description: "The namespace to deploy the semantic router into" + value: "vllm-semantic-router-system" + required: true +- name: CONTAINER_IMAGE + description: "Container image for the semantic router" + value: "ghcr.io/vllm-project/semantic-router/extproc" + required: true +- name: CONTAINER_TAG + description: "Container image tag" + value: "latest" + required: true +- name: REPLICAS + description: "Number of replicas to deploy" + value: "1" + required: true +- name: STORAGE_SIZE + description: "Size of persistent storage for models" + value: "10Gi" + required: true +- name: MEMORY_REQUEST + description: "Memory request for container" + value: "3Gi" + required: true +- name: MEMORY_LIMIT + description: "Memory limit for container" + value: "6Gi" + required: true +- name: CPU_REQUEST + description: "CPU request for container" + value: "1" + required: true +- name: CPU_LIMIT + description: "CPU limit for container" + value: "2" + required: true +- name: SECURE_MODE + description: "Enable secure mode (TLS)" + value: "true" + required: true +- name: LIVENESS_INITIAL_DELAY + description: "Initial 
delay for liveness probe (seconds)"
+  value: "60"
+  required: true
+- name: READINESS_INITIAL_DELAY
+  description: "Initial delay for readiness probe (seconds)"
+  value: "90"
+  required: true
+- name: LOG_LEVEL
+  description: "Logging level (debug, info, warn, error)"
+  value: "info"
+  required: true
+- name: API_HOSTNAME
+  description: "Custom hostname for API route (leave empty for auto-generated)"
+  value: ""
+  required: false
+- name: GRPC_HOSTNAME
+  description: "Custom hostname for gRPC route (leave empty for auto-generated)"
+  value: ""
+  required: false
+- name: METRICS_HOSTNAME
+  description: "Custom hostname for metrics route (leave empty for auto-generated)"
+  value: ""
+  required: false
\ No newline at end of file
diff --git a/deploy/openshift/validate-deployment.sh b/deploy/openshift/validate-deployment.sh
new file mode 100755
index 00000000..234bbc0b
--- /dev/null
+++ b/deploy/openshift/validate-deployment.sh
@@ -0,0 +1,183 @@
+#!/bin/bash
+
+# validate-deployment.sh
+# Validates that the OpenShift deployment is working correctly
+
+# No 'set -e' here: the script must keep running after a failed test so all failures are tallied
+
+# Colors for output
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+BLUE='\033[0;34m'
+NC='\033[0m' # No Color
+
+NAMESPACE="${1:-vllm-semantic-router-system}"
+FAILURES=0
+
+log() {
+    local level=$1
+    shift
+    local message="$*"
+    case $level in
+        "INFO") echo -e "${BLUE}[INFO]${NC} $message" ;;
+        "PASS") echo -e "${GREEN}[PASS]${NC} $message" ;;
+        "FAIL") echo -e "${RED}[FAIL]${NC} $message"; FAILURES=$((FAILURES+1)) ;;
+        "WARN") echo -e "${YELLOW}[WARN]${NC} $message" ;;
+    esac
+}
+
+# Test 1: Check namespace exists
+log "INFO" "Test 1: Checking namespace $NAMESPACE exists..."
+if oc get namespace "$NAMESPACE" &> /dev/null; then
+    log "PASS" "Namespace $NAMESPACE exists"
+else
+    log "FAIL" "Namespace $NAMESPACE does not exist"
+fi
+
+# Test 2: Check deployment is ready
+log "INFO" "Test 2: Checking deployment is ready..."
+if oc get deployment semantic-router -n "$NAMESPACE" &> /dev/null; then
+    READY=$(oc get deployment semantic-router -n "$NAMESPACE" -o jsonpath='{.status.readyReplicas}')
+    DESIRED=$(oc get deployment semantic-router -n "$NAMESPACE" -o jsonpath='{.spec.replicas}')
+    if [[ "$READY" == "$DESIRED" && "$READY" != "0" ]]; then
+        log "PASS" "Deployment is ready ($READY/$DESIRED replicas)"
+    else
+        log "FAIL" "Deployment not ready ($READY/$DESIRED replicas)"
+    fi
+else
+    log "FAIL" "Deployment semantic-router does not exist"
+fi
+
+# Test 3: Check all 4 containers are running
+log "INFO" "Test 3: Checking all 4 containers are running..."
+POD_NAME=$(oc get pods -n "$NAMESPACE" -l app=semantic-router -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
+if [[ -n "$POD_NAME" ]]; then
+    CONTAINER_COUNT=$(oc get pod "$POD_NAME" -n "$NAMESPACE" -o json | jq '.status.containerStatuses | length')
+    READY_COUNT=$(oc get pod "$POD_NAME" -n "$NAMESPACE" -o json | jq '[.status.containerStatuses[] | select(.ready==true)] | length')
+    if [[ "$CONTAINER_COUNT" == "4" && "$READY_COUNT" == "4" ]]; then
+        log "PASS" "All 4 containers are running (semantic-router, model-a, model-b, envoy-proxy)"
+    else
+        log "FAIL" "Not all containers ready ($READY_COUNT/4 containers)"
+    fi
+else
+    log "FAIL" "Could not find pod"
+fi
+
+# Test 4: Check GPU detection in model-a
+log "INFO" "Test 4: Checking GPU detection in model-a container..."
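+# The specialists start with --device auto, so a CPU fallback is acceptable: Tests 4-5 WARN rather than FAIL when CUDA is absent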
+if [[ -n "$POD_NAME" ]]; then + GPU_CHECK=$(oc exec -n "$NAMESPACE" "$POD_NAME" -c model-a -- python3 -c "import torch; print('CUDA' if torch.cuda.is_available() else 'CPU')" 2>/dev/null || echo "ERROR") + if [[ "$GPU_CHECK" == "CUDA" ]]; then + GPU_NAME=$(oc exec -n "$NAMESPACE" "$POD_NAME" -c model-a -- python3 -c "import torch; print(torch.cuda.get_device_name(0))" 2>/dev/null || echo "Unknown") + log "PASS" "GPU detected in model-a: $GPU_NAME" + elif [[ "$GPU_CHECK" == "CPU" ]]; then + log "WARN" "GPU not detected, running on CPU (acceptable with --device auto)" + else + log "FAIL" "Could not check GPU: $GPU_CHECK" + fi +fi + +# Test 5: Check GPU detection in model-b +log "INFO" "Test 5: Checking GPU detection in model-b container..." +if [[ -n "$POD_NAME" ]]; then + GPU_CHECK=$(oc exec -n "$NAMESPACE" "$POD_NAME" -c model-b -- python3 -c "import torch; print('CUDA' if torch.cuda.is_available() else 'CPU')" 2>/dev/null || echo "ERROR") + if [[ "$GPU_CHECK" == "CUDA" ]]; then + GPU_NAME=$(oc exec -n "$NAMESPACE" "$POD_NAME" -c model-b -- python3 -c "import torch; print(torch.cuda.get_device_name(0))" 2>/dev/null || echo "Unknown") + log "PASS" "GPU detected in model-b: $GPU_NAME" + elif [[ "$GPU_CHECK" == "CPU" ]]; then + log "WARN" "GPU not detected, running on CPU (acceptable with --device auto)" + else + log "FAIL" "Could not check GPU: $GPU_CHECK" + fi +fi + +# Test 6: Check model-a loaded successfully +log "INFO" "Test 6: Checking model-a loaded successfully..." +if [[ -n "$POD_NAME" ]]; then + MODEL_STATUS=$(oc logs -n "$NAMESPACE" "$POD_NAME" -c model-a --tail=200 | grep -i "model loaded" | tail -1) + if [[ -n "$MODEL_STATUS" ]]; then + DEVICE=$(echo "$MODEL_STATUS" | grep -oE "(cuda|cpu)" || echo "unknown") + log "PASS" "Model-A loaded successfully on $DEVICE" + else + log "FAIL" "Could not verify model-a loaded" + fi +fi + +# Test 7: Check model-b loaded successfully +log "INFO" "Test 7: Checking model-b loaded successfully..." +if [[ -n "$POD_NAME" ]]; then + MODEL_STATUS=$(oc logs -n "$NAMESPACE" "$POD_NAME" -c model-b --tail=200 | grep -i "model loaded" | tail -1) + if [[ -n "$MODEL_STATUS" ]]; then + DEVICE=$(echo "$MODEL_STATUS" | grep -oE "(cuda|cpu)" || echo "unknown") + log "PASS" "Model-B loaded successfully on $DEVICE" + else + log "FAIL" "Could not verify model-b loaded" + fi +fi + +# Test 8: Check semantic-router is running +log "INFO" "Test 8: Checking semantic-router container is running..." +if [[ -n "$POD_NAME" ]]; then + SR_READY=$(oc get pod "$POD_NAME" -n "$NAMESPACE" -o json | jq -r '.status.containerStatuses[] | select(.name=="semantic-router") | .ready') + if [[ "$SR_READY" == "true" ]]; then + log "PASS" "Semantic-router container is ready and running" + else + log "FAIL" "Semantic-router container is not ready" + fi +fi + +# Test 9: Check PVCs are bound +log "INFO" "Test 9: Checking PVCs are bound..." +MODELS_PVC=$(oc get pvc semantic-router-models -n "$NAMESPACE" -o jsonpath='{.status.phase}' 2>/dev/null || echo "Missing") +CACHE_PVC=$(oc get pvc semantic-router-cache -n "$NAMESPACE" -o jsonpath='{.status.phase}' 2>/dev/null || echo "Missing") +if [[ "$MODELS_PVC" == "Bound" ]]; then + log "PASS" "Models PVC is bound" +else + log "FAIL" "Models PVC is $MODELS_PVC" +fi +if [[ "$CACHE_PVC" == "Bound" ]]; then + log "PASS" "Cache PVC is bound" +else + log "FAIL" "Cache PVC is $CACHE_PVC" +fi + +# Test 10: Check services exist +log "INFO" "Test 10: Checking services exist..." 
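+# The API route checked in Test 11 resolves through this Service, so both must exist for external access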
+if oc get service semantic-router -n "$NAMESPACE" &> /dev/null; then + log "PASS" "Service semantic-router exists" +else + log "FAIL" "Service semantic-router missing" +fi + +# Test 11: Check routes exist +log "INFO" "Test 11: Checking routes exist..." +API_ROUTE=$(oc get route semantic-router-api -n "$NAMESPACE" -o jsonpath='{.spec.host}' 2>/dev/null || echo "") +if [[ -n "$API_ROUTE" ]]; then + log "PASS" "Route semantic-router-api exists: $API_ROUTE" +else + log "FAIL" "Route semantic-router-api missing" +fi + +# Test 12: Check GPU node scheduling +log "INFO" "Test 12: Checking pod scheduled on GPU node..." +if [[ -n "$POD_NAME" ]]; then + NODE_NAME=$(oc get pod "$POD_NAME" -n "$NAMESPACE" -o jsonpath='{.spec.nodeName}') + NODE_HAS_GPU=$(oc get node "$NODE_NAME" -o jsonpath='{.metadata.labels.nvidia\.com/gpu\.present}' 2>/dev/null || echo "false") + if [[ "$NODE_HAS_GPU" == "true" ]]; then + log "PASS" "Pod scheduled on GPU node: $NODE_NAME" + else + log "WARN" "Pod scheduled on non-GPU node: $NODE_NAME (acceptable if no GPU nodes available)" + fi +fi + +# Summary +echo "" +echo "================================" +if [[ $FAILURES -eq 0 ]]; then + log "PASS" "All validation tests passed!" + exit 0 +else + log "FAIL" "Validation completed with $FAILURES failure(s)" + exit 1 +fi diff --git a/tools/make/openshift.mk b/tools/make/openshift.mk new file mode 100644 index 00000000..7c8498b9 --- /dev/null +++ b/tools/make/openshift.mk @@ -0,0 +1,230 @@ +# OpenShift deployment targets for semantic-router +# This makefile provides commands for managing OpenShift deployments + +# Configuration +OPENSHIFT_SERVER ?= +OPENSHIFT_USER ?= admin +OPENSHIFT_PASSWORD ?= +OPENSHIFT_NAMESPACE ?= vllm-semantic-router-system +OPENSHIFT_DEPLOYMENT_METHOD ?= kustomize +OPENSHIFT_CONTAINER_IMAGE ?= ghcr.io/vllm-project/semantic-router/extproc +OPENSHIFT_CONTAINER_TAG ?= latest +OPENSHIFT_STORAGE_SIZE ?= 10Gi +OPENSHIFT_MEMORY_REQUEST ?= 3Gi +OPENSHIFT_MEMORY_LIMIT ?= 6Gi +OPENSHIFT_CPU_REQUEST ?= 1 +OPENSHIFT_CPU_LIMIT ?= 2 +OPENSHIFT_LOG_LEVEL ?= info + +# Colors for output +BLUE := \033[0;34m +GREEN := \033[0;32m +YELLOW := \033[1;33m +RED := \033[0;31m +NC := \033[0m # No Color + +.PHONY: openshift-login openshift-logout openshift-deploy openshift-undeploy openshift-status openshift-logs openshift-routes openshift-test + +# Login to OpenShift cluster +openshift-login: + @echo "$(BLUE)[INFO]$(NC) Logging into OpenShift cluster" + @if [ -z "$(OPENSHIFT_SERVER)" ]; then \ + echo "$(RED)[ERROR]$(NC) OPENSHIFT_SERVER is required"; \ + exit 1; \ + fi + @if [ -z "$(OPENSHIFT_PASSWORD)" ]; then \ + echo "$(RED)[ERROR]$(NC) OPENSHIFT_PASSWORD is required"; \ + exit 1; \ + fi + @oc login -u $(OPENSHIFT_USER) -p $(OPENSHIFT_PASSWORD) $(OPENSHIFT_SERVER) --insecure-skip-tls-verify + @echo "$(GREEN)[SUCCESS]$(NC) Logged into OpenShift cluster" + +# Logout from OpenShift cluster +openshift-logout: + @echo "$(BLUE)[INFO]$(NC) Logging out from OpenShift cluster" + @oc logout + @echo "$(GREEN)[SUCCESS]$(NC) Logged out from OpenShift cluster" + +# Deploy semantic-router to OpenShift using Kustomize +openshift-deploy: + @echo "$(BLUE)[INFO]$(NC) Deploying semantic-router to OpenShift namespace: $(OPENSHIFT_NAMESPACE)" + @echo "$(BLUE)[INFO]$(NC) Using image: $(OPENSHIFT_CONTAINER_IMAGE):$(OPENSHIFT_CONTAINER_TAG)" + @oc apply -k deploy/openshift/ + @echo "$(BLUE)[INFO]$(NC) Waiting for deployment to be ready..." 
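+# 'oc wait' returns non-zero on timeout; the '|| true' below keeps the recipe going so status is still shown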
+ @oc wait --for=condition=Available deployment/semantic-router -n $(OPENSHIFT_NAMESPACE) --timeout=300s || true + @echo "$(GREEN)[SUCCESS]$(NC) Deployment completed!" + @$(MAKE) openshift-status + +# Deploy using automated script +openshift-deploy-auto: + @echo "$(BLUE)[INFO]$(NC) Running automated OpenShift deployment" + @if [ -z "$(OPENSHIFT_SERVER)" ] || [ -z "$(OPENSHIFT_PASSWORD)" ]; then \ + echo "$(RED)[ERROR]$(NC) OPENSHIFT_SERVER and OPENSHIFT_PASSWORD are required"; \ + echo "$(BLUE)[INFO]$(NC) Usage: make openshift-deploy-auto OPENSHIFT_SERVER=https://... OPENSHIFT_PASSWORD=..."; \ + exit 1; \ + fi + @./deploy/openshift/deploy-to-openshift.sh \ + --server "$(OPENSHIFT_SERVER)" \ + --user "$(OPENSHIFT_USER)" \ + --password "$(OPENSHIFT_PASSWORD)" \ + --namespace "$(OPENSHIFT_NAMESPACE)" \ + --method "$(OPENSHIFT_DEPLOYMENT_METHOD)" \ + --image "$(OPENSHIFT_CONTAINER_IMAGE)" \ + --tag "$(OPENSHIFT_CONTAINER_TAG)" \ + --storage "$(OPENSHIFT_STORAGE_SIZE)" \ + --memory-request "$(OPENSHIFT_MEMORY_REQUEST)" \ + --memory-limit "$(OPENSHIFT_MEMORY_LIMIT)" \ + --cpu-request "$(OPENSHIFT_CPU_REQUEST)" \ + --cpu-limit "$(OPENSHIFT_CPU_LIMIT)" \ + --log-level "$(OPENSHIFT_LOG_LEVEL)" \ + --skip-models + +# Deploy using OpenShift template +openshift-deploy-template: + @echo "$(BLUE)[INFO]$(NC) Deploying semantic-router using OpenShift template" + @oc process -f deploy/openshift/template.yaml \ + -p NAMESPACE=$(OPENSHIFT_NAMESPACE) \ + -p CONTAINER_IMAGE=$(OPENSHIFT_CONTAINER_IMAGE) \ + -p CONTAINER_TAG=$(OPENSHIFT_CONTAINER_TAG) \ + -p STORAGE_SIZE=$(OPENSHIFT_STORAGE_SIZE) \ + -p MEMORY_REQUEST=$(OPENSHIFT_MEMORY_REQUEST) \ + -p MEMORY_LIMIT=$(OPENSHIFT_MEMORY_LIMIT) \ + -p CPU_REQUEST=$(OPENSHIFT_CPU_REQUEST) \ + -p CPU_LIMIT=$(OPENSHIFT_CPU_LIMIT) \ + -p LOG_LEVEL=$(OPENSHIFT_LOG_LEVEL) \ + | oc apply -f - + @echo "$(GREEN)[SUCCESS]$(NC) Template deployment completed!" 
+ @$(MAKE) openshift-status + +# Remove semantic-router from OpenShift +openshift-undeploy: + @echo "$(BLUE)[INFO]$(NC) Removing semantic-router from OpenShift" + @oc delete -k deploy/openshift/ --ignore-not-found=true + @echo "$(GREEN)[SUCCESS]$(NC) Undeployment completed" + +# Clean up everything including namespace +openshift-cleanup: + @echo "$(BLUE)[INFO]$(NC) Cleaning up namespace $(OPENSHIFT_NAMESPACE)" + @oc delete namespace $(OPENSHIFT_NAMESPACE) --ignore-not-found=true + @echo "$(GREEN)[SUCCESS]$(NC) Cleanup completed" + +# Show deployment status +openshift-status: + @echo "$(BLUE)[INFO]$(NC) OpenShift deployment status for namespace: $(OPENSHIFT_NAMESPACE)" + @echo "" + @echo "$(BLUE)=== Pods ===$(NC)" + @oc get pods -n $(OPENSHIFT_NAMESPACE) -o wide || echo "$(YELLOW)[WARN]$(NC) Cannot get pods" + @echo "" + @echo "$(BLUE)=== Services ===$(NC)" + @oc get services -n $(OPENSHIFT_NAMESPACE) || echo "$(YELLOW)[WARN]$(NC) Cannot get services" + @echo "" + @echo "$(BLUE)=== Routes ===$(NC)" + @oc get routes -n $(OPENSHIFT_NAMESPACE) || echo "$(YELLOW)[WARN]$(NC) Cannot get routes" + @echo "" + @echo "$(BLUE)=== PVCs ===$(NC)" + @oc get pvc -n $(OPENSHIFT_NAMESPACE) || echo "$(YELLOW)[WARN]$(NC) Cannot get PVCs" + +# Show logs +openshift-logs: + @echo "$(BLUE)[INFO]$(NC) Showing semantic-router logs" + @oc logs -n $(OPENSHIFT_NAMESPACE) -l app=semantic-router -f + +# Show logs from previous pod (for troubleshooting) +openshift-logs-previous: + @echo "$(BLUE)[INFO]$(NC) Showing previous semantic-router logs" + @oc logs -n $(OPENSHIFT_NAMESPACE) -l app=semantic-router --previous + +# Get route URLs +openshift-routes: + @echo "$(BLUE)[INFO]$(NC) OpenShift route URLs:" + @API_ROUTE=$$(oc get route semantic-router-api -n $(OPENSHIFT_NAMESPACE) -o jsonpath='{.spec.host}' 2>/dev/null); \ + GRPC_ROUTE=$$(oc get route semantic-router-grpc -n $(OPENSHIFT_NAMESPACE) -o jsonpath='{.spec.host}' 2>/dev/null); \ + METRICS_ROUTE=$$(oc get route semantic-router-metrics -n $(OPENSHIFT_NAMESPACE) -o jsonpath='{.spec.host}' 2>/dev/null); \ + echo ""; \ + if [ -n "$$API_ROUTE" ]; then \ + echo "$(GREEN)Classification API:$(NC) https://$$API_ROUTE"; \ + echo "$(GREEN)Health Check:$(NC) https://$$API_ROUTE/health"; \ + fi; \ + if [ -n "$$GRPC_ROUTE" ]; then \ + echo "$(GREEN)gRPC API:$(NC) https://$$GRPC_ROUTE"; \ + fi; \ + if [ -n "$$METRICS_ROUTE" ]; then \ + echo "$(GREEN)Metrics:$(NC) https://$$METRICS_ROUTE/metrics"; \ + fi; \ + echo "" + +# Test deployment connectivity +openshift-test: + @echo "$(BLUE)[INFO]$(NC) Testing OpenShift deployment connectivity" + @API_ROUTE=$$(oc get route semantic-router-api -n $(OPENSHIFT_NAMESPACE) -o jsonpath='{.spec.host}' 2>/dev/null); \ + if [ -n "$$API_ROUTE" ]; then \ + echo "$(BLUE)[INFO]$(NC) Testing API route: https://$$API_ROUTE"; \ + curl -k -f -m 10 "https://$$API_ROUTE/health" 2>/dev/null && \ + echo "$(GREEN)[SUCCESS]$(NC) API route is accessible" || \ + echo "$(YELLOW)[WARN]$(NC) API route test failed (may be expected if models not loaded)"; \ + else \ + echo "$(RED)[ERROR]$(NC) API route not found"; \ + fi + +# Port forward services (for testing from local machine) +openshift-port-forward-api: + @echo "$(BLUE)[INFO]$(NC) Port forwarding Classification API (8080)" + @echo "$(YELLOW)[INFO]$(NC) Access API at: http://localhost:8080" + @echo "$(YELLOW)[INFO]$(NC) Press Ctrl+C to stop port forwarding" + @oc port-forward -n $(OPENSHIFT_NAMESPACE) svc/semantic-router 8080:8080 + +openshift-port-forward-grpc: + @echo "$(BLUE)[INFO]$(NC) Port forwarding gRPC API 
(50051)" + @echo "$(YELLOW)[INFO]$(NC) Access gRPC API at: localhost:50051" + @echo "$(YELLOW)[INFO]$(NC) Press Ctrl+C to stop port forwarding" + @oc port-forward -n $(OPENSHIFT_NAMESPACE) svc/semantic-router 50051:50051 + +openshift-port-forward-metrics: + @echo "$(BLUE)[INFO]$(NC) Port forwarding Prometheus metrics (9190)" + @echo "$(YELLOW)[INFO]$(NC) Access metrics at: http://localhost:9190/metrics" + @echo "$(YELLOW)[INFO]$(NC) Press Ctrl+C to stop port forwarding" + @oc port-forward -n $(OPENSHIFT_NAMESPACE) svc/semantic-router-metrics 9190:9190 + +# Debugging targets +openshift-debug: + @echo "$(BLUE)[INFO]$(NC) OpenShift debugging information" + @echo "" + @echo "$(BLUE)=== Recent Events ===$(NC)" + @oc get events -n $(OPENSHIFT_NAMESPACE) --sort-by='.lastTimestamp' | tail -10 || echo "$(YELLOW)[WARN]$(NC) Cannot get events" + @echo "" + @echo "$(BLUE)=== Pod Description ===$(NC)" + @oc describe pod -l app=semantic-router -n $(OPENSHIFT_NAMESPACE) | tail -20 || echo "$(YELLOW)[WARN]$(NC) Cannot describe pods" + +# Show all available OpenShift targets +openshift-help: + @echo "$(BLUE)OpenShift deployment targets:$(NC)" + @echo " openshift-login - Login to OpenShift cluster" + @echo " openshift-logout - Logout from OpenShift cluster" + @echo " openshift-deploy - Deploy using Kustomize (basic)" + @echo " openshift-deploy-auto - Deploy using automated script" + @echo " openshift-deploy-template - Deploy using OpenShift template" + @echo " openshift-undeploy - Remove deployment (keep namespace)" + @echo " openshift-cleanup - Remove deployment and namespace" + @echo " openshift-status - Show deployment status" + @echo " openshift-logs - Show application logs (follow)" + @echo " openshift-logs-previous - Show previous pod logs" + @echo " openshift-routes - Show route URLs" + @echo " openshift-test - Test deployment connectivity" + @echo " openshift-port-forward-api - Port forward Classification API" + @echo " openshift-port-forward-grpc - Port forward gRPC API" + @echo " openshift-port-forward-metrics - Port forward metrics" + @echo " openshift-debug - Show debugging information" + @echo "" + @echo "$(BLUE)Configuration variables:$(NC)" + @echo " OPENSHIFT_SERVER - OpenShift API server URL (required)" + @echo " OPENSHIFT_USER - OpenShift username (default: admin)" + @echo " OPENSHIFT_PASSWORD - OpenShift password (required)" + @echo " OPENSHIFT_NAMESPACE - Deployment namespace (default: $(OPENSHIFT_NAMESPACE))" + @echo " OPENSHIFT_DEPLOYMENT_METHOD - Deployment method: kustomize|template (default: $(OPENSHIFT_DEPLOYMENT_METHOD))" + @echo " OPENSHIFT_CONTAINER_IMAGE - Container image (default: $(OPENSHIFT_CONTAINER_IMAGE))" + @echo " OPENSHIFT_CONTAINER_TAG - Container tag (default: $(OPENSHIFT_CONTAINER_TAG))" + @echo "" + @echo "$(BLUE)Example usage:$(NC)" + @echo " make openshift-deploy-auto OPENSHIFT_SERVER=https://api.cluster.example.com:6443 OPENSHIFT_PASSWORD=mypass" + @echo " make openshift-status OPENSHIFT_NAMESPACE=my-namespace" + @echo " make openshift-logs" \ No newline at end of file From f9389602fb3611fa50906f0b141aa7e15e24d492 Mon Sep 17 00:00:00 2001 From: Yossi Ovadia Date: Wed, 8 Oct 2025 13:36:48 -0700 Subject: [PATCH 2/2] fix: correct route URLs to use http instead of https Routes are created without TLS termination by default, so URLs should use http:// not https://. This fixes the quick test commands shown at deployment completion. 
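For reference, one way to confirm whether a route terminates TLS before
choosing a URL scheme (illustrative command, not part of this patch):

```
# Empty output means no TLS termination, i.e. the route serves plain http://
oc get route semantic-router-api -n vllm-semantic-router-system \
  -o jsonpath='{.spec.tls.termination}'
```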
Tested and verified: - curl http://semantic-router-api.../health works - curl -X POST http://semantic-router-api.../api/v1/classify/intent works Signed-off-by: Yossi Ovadia --- deploy/openshift/deploy-to-openshift.sh | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/deploy/openshift/deploy-to-openshift.sh b/deploy/openshift/deploy-to-openshift.sh index cf8d8972..b89ff0d2 100755 --- a/deploy/openshift/deploy-to-openshift.sh +++ b/deploy/openshift/deploy-to-openshift.sh @@ -773,21 +773,21 @@ show_deployment_info() { fi if [[ -n "$api_route" ]]; then - echo "Classification API: https://$api_route" - echo "Health Check: https://$api_route/health" + echo "Classification API: http://$api_route" + echo "Health Check: http://$api_route/health" fi if [[ -n "$grpc_route" ]]; then - echo "gRPC API: https://$grpc_route" + echo "gRPC API: http://$grpc_route" fi if [[ -n "$metrics_route" ]]; then - echo "Metrics: https://$metrics_route/metrics" + echo "Metrics: http://$metrics_route/metrics" fi echo "" echo "=== Quick Test Commands ===" if [[ -n "$api_route" ]]; then - echo "curl -k https://$api_route/health" - echo "curl -k -X POST https://$api_route/api/v1/classify/intent -H 'Content-Type: application/json' -d '{\"text\": \"Hello world\"}'" + echo "curl http://$api_route/health" + echo "curl -X POST http://$api_route/api/v1/classify/intent -H 'Content-Type: application/json' -d '{\"text\": \"What is 2+2?\"}'" fi }