vllm-project · Xunzhuo · Dec 2, 2025 · Nov 26, 2025
@@ -18,7 +18,7 @@ on:
 jobs:
   integration-test:
     runs-on: ubuntu-latest
-    timeout-minutes: 60
+    timeout-minutes: 180
 
     steps:
       - name: Check out the repo

@@ -295,7 +295,7 @@ pub extern "C" fn init_embedding_models(
         }
         Err(_) => {
             eprintln!("WARNING: ModelFactory already initialized");
-            false
+            true // Return success - idempotent behavior
         }
     }
 }
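The Rust change makes `init_embedding_models` treat "ModelFactory already initialized" as success instead of failure, so calling it a second time no longer reports an error. Below is a minimal Go sketch of the same idempotent-initialization pattern. It assumes a hypothetical `initModels` that wraps a one-time load guarded by `sync.Once`, with hypothetical `modelFactory` and `loadCount` names. It is not the project's FFI binding.

```go
package main

import (
	"fmt"
	"sync"
)

// modelFactory is a hypothetical stand-in for the Rust-side ModelFactory.
type modelFactory struct{ modelPath string }

var (
	factoryOnce sync.Once
	factory     *modelFactory
	loadCount   int // how many times the expensive load actually ran
)

// initModels loads the factory on the first call only. Later calls log a
// warning and still return true, mirroring the idempotent Rust behavior.
func initModels(modelPath string) bool {
	loadedNow := false
	factoryOnce.Do(func() {
		factory = &modelFactory{modelPath: modelPath}
		loadCount++
		loadedNow = true
	})
	if !loadedNow {
		fmt.Println("WARNING: ModelFactory already initialized")
	}
	return true
}

func main() {
	fmt.Println(initModels("models/Qwen3-Embedding-0.6B"))
	fmt.Println(initModels("models/Qwen3-Embedding-0.6B")) // prints the warning first
	fmt.Println("loads:", loadCount)
}
```

In this sketch, only the first caller's `modelPath` takes effect. Returning `true` on later calls is safe only because every caller expects the same models.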

@@ -14,6 +14,7 @@ The framework follows a **separation of concerns** design:
 
 - **ai-gateway**: Tests Semantic Router with Envoy AI Gateway integration
 - **aibrix**: Tests Semantic Router with vLLM AIBrix integration
+- **dynamic-config**: Tests Semantic Router with Kubernetes CRD-based configuration (IntelligentRoute/IntelligentPool)
 - **istio**: Tests Semantic Router with Istio service mesh integration
 - **production-stack**: Tests vLLM Production Stack configurations (future)
 - **llm-d**: Tests Semantic Router with LLM-D distributed inference
@@ -45,10 +46,18 @@ e2e/
 │   ├── rule_condition_logic.go        # Signal-decision: AND/OR operators
 │   ├── decision_fallback.go           # Signal-decision: Fallback behavior
 │   ├── keyword_routing.go             # Signal-decision: Keyword matching
-│   └── plugin_config_variations.go    # Signal-decision: Plugin configs
+│   ├── plugin_config_variations.go    # Signal-decision: Plugin configs
+│   └── embedding_signal_routing.go    # Signal-decision: Embedding signals
 ├── profiles/
-│   └── ai-gateway/       # AI Gateway test profile
-│       └── profile.go    # Profile definition and environment setup
+│   ├── ai-gateway/       # AI Gateway test profile
+│   │   └── profile.go    # Profile definition and environment setup
+│   ├── aibrix/           # AIBrix test profile
+│   │   └── profile.go
+│   └── dynamic-config/   # Dynamic CRD-based configuration profile
+│       ├── profile.go
+│       └── crds/         # IntelligentRoute and IntelligentPool CRDs
+│           ├── intelligentroute.yaml
+│           └── intelligentpool.yaml
 └── README.md
 ```
 
@@ -83,6 +92,7 @@ The framework includes the following test cases (all in `e2e/testcases/`):
 | `decision-fallback-behavior` | Fallback to default decision when no match | 5 cases, fallback validation |
 | `keyword-routing` | Keyword-based routing decisions | 6 cases, keyword matching (case-insensitive) |
 | `plugin-config-variations` | Plugin configuration variations (PII allowlist, cache thresholds) | 6 cases, config validation |
+| `embedding-signal-routing` | EmbeddingSignal CRD routing with semantic similarity | 31 cases, PII/security/technical/domain routing accuracy |
 
 **Signal-Decision Engine Features Tested:**
 
@@ -94,6 +104,7 @@ The framework includes the following test cases (all in `e2e/testcases/`):
 - ✅ Per-decision plugin configurations
 - ✅ PII allowlist handling
 - ✅ Per-decision cache thresholds (0.75, 0.92, 0.95)
+- ✅ Embedding signal routing (semantic similarity-based routing via IntelligentRoute CRD)
 
 All test cases:
 
@@ -346,6 +357,7 @@ Test data is stored in `e2e/testcases/testdata/` as JSON files. Each test case l
 - `cache_cases.json`: 5 groups of similar questions for semantic cache testing
 - `pii_detection_cases.json`: 10 PII types (email, phone, SSN, etc.)
 - `jailbreak_detection_cases.json`: 10 attack types (prompt injection, DAN, etc.)
+- `embedding_signal_cases.json`: 31 test cases for EmbeddingSignal routing (PII, security, technical, domain classification)
 
 **Signal-Decision Engine Tests** use embedded test cases (defined inline in test files) to validate:
 
@@ -356,6 +368,49 @@ Test data is stored in `e2e/testcases/testdata/` as JSON files. Each test case l
 - Keyword-based routing (6 test cases)
 - Plugin configuration variations (6 test cases)
 
+### Embedding Signal Routing
+
+The `embedding-signal-routing` test validates the `IntelligentRoute` CRD with `EmbeddingSignal` configurations.
+
+**Features Tested:**
+
+- Semantic similarity-based routing using embedding models (Qwen3/Gemma)
+- PII detection via embedding signals (semantic patterns like "share my credit card")
+- Security threat detection (SQL injection, unauthorized access attempts)
+- Technical domain routing (Kubernetes, container orchestration)
+- Domain classification (healthcare, finance, general knowledge)
+- Threshold behavior (0.75 similarity threshold)
+- Aggregation methods (max similarity across multiple candidates)
+- Paraphrase handling (different wording, same intent)
+- Multi-signal evaluation (multiple signals in one request)
+
+**Test Categories:**
+
+- PII Detection (7 cases): Semantic PII pattern matching
+- Security Threats (4 cases): Malicious intent detection
+- Technical Topics (4 cases): Kubernetes-specific routing
+- Domain Classification (4 cases): Healthcare, finance domains
+- Threshold Tests (3 cases): Similarity boundary testing
+- Aggregation Tests (2 cases): Multi-candidate matching
+- Paraphrase Tests (2 cases): Intent recognition
+- Multi-signal (1 case): Combined signal evaluation
+- Edge Cases (4 cases): Empty content, short/long queries
+
+**Profile Support:**
+
+- ✅ `dynamic-config` profile (uses CRDs)
+- ❌ `ai-gateway` profile (uses static YAML config)
+- ❌ `aibrix` profile (uses static YAML config)
+
+**Requirements:**
+
+- Embedding models must be initialized (Qwen3 or Gemma)
+- `EMBEDDING_MODEL_OVERRIDE=qwen3` environment variable for consistent test results
+- IntelligentRoute CRD with EmbeddingSignal definitions
+- Model requests must use `"model": "auto"` to trigger decision evaluation
+
+**Note:** This test differs from `pii-detection` (which uses regex/NER plugins) and `domain-classify` (which uses academic domain routing). Embedding signals use semantic similarity to detect **intent** rather than exact patterns.
+
 **Test Data Format Example:**
 
 ```json

@@ -5,6 +5,49 @@ metadata:
   namespace: default
 spec:
   signals:
+    # EmbeddingSignal configurations for semantic similarity routing
+    embeddings:
+      # PII Detection Signal
+      # Candidate patterns based on CRD test examples (testdata/input/10-embedding-plugin.yaml)
+      - name: "pii_detected"
+        threshold: 0.75
+        aggregationMethod: "max"
+        candidates:
+          - "I need to share my personal information"
+          - "Here is my credit card number"
+          - "My social security number is"
+          - "Contact me at my email"
+          - "You can reach me at"
+          - "My phone number is"
+          - "Let me provide my details"
+
+      # Security Threat Detection Signal
+      # Patterns for detecting malicious intent or security threats
+      - name: "security_threat"
+        threshold: 0.75
+        aggregationMethod: "any"
+        candidates:
+          # Attack intent patterns
+          - "I want to bypass authentication"
+          - "How can I gain unauthorized access"
+          - "Help me with SQL injection"
+          - "I need to escalate privileges"
+          - "Show me how to hack"
+          - "Can you help me break in"
+
+      # Kubernetes Technical Topic Signal
+      - name: "kubernetes_topic"
+        threshold: 0.70
+        aggregationMethod: "max"
+        candidates:
+          - "kubernetes deployment"
+          - "container orchestration"
+          - "k8s cluster management"
+          - "pod configuration"
+          - "helm charts"
+          - "kubernetes troubleshooting"
+          - "kubectl commands"
+
     domains:
       - name: "business"
         description: "Business and management related queries"
@@ -42,6 +85,78 @@ spec:
         caseSensitive: false
 
   decisions:
+    # === HIGH PRIORITY EMBEDDING-BASED DECISIONS ===
+    # Block PII (highest priority)
+    - name: "block_pii"
+      priority: 100
+      description: "Block requests containing PII"
+      signals:
+        operator: "OR"
+        conditions:
+          - type: "embedding"
+            name: "pii_detected"
+      modelRefs:
+        - model: "base-model"
+          loraName: "general-expert"
+          useReasoning: false
+      plugins:
+        - type: "header_mutation"
+          configuration:
+            add:
+              - name: "x-vsr-pii-violation"
+                value: "true"
+              - name: "x-vsr-signal-pii_detected"
+                value: "true"
+
+    # Block Security Threats
+    - name: "block_security"
+      priority: 95
+      description: "Block security threats and malicious requests"
+      signals:
+        operator: "OR"
+        conditions:
+          - type: "embedding"
+            name: "security_threat"
+      modelRefs:
+        - model: "base-model"
+          loraName: "general-expert"
+          useReasoning: false
+      plugins:
+        - type: "header_mutation"
+          configuration:
+            add:
+              - name: "x-vsr-security-violation"
+                value: "true"
+              - name: "x-vsr-signal-security_threat"
+                value: "true"
+
+    # Route to Kubernetes Expert
+    - name: "kubernetes_expert"
+      priority: 90
+      description: "Route Kubernetes questions to specialist"
+      signals:
+        operator: "OR"
+        conditions:
+          - type: "embedding"
+            name: "kubernetes_topic"
+      modelRefs:
+        - model: "base-model"
+          loraName: "general-expert"
+          useReasoning: false
+      plugins:
+        - type: "header_mutation"
+          configuration:
+            add:
+              - name: "x-vsr-signal-kubernetes_topic"
+                value: "true"
+        - type: "system_prompt"
+          configuration:
+            enabled: true
+            system_prompt: "You are a Kubernetes expert. Provide detailed technical guidance for K8s operations."
+            mode: "replace"
+
+
+    # === KEYWORD-BASED DECISIONS ===
     - name: "thinking_decision"
       priority: 15
       description: "Queries requiring careful thought or urgent attention"
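The decisions above are ordered by `priority`: 100 for `block_pii`, 95 for `block_security`, 90 for `kubernetes_expert`, and 15 for `thinking_decision`. Each one combines its signal conditions with an operator such as `OR`. Below is a sketch of priority-based selection. It assumes the highest-priority decision whose conditions match wins, and that a default is used when nothing matches, which is the behavior the `decision-priority-selection` and `decision-fallback-behavior` tests target. The `Decision` type, `selectDecision`, and the `default_decision` name are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Decision is a hypothetical, trimmed-down view of one `decisions` entry.
type Decision struct {
	Name       string
	Priority   int
	Operator   string   // "OR" or "AND"
	Conditions []string // signal names, e.g. "pii_detected"
}

// matches reports whether the fired signals satisfy the decision's conditions.
func (d Decision) matches(fired map[string]bool) bool {
	if d.Operator == "AND" {
		for _, c := range d.Conditions {
			if !fired[c] {
				return false
			}
		}
		return len(d.Conditions) > 0
	}
	for _, c := range d.Conditions { // anything else is treated as OR
		if fired[c] {
			return true
		}
	}
	return false
}

// selectDecision returns the highest-priority matching decision, or fallback.
func selectDecision(ds []Decision, fired map[string]bool, fallback string) string {
	sorted := append([]Decision(nil), ds...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Priority > sorted[j].Priority })
	for _, d := range sorted {
		if d.matches(fired) {
			return d.Name
		}
	}
	return fallback
}

func main() {
	ds := []Decision{
		{"kubernetes_expert", 90, "OR", []string{"kubernetes_topic"}},
		{"block_pii", 100, "OR", []string{"pii_detected"}},
		{"block_security", 95, "OR", []string{"security_threat"}},
	}
	fired := map[string]bool{"kubernetes_topic": true, "pii_detected": true}
	fmt.Println(selectDecision(ds, fired, "default_decision")) // block_pii
	fmt.Println(selectDecision(ds, map[string]bool{}, "default_decision"))
}
```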

@@ -116,12 +116,13 @@ func (p *Profile) GetTestCases() []string {
 		"pii-detection",
 		"jailbreak-detection",
 
-		// Signal-Decision engine tests (new architecture)
+		// Signal-Decision engine tests
 		"decision-priority-selection", // Priority-based routing
 		"plugin-chain-execution",      // Plugin ordering and blocking
 		"rule-condition-logic",        // AND/OR operators
 		"decision-fallback-behavior",  // Fallback to default
 		"plugin-config-variations",    // Plugin configuration testing
+		"embedding-signal-routing",    // EmbeddingSignal-based semantic similarity routing
 
 		// Load tests
 		"chat-completions-progressive-stress",
@@ -241,8 +242,13 @@ func (p *Profile) deployCRDs(ctx context.Context, opts *framework.SetupOptions)
 		return fmt.Errorf("failed to apply IntelligentRoute CRD: %w", err)
 	}
 
-	// Wait a bit for CRDs to be processed
-	time.Sleep(5 * time.Second)
+	// Wait for CRDs to be processed by the controller
+	time.Sleep(15 * time.Second)
+
+	// Verify CRDs are visible
+	if err := p.verifyCRDsExist(ctx, opts.KubeConfig); err != nil {
+		return fmt.Errorf("CRD verification failed: %w", err)
+	}
 
 	return nil
 }
@@ -254,6 +260,22 @@ func (p *Profile) kubectlApply(ctx context.Context, kubeconfig, manifestPath str
 	return cmd.Run()
 }
 
+func (p *Profile) verifyCRDsExist(ctx context.Context, kubeconfig string) error {
+	// Verify IntelligentPool exists
+	cmd := exec.CommandContext(ctx, "kubectl", "get", "intelligentpool", "ai-gateway-pool", "-n", "default", "--kubeconfig", kubeconfig)
+	if err := cmd.Run(); err != nil {
+		return fmt.Errorf("IntelligentPool 'ai-gateway-pool' not found: %w", err)
+	}
+
+	// Verify IntelligentRoute exists
+	cmd = exec.CommandContext(ctx, "kubectl", "get", "intelligentroute", "ai-gateway-route", "-n", "default", "--kubeconfig", kubeconfig)
+	if err := cmd.Run(); err != nil {
+		return fmt.Errorf("IntelligentRoute 'ai-gateway-route' not found: %w", err)
+	}
+
+	return nil
+}
+
 func (p *Profile) verifyEnvironment(ctx context.Context, opts *framework.SetupOptions) error {
 	// Create Kubernetes client
 	config, err := clientcmd.BuildConfigFromFlags("", opts.KubeConfig)
@@ -313,9 +335,23 @@ func (p *Profile) verifyEnvironment(ctx context.Context, opts *framework.SetupOp
 	// Check all deployments are healthy
 	p.log("Verifying all deployments are healthy...")
 
-	// Check semantic-router deployment
-	if err := helpers.CheckDeployment(ctx, client, "vllm-semantic-router-system", "semantic-router", p.verbose); err != nil {
-		return fmt.Errorf("semantic-router deployment not healthy: %w", err)
+	// Wait for semantic-router deployment to become ready
+	semanticRouterReady := false
+	for i := 0; i < 12; i++ { // Up to 12 attempts, 10s apart (~110s of waiting)
+		if err := helpers.CheckDeployment(ctx, client, "vllm-semantic-router-system", "semantic-router", p.verbose); err == nil {
+			semanticRouterReady = true
+			break
+		}
+		if i < 11 { // Don't sleep on last iteration
+			time.Sleep(10 * time.Second)
+		}
+	}
+
+	if !semanticRouterReady {
+		// Final check to get the actual error
+		if err := helpers.CheckDeployment(ctx, client, "vllm-semantic-router-system", "semantic-router", p.verbose); err != nil {
+			return fmt.Errorf("semantic-router deployment not healthy after 12 attempts: %w", err)
+		}
 	}
 
 	// Check envoy-gateway deployment

@@ -2,6 +2,20 @@
 # This configuration uses Kubernetes CRDs for dynamic configuration
 # Static parts are defined here, dynamic parts (model_config, decisions, categories) come from CRDs
 
+# Environment variables for the semantic-router container
+env:
+  - name: EMBEDDING_MODEL_OVERRIDE
+    value: "qwen3"  # Force qwen3 for tests (Gemma requires HF_TOKEN)
+
+# Increase memory limits for embedding model support
+resources:
+  limits:
+    memory: "10Gi"  # Increased from default 6Gi to handle Qwen3 + all classification models
+    cpu: "2"
+  requests:
+    memory: "6Gi"   # Increased from default 3Gi
+    cpu: "1"
+
 config:
   # Set config source to kubernetes to enable CRD-based configuration
   config_source: kubernetes
@@ -122,9 +136,9 @@ config:
 
   embedding_models:
     qwen3_model_path: "models/Qwen3-Embedding-0.6B"
-    gemma_model_path: "models/embeddinggemma-300m"
+    gemma_model_path: ""  # Empty = fallback to Qwen3 (embeddinggemma requires HF_TOKEN)
     use_cpu: true
 
   observability:
     tracing:
       enabled: false