Commit 16e3791

Add Comprehensive Test Coverage for Embedding-Based Signals

Author: Yehudit Kerido
Signed-off-by: Yehudit Kerido <[email protected]>
1 parent d5fdf28 commit 16e3791

11 files changed (+843, -18 lines)

.github/workflows/integration-test-dynamic-config.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@ on:
 jobs:
   integration-test:
     runs-on: ubuntu-latest
-    timeout-minutes: 60
+    timeout-minutes: 180
 
     steps:
       - name: Check out the repo

candle-binding/src/ffi/embedding.rs

Lines changed: 1 addition & 1 deletion
@@ -295,7 +295,7 @@ pub extern "C" fn init_embedding_models(
         }
         Err(_) => {
             eprintln!("WARNING: ModelFactory already initialized");
-            false
+            true // Return success - idempotent behavior
         }
     }
 }
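
The hunk above makes a repeated `init_embedding_models` call report success instead of failure. The same idempotent-initialization pattern can be sketched in Go. This is only an illustration of the behavior, not the candle binding itself: `modelRegistry`, `initModels`, and the `load` callback are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// modelRegistry is a hypothetical stand-in for a process-wide model factory.
type modelRegistry struct {
	mu          sync.Mutex
	initialized bool
	loads       int // how many times the expensive load actually ran
}

// initModels runs load on the first successful call and reports success on
// every later call, so callers may invoke it repeatedly without special-casing.
func (r *modelRegistry) initModels(load func() error) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.initialized {
		// Already initialized: treat as success (idempotent behavior).
		return true
	}
	if err := load(); err != nil {
		// A failed load leaves the registry uninitialized so a retry is possible.
		return false
	}
	r.loads++
	r.initialized = true
	return true
}

func main() {
	reg := &modelRegistry{}
	load := func() error { return nil }
	// prints: true true 1
	fmt.Println(reg.initModels(load), reg.initModels(load), reg.loads)
}
```

Returning success on the second call lets several test cases share one process without each needing to know whether an earlier case already loaded the models.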

e2e/README.md

Lines changed: 58 additions & 3 deletions
@@ -14,6 +14,7 @@ The framework follows a **separation of concerns** design:
 
 - **ai-gateway**: Tests Semantic Router with Envoy AI Gateway integration
 - **aibrix**: Tests Semantic Router with vLLM AIBrix integration
+- **dynamic-config**: Tests Semantic Router with Kubernetes CRD-based configuration (IntelligentRoute/IntelligentPool)
 - **istio**: Tests Semantic Router with Istio Gateway (future)
 - **production-stack**: Tests vLLM Production Stack configurations (future)
 - **llm-d**: Tests Semantic Router with LLM-D distributed inference
@@ -45,10 +46,18 @@ e2e/
 │   ├── rule_condition_logic.go  # Signal-decision: AND/OR operators
 │   ├── decision_fallback.go  # Signal-decision: Fallback behavior
 │   ├── keyword_routing.go  # Signal-decision: Keyword matching
-│   └── plugin_config_variations.go  # Signal-decision: Plugin configs
+│   ├── plugin_config_variations.go  # Signal-decision: Plugin configs
+│   └── embedding_signal_routing.go  # Signal-decision: Embedding signals
 ├── profiles/
-│   └── ai-gateway/  # AI Gateway test profile
-│       └── profile.go  # Profile definition and environment setup
+│   ├── ai-gateway/  # AI Gateway test profile
+│   │   └── profile.go  # Profile definition and environment setup
+│   ├── aibrix/  # AIBrix test profile
+│   │   └── profile.go
+│   └── dynamic-config/  # Dynamic CRD-based configuration profile
+│       ├── profile.go
+│       └── crds/  # IntelligentRoute and IntelligentPool CRDs
+│           ├── intelligentroute.yaml
+│           └── intelligentpool.yaml
 └── README.md
 ```
 
@@ -83,6 +92,7 @@ The framework includes the following test cases (all in `e2e/testcases/`):
 | `decision-fallback-behavior` | Fallback to default decision when no match | 5 cases, fallback validation |
 | `keyword-routing` | Keyword-based routing decisions | 6 cases, keyword matching (case-insensitive) |
 | `plugin-config-variations` | Plugin configuration variations (PII allowlist, cache thresholds) | 6 cases, config validation |
+| `embedding-signal-routing` | EmbeddingSignal CRD routing with semantic similarity | 31 cases, PII/security/technical/domain routing accuracy |
 
 **Signal-Decision Engine Features Tested:**
 
@@ -94,6 +104,7 @@ The framework includes the following test cases (all in `e2e/testcases/`):
 - ✅ Per-decision plugin configurations
 - ✅ PII allowlist handling
 - ✅ Per-decision cache thresholds (0.75, 0.92, 0.95)
+- ✅ Embedding signal routing (semantic similarity-based routing via IntelligentRoute CRD)
 
 All test cases:
 
@@ -346,6 +357,7 @@ Test data is stored in `e2e/testcases/testdata/` as JSON files. Each test case l
 - `cache_cases.json`: 5 groups of similar questions for semantic cache testing
 - `pii_detection_cases.json`: 10 PII types (email, phone, SSN, etc.)
 - `jailbreak_detection_cases.json`: 10 attack types (prompt injection, DAN, etc.)
+- `embedding_signal_cases.json`: 31 test cases for EmbeddingSignal routing (PII, security, technical, domain classification)
 
 **Signal-Decision Engine Tests** use embedded test cases (defined inline in test files) to validate:
 
@@ -356,6 +368,49 @@ Test data is stored in `e2e/testcases/testdata/` as JSON files. Each test case l
 - Keyword-based routing (6 test cases)
 - Plugin configuration variations (6 test cases)
 
+### Embedding Signal Routing
+
+The `embedding-signal-routing` test validates the `IntelligentRoute` CRD with `EmbeddingSignal` configurations.
+
+**Features Tested:**
+
+- Semantic similarity-based routing using embedding models (Qwen3/Gemma)
+- PII detection via embedding signals (semantic patterns like "share my credit card")
+- Security threat detection (SQL injection, unauthorized access attempts)
+- Technical domain routing (Kubernetes, container orchestration)
+- Domain classification (healthcare, finance, general knowledge)
+- Threshold behavior (0.75 similarity threshold)
+- Aggregation methods (max similarity across multiple candidates)
+- Paraphrase handling (different wording, same intent)
+- Multi-signal evaluation (multiple signals in one request)
+
+**Test Categories:**
+
+- PII Detection (7 cases): Semantic PII pattern matching
+- Security Threats (4 cases): Malicious intent detection
+- Technical Topics (4 cases): Kubernetes-specific routing
+- Domain Classification (4 cases): Healthcare, finance domains
+- Threshold Tests (3 cases): Similarity boundary testing
+- Aggregation Tests (2 cases): Multi-candidate matching
+- Paraphrase Tests (2 cases): Intent recognition
+- Multi-signal (1 case): Combined signal evaluation
+- Edge Cases (4 cases): Empty content, short/long queries
+
+**Profile Support:**
+
+- `dynamic-config` profile (uses CRDs)
+- `ai-gateway` profile (uses static YAML config)
+- `aibrix` profile (uses static YAML config)
+
+**Requirements:**
+
+- Embedding models must be initialized (Qwen3 or Gemma)
+- `EMBEDDING_MODEL_OVERRIDE=qwen3` environment variable for consistent test results
+- IntelligentRoute CRD with EmbeddingSignal definitions
+- Model requests must use `"model": "auto"` to trigger decision evaluation
+
+**Note:** This test differs from `pii-detection` (which uses regex/NER plugins) and `domain-classify` (which uses academic domain routing). Embedding signals use semantic similarity to detect **intent** rather than exact patterns.
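
The requirement that requests use `"model": "auto"` can be illustrated with a small Go sketch that builds such a request body and derives the per-signal header name used by the `header_mutation` plugins in this commit. It assumes an OpenAI-style chat completion payload and assumes the test can observe the `x-vsr-signal-*` headers; `chatRequest`, `buildAutoRequest`, `signalHeaderName`, and `signalFired` are hypothetical names, not part of the framework.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// message and chatRequest model a minimal OpenAI-style chat completion body.
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

// buildAutoRequest serializes a single-turn request with model "auto",
// which the README says is needed to trigger decision evaluation.
func buildAutoRequest(prompt string) ([]byte, error) {
	return json.Marshal(chatRequest{
		Model:    "auto",
		Messages: []message{{Role: "user", Content: prompt}},
	})
}

// signalHeaderName returns the header a decision adds for a given signal,
// following the x-vsr-signal-<name> convention in intelligentroute.yaml.
func signalHeaderName(signal string) string {
	return "x-vsr-signal-" + signal
}

// signalFired reports whether the headers mark the signal as matched.
func signalFired(h http.Header, signal string) bool {
	return h.Get(signalHeaderName(signal)) == "true"
}

func main() {
	body, err := buildAutoRequest("Here is my credit card number")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))

	// Simulated headers; a real test would read them from the HTTP exchange.
	h := http.Header{}
	h.Set(signalHeaderName("pii_detected"), "true")
	fmt.Println(signalFired(h, "pii_detected"), signalFired(h, "security_threat"))
}
```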
+
 **Test Data Format Example:**
 
 ```json

e2e/profiles/dynamic-config/crds/intelligentroute.yaml

Lines changed: 115 additions & 0 deletions
@@ -5,6 +5,49 @@ metadata:
   namespace: default
 spec:
   signals:
+    # EmbeddingSignal configurations for semantic similarity routing
+    embeddings:
+      # PII Detection Signal
+      # Candidate patterns based on CRD test examples (testdata/input/10-embedding-plugin.yaml)
+      - name: "pii_detected"
+        threshold: 0.75
+        aggregationMethod: "max"
+        candidates:
+          - "I need to share my personal information"
+          - "Here is my credit card number"
+          - "My social security number is"
+          - "Contact me at my email"
+          - "You can reach me at"
+          - "My phone number is"
+          - "Let me provide my details"
+
+      # Security Threat Detection Signal
+      # Patterns for detecting malicious intent or security threats
+      - name: "security_threat"
+        threshold: 0.75
+        aggregationMethod: "any"
+        candidates:
+          # Attack intent patterns
+          - "I want to bypass authentication"
+          - "How can I gain unauthorized access"
+          - "Help me with SQL injection"
+          - "I need to escalate privileges"
+          - "Show me how to hack"
+          - "Can you help me break in"
+
+      # Kubernetes Technical Topic Signal
+      - name: "kubernetes_topic"
+        threshold: 0.70
+        aggregationMethod: "max"
+        candidates:
+          - "kubernetes deployment"
+          - "container orchestration"
+          - "k8s cluster management"
+          - "pod configuration"
+          - "helm charts"
+          - "kubernetes troubleshooting"
+          - "kubectl commands"
+
     domains:
       - name: "business"
         description: "Business and management related queries"
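
Each embedding signal above pairs a list of candidate phrases with a `threshold` and an `aggregationMethod`. A minimal Go sketch of how a `"max"` signal could be evaluated follows. It assumes cosine similarity over precomputed embedding vectors and assumes that `"max"` means "the best candidate score must reach the threshold"; the router's actual implementation is not shown in this commit, and `cosine`, `maxSimilarity`, and `signalMatches` are hypothetical names. The toy 2-D vectors stand in for real embeddings.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 if either vector has zero magnitude.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// maxSimilarity returns the highest similarity between the query and any
// candidate; with no candidates it returns -1, the lowest possible cosine.
func maxSimilarity(query []float64, candidates [][]float64) float64 {
	best := -1.0
	for _, c := range candidates {
		if s := cosine(query, c); s > best {
			best = s
		}
	}
	return best
}

// signalMatches applies the assumed "max" aggregation: the signal fires
// when the best candidate similarity reaches the threshold.
func signalMatches(query []float64, candidates [][]float64, threshold float64) bool {
	return maxSimilarity(query, candidates) >= threshold
}

func main() {
	query := []float64{1, 0}
	candidates := [][]float64{{0, 1}, {1, 1}}
	// prints: 0.707 true false
	fmt.Printf("%.3f %v %v\n",
		maxSimilarity(query, candidates),
		signalMatches(query, candidates, 0.70),
		signalMatches(query, candidates, 0.75))
}
```

The same query matches at the 0.70 threshold used by `kubernetes_topic` but not at the 0.75 used by `pii_detected`, which is the kind of boundary the threshold test cases exercise.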
@@ -42,6 +85,78 @@ spec:
         caseSensitive: false
 
   decisions:
+    # === HIGH PRIORITY EMBEDDING-BASED DECISIONS ===
+    # Block PII (highest priority)
+    - name: "block_pii"
+      priority: 100
+      description: "Block requests containing PII"
+      signals:
+        operator: "OR"
+        conditions:
+          - type: "embedding"
+            name: "pii_detected"
+      modelRefs:
+        - model: "base-model"
+          loraName: "general-expert"
+          useReasoning: false
+      plugins:
+        - type: "header_mutation"
+          configuration:
+            add:
+              - name: "x-vsr-pii-violation"
+                value: "true"
+              - name: "x-vsr-signal-pii_detected"
+                value: "true"
+
+    # Block Security Threats
+    - name: "block_security"
+      priority: 95
+      description: "Block security threats and malicious requests"
+      signals:
+        operator: "OR"
+        conditions:
+          - type: "embedding"
+            name: "security_threat"
+      modelRefs:
+        - model: "base-model"
+          loraName: "general-expert"
+          useReasoning: false
+      plugins:
+        - type: "header_mutation"
+          configuration:
+            add:
+              - name: "x-vsr-security-violation"
+                value: "true"
+              - name: "x-vsr-signal-security_threat"
+                value: "true"
+
+    # Route to Kubernetes Expert
+    - name: "kubernetes_expert"
+      priority: 90
+      description: "Route Kubernetes questions to specialist"
+      signals:
+        operator: "OR"
+        conditions:
+          - type: "embedding"
+            name: "kubernetes_topic"
+      modelRefs:
+        - model: "base-model"
+          loraName: "general-expert"
+          useReasoning: false
+      plugins:
+        - type: "header_mutation"
+          configuration:
+            add:
+              - name: "x-vsr-signal-kubernetes_topic"
+                value: "true"
+        - type: "system_prompt"
+          configuration:
+            enabled: true
+            system_prompt: "You are a Kubernetes expert. Provide detailed technical guidance for K8s operations."
+            mode: "replace"
+
+
+    # === KEYWORD-BASED DECISIONS ===
     - name: "thinking_decision"
       priority: 15
       description: "Queries requiring careful thought or urgent attention"
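
The decisions above carry priorities 100, 95, and 90, each with an `OR` over a single embedding condition. The Go sketch below shows one way priority-based selection could work when several signals fire on the same request. It assumes the highest-priority matching decision wins, which is how the `decision-priority-selection` test is described; the engine's real selection logic is not part of this diff, and `decision` and `selectDecision` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// decision is a simplified view of a CRD decision: a name, a priority,
// and the signal names combined with OR.
type decision struct {
	Name     string
	Priority int
	Signals  []string
}

// selectDecision returns the highest-priority decision with at least one
// fired signal. The second result is false when nothing matches.
func selectDecision(decisions []decision, fired map[string]bool) (decision, bool) {
	// Sort a copy so the caller's slice order is left untouched.
	sorted := append([]decision(nil), decisions...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority > sorted[j].Priority
	})
	for _, d := range sorted {
		for _, s := range d.Signals {
			if fired[s] {
				return d, true
			}
		}
	}
	return decision{}, false
}

func main() {
	decisions := []decision{
		{Name: "kubernetes_expert", Priority: 90, Signals: []string{"kubernetes_topic"}},
		{Name: "block_pii", Priority: 100, Signals: []string{"pii_detected"}},
		{Name: "block_security", Priority: 95, Signals: []string{"security_threat"}},
	}
	// A request that looks like both a Kubernetes question and an attack.
	fired := map[string]bool{"kubernetes_topic": true, "security_threat": true}
	if d, ok := selectDecision(decisions, fired); ok {
		fmt.Println(d.Name) // prints: block_security
	}
}
```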

e2e/profiles/dynamic-config/profile.go

Lines changed: 41 additions & 6 deletions
@@ -116,12 +116,13 @@ func (p *Profile) GetTestCases() []string {
 		"pii-detection",
 		"jailbreak-detection",
 
-		// Signal-Decision engine tests (new architecture)
+		// Signal-Decision engine tests
 		"decision-priority-selection", // Priority-based routing
 		"plugin-chain-execution",      // Plugin ordering and blocking
 		"rule-condition-logic",        // AND/OR operators
 		"decision-fallback-behavior",  // Fallback to default
 		"plugin-config-variations",    // Plugin configuration testing
+		"embedding-signal-routing",    // EmbeddingSignal-based semantic similarity routing
 
 		// Load tests
 		"chat-completions-progressive-stress",
@@ -241,8 +242,13 @@ func (p *Profile) deployCRDs(ctx context.Context, opts *framework.SetupOptions)
 		return fmt.Errorf("failed to apply IntelligentRoute CRD: %w", err)
 	}
 
-	// Wait a bit for CRDs to be processed
-	time.Sleep(5 * time.Second)
+	// Wait for CRDs to be processed by the controller
+	time.Sleep(15 * time.Second)
+
+	// Verify CRDs are visible
+	if err := p.verifyCRDsExist(ctx, opts.KubeConfig); err != nil {
+		return fmt.Errorf("CRD verification failed: %w", err)
+	}
 
 	return nil
 }
@@ -254,6 +260,22 @@ func (p *Profile) kubectlApply(ctx context.Context, kubeconfig, manifestPath str
 	return cmd.Run()
 }
 
+func (p *Profile) verifyCRDsExist(ctx context.Context, kubeconfig string) error {
+	// Verify IntelligentPool exists
+	cmd := exec.CommandContext(ctx, "kubectl", "get", "intelligentpool", "ai-gateway-pool", "-n", "default", "--kubeconfig", kubeconfig)
+	if err := cmd.Run(); err != nil {
+		return fmt.Errorf("IntelligentPool 'ai-gateway-pool' not found: %w", err)
+	}
+
+	// Verify IntelligentRoute exists
+	cmd = exec.CommandContext(ctx, "kubectl", "get", "intelligentroute", "ai-gateway-route", "-n", "default", "--kubeconfig", kubeconfig)
+	if err := cmd.Run(); err != nil {
+		return fmt.Errorf("IntelligentRoute 'ai-gateway-route' not found: %w", err)
+	}
+
+	return nil
+}
+
 func (p *Profile) verifyEnvironment(ctx context.Context, opts *framework.SetupOptions) error {
 	// Create Kubernetes client
 	config, err := clientcmd.BuildConfigFromFlags("", opts.KubeConfig)
@@ -313,9 +335,23 @@ func (p *Profile) verifyEnvironment(ctx context.Context, opts *framework.SetupOp
 	// Check all deployments are healthy
 	p.log("Verifying all deployments are healthy...")
 
-	// Check semantic-router deployment
-	if err := helpers.CheckDeployment(ctx, client, "vllm-semantic-router-system", "semantic-router", p.verbose); err != nil {
-		return fmt.Errorf("semantic-router deployment not healthy: %w", err)
+	// Wait for semantic-router deployment to become ready
+	semanticRouterReady := false
+	for i := 0; i < 12; i++ { // 12 * 10s = 120 seconds max wait
+		if err := helpers.CheckDeployment(ctx, client, "vllm-semantic-router-system", "semantic-router", p.verbose); err == nil {
+			semanticRouterReady = true // Record success so the final check below is skipped
+			break
+		}
+		if i < 11 { // Don't sleep on last iteration
+			time.Sleep(10 * time.Second)
+		}
+	}
+
+	if !semanticRouterReady {
+		// Final check to get the actual error
+		if err := helpers.CheckDeployment(ctx, client, "vllm-semantic-router-system", "semantic-router", p.verbose); err != nil {
+			return fmt.Errorf("semantic-router deployment not healthy after 120s: %w", err)
+		}
 	}
 
 	// Check envoy-gateway deployment
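
The readiness loop in this file (and the fixed 15-second sleep before CRD verification) follows a poll-until-ready pattern. A reusable, context-aware variant could look like the sketch below. It is an illustration only: `waitUntilReady` and `runDemo` are hypothetical names, and the `check` callback stands in for a call such as `helpers.CheckDeployment`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// waitUntilReady calls check up to attempts times, pausing interval between
// tries. It returns nil on the first success, the context error if ctx is
// cancelled while waiting, or the last check error once attempts run out.
func waitUntilReady(ctx context.Context, attempts int, interval time.Duration, check func(context.Context) error) error {
	if attempts <= 0 {
		return errors.New("attempts must be positive")
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		if lastErr = check(ctx); lastErr == nil {
			return nil
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(interval):
		}
	}
	return fmt.Errorf("not ready after %d attempts: %w", attempts, lastErr)
}

// runDemo polls a fake check that fails twice and then succeeds,
// returning how many times the check ran and the final result.
func runDemo() (int, error) {
	calls := 0
	check := func(context.Context) error {
		calls++
		if calls < 3 {
			return errors.New("deployment not ready")
		}
		return nil
	}
	err := waitUntilReady(context.Background(), 5, time.Millisecond, check)
	return calls, err
}

func main() {
	calls, err := runDemo()
	fmt.Println(calls, err) // prints: 3 <nil>
}
```

Keeping the last error inside the helper avoids a separate "final check" call, and honoring `ctx` lets a cancelled test run stop waiting immediately.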

e2e/profiles/dynamic-config/values.yaml

Lines changed: 15 additions & 1 deletion
@@ -2,6 +2,11 @@
 # This configuration uses Kubernetes CRDs for dynamic configuration
 # Static parts are defined here, dynamic parts (model_config, decisions, categories) come from CRDs
 
+# Environment variables for the semantic-router container
+env:
+  - name: EMBEDDING_MODEL_OVERRIDE
+    value: "qwen3" # Force qwen3 for tests (Gemma requires HF_TOKEN)
+
 config:
   # Set config source to kubernetes to enable CRD-based configuration
   config_source: kubernetes
@@ -122,9 +127,18 @@ config:
122127

123128
embedding_models:
124129
qwen3_model_path: "models/Qwen3-Embedding-0.6B"
125-
gemma_model_path: "models/embeddinggemma-300m"
130+
gemma_model_path: "" # Empty = fallback to Qwen3 (embeddinggemma requires HF_TOKEN)
126131
use_cpu: true
127132

133+
# Increase memory limits for embedding model support
134+
resources:
135+
limits:
136+
memory: "10Gi" # Increased from default 6Gi to handle Qwen3 + all classification models
137+
cpu: "2"
138+
requests:
139+
memory: "6Gi" # Increased from default 3Gi
140+
cpu: "1"
141+
128142
observability:
129143
tracing:
130144
enabled: false
