Merged
2 changes: 1 addition & 1 deletion .github/workflows/integration-test-dynamic-config.yml
@@ -18,7 +18,7 @@ on:
jobs:
integration-test:
runs-on: ubuntu-latest
-timeout-minutes: 60
+timeout-minutes: 180

steps:
- name: Check out the repo
2 changes: 1 addition & 1 deletion candle-binding/src/ffi/embedding.rs
@@ -295,7 +295,7 @@ pub extern "C" fn init_embedding_models(
}
Err(_) => {
eprintln!("WARNING: ModelFactory already initialized");
-false
+true // Return success - idempotent behavior
}
}
}
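The FFI change makes `init_embedding_models` idempotent: when the `ModelFactory` is already initialized, a repeated call now reports success instead of failure. The following is a minimal Go sketch of the same pattern. It assumes a process-wide registry guarded by a mutex. `modelRegistry`, `registry` and `initModels` are hypothetical names and are not part of the candle-binding API.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// modelRegistry is a hypothetical stand-in for the process-wide ModelFactory.
type modelRegistry struct {
	mu          sync.Mutex
	initialized bool
	modelPath   string
}

var registry modelRegistry

// initModels loads models on the first call only. Repeated calls print a
// warning and still return true, mirroring the idempotent FFI behavior.
func initModels(modelPath string) bool {
	registry.mu.Lock()
	defer registry.mu.Unlock()

	if registry.initialized {
		fmt.Fprintln(os.Stderr, "WARNING: model registry already initialized")
		return true // models are already usable, so report success
	}
	// A real implementation would load model weights here.
	registry.modelPath = modelPath
	registry.initialized = true
	return true
}

func main() {
	first := initModels("models/Qwen3-Embedding-0.6B")
	second := initModels("models/Qwen3-Embedding-0.6B") // e.g. a second test profile re-initializing
	fmt.Println(first, second, registry.modelPath)
}
```

Treating "already initialized" as success lets repeated callers proceed. The trade-off in this sketch is that a second call with different parameters is ignored apart from the warning.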
61 changes: 58 additions & 3 deletions e2e/README.md
@@ -14,6 +14,7 @@ The framework follows a **separation of concerns** design:

- **ai-gateway**: Tests Semantic Router with Envoy AI Gateway integration
- **aibrix**: Tests Semantic Router with vLLM AIBrix integration
+- **dynamic-config**: Tests Semantic Router with Kubernetes CRD-based configuration (IntelligentRoute/IntelligentPool)
- **istio**: Tests Semantic Router with Istio service mesh integration
- **production-stack**: Tests vLLM Production Stack configurations (future)
- **llm-d**: Tests Semantic Router with LLM-D distributed inference
@@ -45,10 +46,18 @@ e2e/
│ ├── rule_condition_logic.go # Signal-decision: AND/OR operators
│ ├── decision_fallback.go # Signal-decision: Fallback behavior
│ ├── keyword_routing.go # Signal-decision: Keyword matching
-│ └── plugin_config_variations.go # Signal-decision: Plugin configs
+│ ├── plugin_config_variations.go # Signal-decision: Plugin configs
+│ └── embedding_signal_routing.go # Signal-decision: Embedding signals
├── profiles/
-│ └── ai-gateway/ # AI Gateway test profile
-│ └── profile.go # Profile definition and environment setup
+│ ├── ai-gateway/ # AI Gateway test profile
+│ │ └── profile.go # Profile definition and environment setup
+│ ├── aibrix/ # AIBrix test profile
+│ │ └── profile.go
+│ └── dynamic-config/ # Dynamic CRD-based configuration profile
+│ ├── profile.go
+│ └── crds/ # IntelligentRoute and IntelligentPool CRDs
+│ ├── intelligentroute.yaml
+│ └── intelligentpool.yaml
└── README.md
```

@@ -83,6 +92,7 @@ The framework includes the following test cases (all in `e2e/testcases/`):
| `decision-fallback-behavior` | Fallback to default decision when no match | 5 cases, fallback validation |
| `keyword-routing` | Keyword-based routing decisions | 6 cases, keyword matching (case-insensitive) |
| `plugin-config-variations` | Plugin configuration variations (PII allowlist, cache thresholds) | 6 cases, config validation |
+| `embedding-signal-routing` | EmbeddingSignal CRD routing with semantic similarity | 31 cases, PII/security/technical/domain routing accuracy |

**Signal-Decision Engine Features Tested:**

@@ -94,6 +104,7 @@ The framework includes the following test cases (all in `e2e/testcases/`):
- ✅ Per-decision plugin configurations
- ✅ PII allowlist handling
- ✅ Per-decision cache thresholds (0.75, 0.92, 0.95)
+- ✅ Embedding signal routing (semantic similarity-based routing via IntelligentRoute CRD)

All test cases:

@@ -346,6 +357,7 @@ Test data is stored in `e2e/testcases/testdata/` as JSON files. Each test case l
- `cache_cases.json`: 5 groups of similar questions for semantic cache testing
- `pii_detection_cases.json`: 10 PII types (email, phone, SSN, etc.)
- `jailbreak_detection_cases.json`: 10 attack types (prompt injection, DAN, etc.)
+- `embedding_signal_cases.json`: 31 test cases for EmbeddingSignal routing (PII, security, technical, domain classification)

**Signal-Decision Engine Tests** use embedded test cases (defined inline in test files) to validate:

@@ -356,6 +368,49 @@ Test data is stored in `e2e/testcases/testdata/` as JSON files. Each test case l
- Keyword-based routing (6 test cases)
- Plugin configuration variations (6 test cases)

+### Embedding Signal Routing
+
+The `embedding-signal-routing` test validates the `IntelligentRoute` CRD with `EmbeddingSignal` configurations.
+
+**Features Tested:**
+
+- Semantic similarity-based routing using embedding models (Qwen3/Gemma)
+- PII detection via embedding signals (semantic patterns like "share my credit card")
+- Security threat detection (SQL injection, unauthorized access attempts)
+- Technical domain routing (Kubernetes, container orchestration)
+- Domain classification (healthcare, finance, general knowledge)
+- Threshold behavior (0.75 similarity threshold; the `kubernetes_topic` signal uses 0.70)
+- Aggregation methods (`max` similarity across multiple candidates; `security_threat` uses `any`)
+- Paraphrase handling (different wording, same intent)
+- Multi-signal evaluation (multiple signals in one request)
+
+**Test Categories:**
+
+- PII Detection (7 cases): Semantic PII pattern matching
+- Security Threats (4 cases): Malicious intent detection
+- Technical Topics (4 cases): Kubernetes-specific routing
+- Domain Classification (4 cases): Healthcare, finance domains
+- Threshold Tests (3 cases): Similarity boundary testing
+- Aggregation Tests (2 cases): Multi-candidate matching
+- Paraphrase Tests (2 cases): Intent recognition
+- Multi-signal (1 case): Combined signal evaluation
+- Edge Cases (4 cases): Empty content, short/long queries
+
+**Profile Support:**
+
+- `dynamic-config` profile (uses CRDs)
+- `ai-gateway` profile (uses static YAML config)
+- `aibrix` profile (uses static YAML config)
+
+**Requirements:**
+
+- Embedding models must be initialized (Qwen3 or Gemma)
+- `EMBEDDING_MODEL_OVERRIDE=qwen3` environment variable for consistent test results
+- IntelligentRoute CRD with EmbeddingSignal definitions
+- Model requests must use `"model": "auto"` to trigger decision evaluation
+
+**Note:** This test differs from `pii-detection` (which uses regex/NER plugins) and `domain-classify` (which uses academic domain routing). Embedding signals use semantic similarity to detect **intent** rather than exact patterns.
**Test Data Format Example:**

```json
115 changes: 115 additions & 0 deletions e2e/profiles/dynamic-config/crds/intelligentroute.yaml
@@ -5,6 +5,49 @@ metadata:
namespace: default
spec:
signals:
+# EmbeddingSignal configurations for semantic similarity routing
+embeddings:
+# PII Detection Signal
+# Candidate patterns based on CRD test examples (testdata/input/10-embedding-plugin.yaml)
+- name: "pii_detected"
+threshold: 0.75
+aggregationMethod: "max"
+candidates:
+- "I need to share my personal information"
+- "Here is my credit card number"
+- "My social security number is"
+- "Contact me at my email"
+- "You can reach me at"
+- "My phone number is"
+- "Let me provide my details"
+
+# Security Threat Detection Signal
+# Patterns for detecting malicious intent or security threats
+- name: "security_threat"
+threshold: 0.75
+aggregationMethod: "any"
+candidates:
+# Attack intent patterns
+- "I want to bypass authentication"
+- "How can I gain unauthorized access"
+- "Help me with SQL injection"
+- "I need to escalate privileges"
+- "Show me how to hack"
+- "Can you help me break in"
+
+# Kubernetes Technical Topic Signal
+- name: "kubernetes_topic"
+threshold: 0.70
+aggregationMethod: "max"
+candidates:
+- "kubernetes deployment"
+- "container orchestration"
+- "k8s cluster management"
+- "pod configuration"
+- "helm charts"
+- "kubernetes troubleshooting"
+- "kubectl commands"
+
domains:
- name: "business"
description: "Business and management related queries"
@@ -42,6 +85,78 @@ spec:
caseSensitive: false

decisions:
+# === HIGH PRIORITY EMBEDDING-BASED DECISIONS ===
+# Block PII (highest priority)
+- name: "block_pii"
+priority: 100
+description: "Block requests containing PII"
+signals:
+operator: "OR"
+conditions:
+- type: "embedding"
+name: "pii_detected"
+modelRefs:
+- model: "base-model"
+loraName: "general-expert"
+useReasoning: false
+plugins:
+- type: "header_mutation"
+configuration:
+add:
+- name: "x-vsr-pii-violation"
+value: "true"
+- name: "x-vsr-signal-pii_detected"
+value: "true"
+
+# Block Security Threats
+- name: "block_security"
+priority: 95
+description: "Block security threats and malicious requests"
+signals:
+operator: "OR"
+conditions:
+- type: "embedding"
+name: "security_threat"
+modelRefs:
+- model: "base-model"
+loraName: "general-expert"
+useReasoning: false
+plugins:
+- type: "header_mutation"
+configuration:
+add:
+- name: "x-vsr-security-violation"
+value: "true"
+- name: "x-vsr-signal-security_threat"
+value: "true"
+
+# Route to Kubernetes Expert
+- name: "kubernetes_expert"
+priority: 90
+description: "Route Kubernetes questions to specialist"
+signals:
+operator: "OR"
+conditions:
+- type: "embedding"
+name: "kubernetes_topic"
+modelRefs:
+- model: "base-model"
+loraName: "general-expert"
+useReasoning: false
+plugins:
+- type: "header_mutation"
+configuration:
+add:
+- name: "x-vsr-signal-kubernetes_topic"
+value: "true"
+- type: "system_prompt"
+configuration:
+enabled: true
+system_prompt: "You are a Kubernetes expert. Provide detailed technical guidance for K8s operations."
+mode: "replace"
+
+
+# === KEYWORD-BASED DECISIONS ===
- name: "thinking_decision"
priority: 15
description: "Queries requiring careful thought or urgent attention"
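Each decision above pairs signal conditions with a `priority`. A request that fires several signals therefore still resolves to a single decision. The following Go sketch shows that selection. It assumes two things: the matching decision with the highest priority wins, and `AND`/`OR` combine the condition results in the usual way. The types and functions (`Condition`, `Decision`, `selectDecision`) are hypothetical and are not the semantic-router's internal API.

```go
package main

import "fmt"

// Condition references a signal by type and name, e.g. embedding/pii_detected.
type Condition struct {
	Type string
	Name string
}

// Decision is a hypothetical, trimmed-down view of one entry under `decisions:`.
type Decision struct {
	Name       string
	Priority   int
	Operator   string // "AND" or "OR"; anything else is treated as OR here
	Conditions []Condition
}

// matches evaluates the decision's conditions against the signals that fired.
func (d Decision) matches(fired map[Condition]bool) bool {
	if len(d.Conditions) == 0 {
		return false
	}
	if d.Operator == "AND" {
		for _, c := range d.Conditions {
			if !fired[c] {
				return false
			}
		}
		return true
	}
	for _, c := range d.Conditions {
		if fired[c] {
			return true // OR: one fired condition is enough
		}
	}
	return false
}

// selectDecision returns the matching decision with the highest priority.
func selectDecision(decisions []Decision, fired map[Condition]bool) (Decision, bool) {
	var best Decision
	found := false
	for _, d := range decisions {
		if d.matches(fired) && (!found || d.Priority > best.Priority) {
			best, found = d, true
		}
	}
	return best, found
}

func main() {
	decisions := []Decision{
		{"block_pii", 100, "OR", []Condition{{"embedding", "pii_detected"}}},
		{"block_security", 95, "OR", []Condition{{"embedding", "security_threat"}}},
		{"kubernetes_expert", 90, "OR", []Condition{{"embedding", "kubernetes_topic"}}},
	}
	// A Kubernetes question that also shares an email address fires two signals.
	fired := map[Condition]bool{
		{"embedding", "kubernetes_topic"}: true,
		{"embedding", "pii_detected"}:     true,
	}
	if d, ok := selectDecision(decisions, fired); ok {
		fmt.Println(d.Name)
	}
}
```

When no decision matches, `selectDecision` reports `false`. Presumably that is where the router applies the default decision exercised by `decision-fallback-behavior`.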
47 changes: 41 additions & 6 deletions e2e/profiles/dynamic-config/profile.go
@@ -116,12 +116,13 @@ func (p *Profile) GetTestCases() []string {
"pii-detection",
"jailbreak-detection",

-// Signal-Decision engine tests (new architecture)
+// Signal-Decision engine tests
"decision-priority-selection", // Priority-based routing
"plugin-chain-execution", // Plugin ordering and blocking
"rule-condition-logic", // AND/OR operators
"decision-fallback-behavior", // Fallback to default
"plugin-config-variations", // Plugin configuration testing
+"embedding-signal-routing", // EmbeddingSignal-based semantic similarity routing

// Load tests
"chat-completions-progressive-stress",
@@ -241,8 +242,13 @@ func (p *Profile) deployCRDs(ctx context.Context, opts *framework.SetupOptions)
return fmt.Errorf("failed to apply IntelligentRoute CRD: %w", err)
}

-// Wait a bit for CRDs to be processed
-time.Sleep(5 * time.Second)
+// Wait for CRDs to be processed by the controller
+time.Sleep(15 * time.Second)
+
+// Verify CRDs are visible
+if err := p.verifyCRDsExist(ctx, opts.KubeConfig); err != nil {
+return fmt.Errorf("CRD verification failed: %w", err)
+}

return nil
}
@@ -254,6 +260,22 @@ func (p *Profile) kubectlApply(ctx context.Context, kubeconfig, manifestPath str
return cmd.Run()
}

+func (p *Profile) verifyCRDsExist(ctx context.Context, kubeconfig string) error {
+// Verify IntelligentPool exists
+cmd := exec.CommandContext(ctx, "kubectl", "get", "intelligentpool", "ai-gateway-pool", "-n", "default", "--kubeconfig", kubeconfig)
+if err := cmd.Run(); err != nil {
+return fmt.Errorf("IntelligentPool 'ai-gateway-pool' not found: %w", err)
+}
+
+// Verify IntelligentRoute exists
+cmd = exec.CommandContext(ctx, "kubectl", "get", "intelligentroute", "ai-gateway-route", "-n", "default", "--kubeconfig", kubeconfig)
+if err := cmd.Run(); err != nil {
+return fmt.Errorf("IntelligentRoute 'ai-gateway-route' not found: %w", err)
+}
+
+return nil
+}
+
func (p *Profile) verifyEnvironment(ctx context.Context, opts *framework.SetupOptions) error {
// Create Kubernetes client
config, err := clientcmd.BuildConfigFromFlags("", opts.KubeConfig)
@@ -313,9 +335,22 @@ func (p *Profile) verifyEnvironment(ctx context.Context, opts *framework.SetupOp
// Check all deployments are healthy
p.log("Verifying all deployments are healthy...")

-// Check semantic-router deployment
-if err := helpers.CheckDeployment(ctx, client, "vllm-semantic-router-system", "semantic-router", p.verbose); err != nil {
-return fmt.Errorf("semantic-router deployment not healthy: %w", err)
+// Wait for semantic-router deployment to become ready
+semanticRouterReady := false
+for i := 0; i < 12; i++ { // 12 * 10s = 120 seconds max wait
+if err := helpers.CheckDeployment(ctx, client, "vllm-semantic-router-system", "semantic-router", p.verbose); err == nil {
+semanticRouterReady = true // record success so the final check below is skipped
+break
+}
+if i < 11 { // Don't sleep on last iteration
+time.Sleep(10 * time.Second)
+}
+}
+
+if !semanticRouterReady {
+// Final check to get the actual error
+if err := helpers.CheckDeployment(ctx, client, "vllm-semantic-router-system", "semantic-router", p.verbose); err != nil {
+return fmt.Errorf("semantic-router deployment not healthy after 120s: %w", err)
+}
}

// Check envoy-gateway deployment
16 changes: 15 additions & 1 deletion e2e/profiles/dynamic-config/values.yaml
@@ -2,6 +2,11 @@
# This configuration uses Kubernetes CRDs for dynamic configuration
# Static parts are defined here, dynamic parts (model_config, decisions, categories) come from CRDs

+# Environment variables for the semantic-router container
+env:
+- name: EMBEDDING_MODEL_OVERRIDE
+value: "qwen3" # Force qwen3 for tests (Gemma requires HF_TOKEN)
+
config:
# Set config source to kubernetes to enable CRD-based configuration
config_source: kubernetes
@@ -122,9 +127,18 @@ config:

embedding_models:
qwen3_model_path: "models/Qwen3-Embedding-0.6B"
-gemma_model_path: "models/embeddinggemma-300m"
+gemma_model_path: "" # Empty = fallback to Qwen3 (embeddinggemma requires HF_TOKEN)
use_cpu: true

+# Increase memory limits for embedding model support
+resources:
+limits:
+memory: "10Gi" # Increased from default 6Gi to handle Qwen3 + all classification models
+cpu: "2"
+requests:
+memory: "6Gi" # Increased from default 3Gi
+cpu: "1"
+
observability:
tracing:
enabled: false
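The values file forces Qwen3 through `EMBEDDING_MODEL_OVERRIDE` and leaves `gemma_model_path` empty so that the router falls back to Qwen3. The following Go sketch shows that selection logic. It rests on three assumptions: the override takes precedence, an empty path means the model is unavailable, and Gemma is preferred when both models are configured and no override is set. `pickEmbeddingModel` is a hypothetical function. The router's actual precedence rules are not shown in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// pickEmbeddingModel chooses which embedding model to load (hypothetical).
func pickEmbeddingModel(override, qwen3Path, gemmaPath string) (string, error) {
	switch override {
	case "qwen3":
		if qwen3Path == "" {
			return "", errors.New("qwen3 requested but qwen3_model_path is empty")
		}
		return "qwen3", nil
	case "gemma":
		if gemmaPath == "" {
			return "", errors.New("gemma requested but gemma_model_path is empty")
		}
		return "gemma", nil
	case "":
		// No override: assume Gemma is preferred when configured,
		// otherwise fall back to Qwen3 (the empty-path behavior in values.yaml).
		if gemmaPath != "" {
			return "gemma", nil
		}
		if qwen3Path != "" {
			return "qwen3", nil
		}
		return "", errors.New("no embedding model configured")
	default:
		return "", fmt.Errorf("unknown EMBEDDING_MODEL_OVERRIDE %q", override)
	}
}

func main() {
	// Paths mirror the values.yaml above: Qwen3 set, Gemma left empty.
	model, err := pickEmbeddingModel(os.Getenv("EMBEDDING_MODEL_OVERRIDE"), "models/Qwen3-Embedding-0.6B", "")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("embedding model:", model)
}
```

In this sketch, an explicit override fails loudly rather than silently falling back. A test run that asks for Gemma without `HF_TOKEN`-gated weights would therefore surface the problem immediately.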