

@liavweiss

Description

This PR implements the production-stack profile for the E2E testing framework, enabling comprehensive testing of Semantic Router in production-grade vLLM stack environments with high availability, load balancing, and observability features.
Closes #657

Background

The E2E testing framework introduced in #655 provides an extensible profile-based architecture. This PR adds a production-stack profile to test Semantic Router deployment and functionality in production-grade vLLM stack environments, including:

  • Multi-replica deployments for high availability
  • Load balancing across replicas
  • Failover testing during active traffic
  • Performance and throughput validation
  • Resource utilization monitoring with Prometheus

Implementation

New Files

  1. e2e/profiles/production-stack/profile.go (482 lines)
    • Implements the Profile interface for production-stack testing
    • 7-step setup process:
      1. Deploy Semantic Router (initial 1 replica)
      2. Deploy Envoy Gateway
      3. Deploy Envoy AI Gateway (CRDs + Controller)
      4. Deploy Demo LLM and Gateway API Resources
      5. Scale deployments for HA/LB (2 replicas each)
      6. Deploy Prometheus for monitoring
      7. Verify all components are ready
    • Comprehensive teardown with proper resource cleanup
    • Service configuration for Envoy Gateway integration
  2. e2e/profiles/production-stack/values.yaml (169 lines)
    • Minimal Semantic Router configuration optimized for HA/LB/Monitoring tests
    • Includes all required classifiers (domain, PII, jailbreak)
    • Semantic cache configuration
    • Metrics and observability settings enabled
    • Base model with LoRA adapters configuration
  3. e2e/profiles/production-stack/prometheus-config.yaml (55 lines)
    • Prometheus scrape configuration for:
      • Semantic Router metrics endpoints
      • Kubernetes pods and nodes
      • Service discovery for multiple namespaces

Key Features

High Availability Setup

  • Scales Semantic Router from its initial single replica to 2 replicas
  • Scales demo LLM (vllm-llama3-8b-instruct) to 2 replicas
  • Verifies all replicas are healthy before proceeding
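
The health gate above can be pictured as a comparison of ready replicas against the desired count. This is a rough sketch under stated assumptions: `deploymentStatus` is a hypothetical, trimmed-down struct (its `ReadyReplicas` field mirrors the field of the same name in a Kubernetes Deployment status), the `semantic-router` deployment name is illustrative, and the real profile presumably reads these values from the cluster rather than from literals.

```go
package main

import "fmt"

// deploymentStatus is a hypothetical, reduced view of a Deployment:
// only the name and the number of replicas currently reporting ready.
type deploymentStatus struct {
	Name          string
	ReadyReplicas int32
}

// notReady returns one message per deployment that has fewer ready
// replicas than want. An empty result means every deployment is healthy.
func notReady(statuses []deploymentStatus, want int32) []string {
	var problems []string
	for _, s := range statuses {
		if s.ReadyReplicas < want {
			problems = append(problems,
				fmt.Sprintf("%s: %d/%d replicas ready", s.Name, s.ReadyReplicas, want))
		}
	}
	return problems
}

func main() {
	// Illustrative values only; "semantic-router" is an assumed name.
	statuses := []deploymentStatus{
		{Name: "semantic-router", ReadyReplicas: 2},
		{Name: "vllm-llama3-8b-instruct", ReadyReplicas: 1},
	}
	for _, p := range notReady(statuses, 2) {
		fmt.Println(p)
	}
}
```

Returning every shortfall at once, rather than stopping at the first, gives a fuller picture in the test log when several deployments lag behind.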

Load Balancing

  • Configures Envoy Gateway for request distribution
  • Uses Gateway API resources for routing
  • Service discovery with proper label selectors

Observability

  • Deploys Prometheus with custom configuration
  • Scrapes metrics from Semantic Router endpoints
  • Monitors Kubernetes pods and nodes
  • Configures RBAC for Prometheus service account
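
One way a test could confirm that scraping works is to ask Prometheus for the built-in `up` metric through its HTTP API (`/api/v1/query`) and check each target. The sketch below only builds the query URL and parses a response body; it makes no network call. The helper names, the `prometheus:9090` address, the `semantic-router` job name, and the sample instances are assumptions for illustration, not values taken from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// queryURL builds an instant-query URL for the Prometheus HTTP API.
func queryURL(base, promQL string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = strings.TrimRight(u.Path, "/") + "/api/v1/query"
	u.RawQuery = url.Values{"query": {promQL}}.Encode()
	return u.String(), nil
}

// queryResponse covers only the fields of an instant-query response used here.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Metric map[string]string `json:"metric"`
			Value  []interface{}     `json:"value"` // [unix timestamp, "sample value"]
		} `json:"result"`
	} `json:"data"`
}

// targetsUp maps each "instance" label to whether its `up` sample equals 1.
func targetsUp(body []byte) (map[string]bool, error) {
	var resp queryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("query status %q", resp.Status)
	}
	up := make(map[string]bool)
	for _, r := range resp.Data.Result {
		if len(r.Value) != 2 {
			return nil, fmt.Errorf("unexpected sample shape for %v", r.Metric)
		}
		s, ok := r.Value[1].(string)
		if !ok {
			return nil, fmt.Errorf("sample value is not a string for %v", r.Metric)
		}
		v, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return nil, err
		}
		up[r.Metric["instance"]] = v == 1
	}
	return up, nil
}

// sample is a hand-written response body with made-up instances.
const sample = `{"status":"success","data":{"resultType":"vector","result":[
 {"metric":{"__name__":"up","instance":"10.0.0.1:8080","job":"semantic-router"},"value":[1764758400,"1"]},
 {"metric":{"__name__":"up","instance":"10.0.0.2:8080","job":"semantic-router"},"value":[1764758400,"0"]}]}}`

func main() {
	u, err := queryURL("http://prometheus:9090", `up{job="semantic-router"}`)
	fmt.Println(u, err)

	up, err := targetsUp([]byte(sample))
	fmt.Println(up, err)
}
```

In a real test the response body would come from an HTTP GET against the URL returned by `queryURL`; keeping the parsing separate makes it easy to exercise without a running cluster.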

Test Coverage

The profile includes both standard functional tests and production-specific tests.

Standard Tests:

  • chat-completions-request
  • chat-completions-stress-request
  • domain-classify
  • semantic-cache
  • pii-detection
  • jailbreak-detection
  • chat-completions-progressive-stress

Production Stack Specific Tests:

  • multi-replica-health - Verify all replicas are healthy
  • load-balancing-verification - Test request distribution across replicas
  • failover-during-traffic - Verify graceful failover when a replica fails
  • performance-throughput - Measure throughput and latency under load
  • resource-utilization-monitoring - Check CPU, memory, and GPU utilization
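
To illustrate what a check like load-balancing-verification might assert, the sketch below tallies which replica served each request and requires every expected replica to receive a minimum share of the traffic. It is an assumption-laden illustration, not the test in this PR: how the serving replica is identified (for example a response header or pod logs) is left open, and `distribution`, `balanced`, the replica names, and the 30% threshold are all hypothetical.

```go
package main

import "fmt"

// distribution counts how many responses each replica served.
// servedBy holds one replica identifier per completed request.
func distribution(servedBy []string) map[string]int {
	counts := make(map[string]int)
	for _, id := range servedBy {
		counts[id]++
	}
	return counts
}

// balanced reports whether every expected replica served at least
// minShare (a fraction between 0 and 1) of all requests.
// It returns false when no requests were recorded.
func balanced(counts map[string]int, replicas []string, minShare float64) bool {
	total := 0
	for _, n := range counts {
		total += n
	}
	if total == 0 {
		return false
	}
	for _, r := range replicas {
		if float64(counts[r])/float64(total) < minShare {
			return false
		}
	}
	return true
}

func main() {
	// Illustrative request log: five requests across two replicas.
	servedBy := []string{"router-a", "router-b", "router-a", "router-b", "router-a"}
	counts := distribution(servedBy)

	fmt.Println(counts["router-a"], counts["router-b"])
	fmt.Println(balanced(counts, []string{"router-a", "router-b"}, 0.3))
}
```

A share threshold below an even split leaves room for the normal unevenness of a short test run while still catching the case where one replica receives no traffic at all.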

Testing

Expected Behavior

  1. All 7 setup steps complete successfully
  2. All deployments reach ready state (2 replicas each)
  3. Prometheus starts scraping metrics
  4. All test cases pass (when implemented)
  5. Teardown cleans up all resources

Acceptance Criteria

  • ✅ Production-stack profile directory structure created
  • ✅ Profile interface implemented with Setup/Teardown
  • ✅ Multi-replica deployment configuration
  • ✅ Prometheus monitoring integration
  • ✅ Service configuration for Envoy Gateway
  • ✅ Comprehensive error handling and logging
  • ✅ Resource cleanup in teardown
  • ✅ Profile registration in main.go
  • ✅ Production-specific test cases implementation
  • ✅ CI integration
  • ✅ Documentation updates


netlify bot commented Dec 3, 2025

Deploy Preview for vllm-semantic-router ready!

🔨 Latest commit: 584fd79
🔍 Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/69301dac5e49c40008564f75
😎 Deploy Preview: https://deploy-preview-767--vllm-semantic-router.netlify.app

@liavweiss force-pushed the feature/production-stack-profile branch from 50a8798 to 7c22480 on December 3, 2025 10:50
@liavweiss changed the title from "[E2E] Add production-stack profile for E2E testing framework" to "feat: Add production-stack profile for E2E testing framework" on Dec 3, 2025

github-actions bot commented Dec 3, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • .github/workflows/integration-test-k8s.yml

📁 e2e

Owners: @Xunzhuo
Files changed:

  • e2e/README.md
  • e2e/cmd/e2e/main.go
  • e2e/profiles/production-stack/profile.go
  • e2e/profiles/production-stack/prometheus-config.yaml
  • e2e/profiles/production-stack/values.yaml
  • e2e/testcases/failover_during_traffic.go
  • e2e/testcases/load_balancing_verification.go
  • e2e/testcases/multi_replica_health.go
  • e2e/testcases/performance_throughput.go
  • e2e/testcases/resource_utilization_monitoring.go

📁 tools

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

  • tools/make/e2e.mk


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@liavweiss marked this pull request as draft on December 3, 2025 11:22
@liavweiss force-pushed the feature/production-stack-profile branch from 7c22480 to 584fd79 on December 3, 2025 11:23
@liavweiss marked this pull request as ready for review on December 3, 2025 13:44
