feat: Add production-stack profile for E2E testing framework #767
+1,196
−2

Description
This PR implements the production-stack profile for the E2E testing framework, enabling comprehensive testing of Semantic Router in production-grade vLLM stack environments with high availability, load balancing, and observability features.
Closes #657
Background
The E2E testing framework introduced in #655 provides an extensible profile-based architecture. This PR adds a production-stack profile to test Semantic Router deployment and functionality in production-grade vLLM stack environments, including:
Implementation
New Files
e2e/profiles/production-stack/profile.go (482 lines)
    Profile interface for production-stack testing
e2e/profiles/production-stack/values.yaml (169 lines)
e2e/profiles/production-stack/prometheus-config.yaml (55 lines)
Key Features
High Availability Setup
Load Balancing
Observability
Test Coverage
The profile includes both standard functional tests and production-specific tests:
Standard Tests:
chat-completions-request
chat-completions-stress-request
domain-classify
semantic-cache
pii-detection
jailbreak-detection
chat-completions-progressive-stress
Production Stack Specific Tests:
multi-replica-health - Verify all replicas are healthy
load-balancing-verification - Test request distribution across replicas
failover-during-traffic - Verify graceful failover when a replica fails
performance-throughput - Measure throughput and latency under load
resource-utilization-monitoring - Check CPU, memory, and GPU utilization
Testing
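To illustrate the kind of check that load-balancing-verification describes, here is a minimal sketch in Go. It is not the code from this PR: the function name distributionBalanced, the replica names, and the tolerance value are all hypothetical. The sketch assumes per-replica request counts have already been collected by some other means. It treats traffic as balanced when each replica's share of requests is within a given tolerance of an even split.

```go
package main

import (
	"fmt"
	"math"
)

// distributionBalanced is a hypothetical helper (not part of this PR).
// counts maps a replica name to the number of requests it served.
// It returns whether every replica's share of traffic is within
// tolerance of an even split (1/n), plus the largest deviation seen.
func distributionBalanced(counts map[string]int, tolerance float64) (bool, float64) {
	total := 0
	for _, c := range counts {
		total += c
	}
	// No replicas or no traffic: nothing to verify, so report failure.
	if len(counts) == 0 || total == 0 {
		return false, 0
	}

	expected := 1.0 / float64(len(counts))
	maxDev := 0.0
	for _, c := range counts {
		dev := math.Abs(float64(c)/float64(total) - expected)
		if dev > maxDev {
			maxDev = dev
		}
	}
	return maxDev <= tolerance, maxDev
}

func main() {
	// Assumed sample data: three replicas and 100 requests in total.
	counts := map[string]int{"replica-0": 34, "replica-1": 33, "replica-2": 33}
	ok, dev := distributionBalanced(counts, 0.10)
	fmt.Printf("balanced=%v maxDeviation=%.3f\n", ok, dev)
}
```

A share-based tolerance keeps the check independent of the total request count. The right tolerance depends on the load-balancing policy and the sample size, so the 0.10 used here is only a placeholder.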
Expected Behavior
Acceptance Criteria