[Feat] Model Ensemble Support with OpenAPI Endpoints #731
base: main
✅ Deploy Preview for vllm-semantic-router ready!
Co-authored-by: rootfs <[email protected]>
@copilot the |
👥 vLLM Semantic Team Notification
The following members have been identified for the changed files in this PR and have been automatically assigned:
I've refactored the ensemble implementation as an independent OpenAI-compatible API server as requested. Changes made in commit 9acd73b:
The ensemble service can now be used standalone or integrated with the semantic router, and it can scale independently.

Ensemble Service - Independent OpenAI-Compatible API Server ✅
This PR implements ensemble orchestration as an independent OpenAI-compatible API server. This works around an architectural constraint: ExtProc does not support multiple downstream endpoints.
Architecture
The ensemble service runs as a standalone HTTP server alongside the semantic router, providing clean separation of concerns and independent scalability.
Implementation
Independent Service (`pkg/ensembleserver/`)
- `server.go` - Standalone HTTP server:
  - `/v1/chat/completions` endpoint
  - `/health` endpoint
  - Enabled via `ensemble.enabled: true`

Ensemble Package (`pkg/ensemble/`)
- `factory.go` - Orchestration engine: `GetDefaultStrategy()`, `GetDefaultMinResponses()`
- `factory_test.go` - Comprehensive test suite (100% coverage)

ExtProc Changes
Removed all ensemble integration:
- `req_filter_ensemble.go` - Deleted
- `router.go` - Reverted (no `EnsembleFactory`)
- `processor_req_header.go` - Reverted (no ensemble parsing)
- `processor_req_body.go` - Reverted (no ensemble execution)
- `processor_res_header.go` - Reverted (no ensemble metadata)

ExtProc remains focused on single-endpoint routing.
Configuration
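As a hypothetical sketch, the `ensemble` configuration section could be modeled as a Go struct with defaults and validation. Only `ensemble.enabled` is confirmed by the PR text; `strategy`, `min_responses`, `endpoints`, and the default values below are assumed names and values. JSON is used instead of YAML only to keep the example within the standard library.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// EnsembleConfig is a hypothetical mirror of the `ensemble:` config section.
// Only `enabled` appears in the PR text; the other fields are illustrative.
type EnsembleConfig struct {
	Enabled      bool     `json:"enabled"`
	Strategy     string   `json:"strategy"`
	MinResponses int      `json:"min_responses"`
	Endpoints    []string `json:"endpoints"`
}

// Placeholder defaults. The PR exposes GetDefaultStrategy() and
// GetDefaultMinResponses() in pkg/ensemble; their real values are not shown here.
const (
	defaultStrategy     = "voting"
	defaultMinResponses = 2
)

func (c *EnsembleConfig) applyDefaults() {
	if c.Strategy == "" {
		c.Strategy = defaultStrategy
	}
	if c.MinResponses <= 0 {
		c.MinResponses = defaultMinResponses
	}
}

// Validate only checks an enabled config, so a missing or disabled
// section stays valid (the feature is disabled by default).
func (c EnsembleConfig) Validate() error {
	if !c.Enabled {
		return nil
	}
	if len(c.Endpoints) == 0 {
		return errors.New("ensemble.enabled is true but no endpoints are configured")
	}
	if c.MinResponses > len(c.Endpoints) {
		return fmt.Errorf("min_responses (%d) exceeds the number of endpoints (%d)", c.MinResponses, len(c.Endpoints))
	}
	return nil
}

func parseEnsembleConfig(data []byte) (EnsembleConfig, error) {
	var cfg EnsembleConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return cfg, err
	}
	cfg.applyDefaults()
	return cfg, cfg.Validate()
}

func main() {
	cfg, err := parseEnsembleConfig([]byte(`{"enabled": true, "endpoints": ["http://model-a:8000", "http://model-b:8000"]}`))
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("strategy=%s min_responses=%d endpoints=%d\n", cfg.Strategy, cfg.MinResponses, len(cfg.Endpoints))
}
```

Rejecting a `min_responses` larger than the endpoint list at load time avoids an ensemble that can never reach its quorum at request time.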
Usage
Start Services
Make Requests
Response Headers
Aggregation Strategies
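Majority voting is the simplest of the aggregation strategies listed in the feature issue. The sketch below assumes a hypothetical `modelResponse` type, normalizes whitespace and case so equivalent answers vote together, and breaks ties by first appearance so the result is deterministic. The actual comparison logic in `pkg/ensemble` may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// modelResponse is a hypothetical shape for one endpoint's answer.
type modelResponse struct {
	Model   string
	Content string
}

// normalize lets trivially different answers ("4" vs " 4\n") vote together.
func normalize(s string) string { return strings.ToLower(strings.TrimSpace(s)) }

// majorityVote returns the most common answer (as first written) and its vote
// count. Ties go to the answer that appeared first.
func majorityVote(responses []modelResponse) (string, int) {
	counts := map[string]int{}
	firstSeen := map[string]int{}
	original := map[string]string{}
	for i, r := range responses {
		key := normalize(r.Content)
		if _, ok := firstSeen[key]; !ok {
			firstSeen[key] = i
			original[key] = strings.TrimSpace(r.Content)
		}
		counts[key]++
	}
	best, bestCount := "", 0
	for key, c := range counts {
		if c > bestCount || (c == bestCount && firstSeen[key] < firstSeen[best]) {
			best, bestCount = key, c
		}
	}
	return original[best], bestCount
}

func main() {
	answer, votes := majorityVote([]modelResponse{
		{"model-a", "4"}, {"model-b", " 4\n"}, {"model-c", "5"},
	})
	fmt.Printf("answer=%q votes=%d\n", answer, votes)
}
```

Exact-match voting suits short, closed-form answers (labels, numbers); free-form text usually needs a similarity measure or a reranker instead.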
Benefits
Testing
✅ All ensemble unit tests pass (8/8)
✅ Build succeeds
✅ Router binary created
✅ Backward compatible (disabled by default)
Documentation
- `config/ensemble/README.md`: Usage guide
- `config/ensemble/ARCHITECTURE.md`: Architecture diagrams and design decisions
- `config/ensemble/ensemble-example.yaml`: Example configuration
- `ENSEMBLE_IMPLEMENTATION.md`: Implementation details

Port Allocation
- `-port`
- `-api-port`
- `-ensemble-port`
- `-metrics-port`

Future Enhancements
Addresses feedback: the ensemble is now implemented as an independent service rather than integrated into ExtProc.
Original prompt
This section details the original issue to resolve.
<issue_title>[Feat] Model Ensemble Support</issue_title>
<issue_description>## Introduction
Support a model ensemble orchestration service that can intelligently combine outputs from multiple LLM endpoints using configurable aggregation strategies, enabling improved reliability, accuracy, and flexible cost-performance trade-offs.
Use Case
Problem Statement
Real-World Scenarios
Critical Applications
Cost Optimization
Reliability & Accuracy
Model Diversity
Architecture
```mermaid
graph TB
    Client[Client Request] --> Router[Semantic Router]
    Router --> Orchestrator[Ensemble Orchestrator]
    Orchestrator --> Strategy{Routing Strategy}
    Strategy -->|Parallel Query| M1[Model Endpoint 1]
    Strategy -->|Parallel Query| M2[Model Endpoint 2]
    Strategy -->|Parallel Query| M3[Model Endpoint N]
    M1 --> Aggregator[Aggregation Engine]
    M2 --> Aggregator
    M3 --> Aggregator
    Aggregator --> Voting[Voting Strategy]
    Aggregator --> Weighted[Weighted Consensus]
    Aggregator --> Ranking[Reranking]
    Aggregator --> Average[Score Averaging]
    Aggregator --> FirstSuccess[First Success]
    Voting --> Response[Final Response]
    Weighted --> Response
    Ranking --> Response
    Average --> Response
    FirstSuccess --> Response
    style Orchestrator fill:#e1f5ff
    style Aggregator fill:#fff4e1
    style Response fill:#e1ffe1
```

Core Components
1. Ensemble Orchestrator
Coordinates parallel or sequential requests to multiple model endpoints:
2. Aggregation Engine
Combines multiple model outputs using configurable strategies:
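Weighted consensus differs from plain voting in that each model's vote counts in proportion to a configured weight, so one strong model can outvote several weaker ones. A minimal sketch, assuming a hypothetical `weightedResponse` type, a per-model weight map, and a default weight of 1.0 for unlisted models:

```go
package main

import (
	"fmt"
	"strings"
)

// weightedResponse is a hypothetical shape for one endpoint's answer.
type weightedResponse struct {
	Model   string
	Content string
}

// weightedConsensus sums model weights per normalized answer and returns the
// heaviest answer plus its share of the total weight. Unlisted models weigh
// 1.0 (an arbitrary choice for this sketch); non-positive weights are ignored.
func weightedConsensus(responses []weightedResponse, weights map[string]float64) (string, float64) {
	totals := map[string]float64{}
	display := map[string]string{}
	var order []string // first-seen order keeps tie-breaking deterministic
	var sum float64
	for _, r := range responses {
		w, ok := weights[r.Model]
		if !ok {
			w = 1.0
		}
		if w <= 0 {
			continue
		}
		key := strings.ToLower(strings.TrimSpace(r.Content))
		if _, seen := totals[key]; !seen {
			order = append(order, key)
			display[key] = strings.TrimSpace(r.Content)
		}
		totals[key] += w
		sum += w
	}
	if sum == 0 {
		return "", 0
	}
	best, bestWeight := "", 0.0
	for _, key := range order {
		if totals[key] > bestWeight {
			best, bestWeight = key, totals[key]
		}
	}
	return display[best], bestWeight / sum
}

func main() {
	weights := map[string]float64{"large-model": 3, "small-model-1": 1, "small-model-2": 1}
	answer, share := weightedConsensus([]weightedResponse{
		{"large-model", "Yes"}, {"small-model-1", "No"}, {"small-model-2", "no"},
	}, weights)
	fmt.Printf("%s (%.0f%% of weight)\n", answer, share*100)
}
```

Here the large model outvotes the two small ones; with equal weights the same inputs would fall back to a simple majority.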
3. Configuration Interface
Flexible control mechanisms:
- HTTP request headers (`X-Ensemble-Models`, `X-Ensemble-Strategy`)

4. Adaptive Triggering
Intelligent decision-making for when to use ensemble:
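One possible triggering policy, sketched under stated assumptions: fan out only when the client opts in explicitly, when the request falls into a category marked critical, or when the router's classification confidence is below a threshold. `routingSignal`, `triggerPolicy`, the threshold, and the category names are all hypothetical.

```go
package main

import "fmt"

// routingSignal is a hypothetical summary of what the semantic router knows
// about a request before picking a model.
type routingSignal struct {
	Category   string
	Confidence float64 // classifier confidence in [0, 1]
}

// triggerPolicy decides when an ensemble is worth its extra cost.
// All fields are illustrative.
type triggerPolicy struct {
	ConfidenceThreshold float64
	CriticalCategories  map[string]bool
}

// shouldUseEnsemble returns the decision plus a human-readable reason,
// checking the cheapest and most explicit signals first.
func (p triggerPolicy) shouldUseEnsemble(s routingSignal, explicitOptIn bool) (bool, string) {
	switch {
	case explicitOptIn:
		return true, "client requested ensemble"
	case p.CriticalCategories[s.Category]:
		return true, "critical category " + s.Category
	case s.Confidence < p.ConfidenceThreshold:
		return true, fmt.Sprintf("low routing confidence %.2f < %.2f", s.Confidence, p.ConfidenceThreshold)
	default:
		return false, "single model is sufficient"
	}
}

func main() {
	policy := triggerPolicy{
		ConfidenceThreshold: 0.7,
		CriticalCategories:  map[string]bool{"medical": true, "legal": true},
	}
	for _, s := range []routingSignal{{"math", 0.92}, {"math", 0.55}, {"legal", 0.97}} {
		use, why := policy.shouldUseEnsemble(s, false)
		fmt.Printf("%-6s conf=%.2f ensemble=%v (%s)\n", s.Category, s.Confidence, use, why)
	}
}
```

Returning a reason alongside the decision makes the trigger easy to surface in logs or response metadata when tuning cost-versus-accuracy trade-offs.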
Expected Benefits
Accuracy & Reliability
Cost Optimization
Operational Excellence