**Is your feature request related to a problem? Please describe.**
The semantic router currently supports the OpenAI Chat Completions API (`/v1/chat/completions`) for routing and processing LLM requests. However, OpenAI and Azure OpenAI have introduced a new Responses API (`/v1/responses`) that provides a more powerful, stateful API experience. The new API brings together the best capabilities of the Chat Completions and Assistants APIs in a unified interface.
The Responses API offers several advantages over the traditional Chat Completions API:
- **Stateful conversations**: Built-in conversation state management with response chaining via `previous_response_id`
- **Advanced tool support**: Native support for code interpreter, function calling, image generation, and MCP (Model Context Protocol) servers
- **Background tasks**: Asynchronous processing for long-running tasks with polling support
- **Enhanced streaming**: Better streaming capabilities with resumable streams and sequence tracking
- **File handling**: Direct support for file inputs (PDFs, images, etc.) with automatic upload to containers
- **Reasoning models**: First-class support for reasoning models (o1, o3, o4-mini) with encrypted reasoning items
Currently, users who want to leverage these advanced capabilities cannot route their requests through the semantic router, limiting the router's applicability for modern LLM workflows that require stateful interactions, advanced tooling, or reasoning capabilities.
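To make the difference concrete, a hedged sketch of the two request shapes (the payload values are invented; field names follow the points above):

```python
# Chat Completions: stateless; the client resends history as `messages`
chat_request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Solve 3x + 11 = 14"}],
}

# Responses API: stateful; a turn can chain off a prior response by id
responses_request = {
    "model": "gpt-4o",
    "input": "Now explain the solution step by step",
    "previous_response_id": "resp_abc123",  # invented id from an earlier turn
    "store": True,  # server keeps conversation state so chaining works
}
```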
**Describe the solution you'd like**
Add support for the OpenAI Responses API to the semantic router, enabling intelligent routing and classification for Responses API requests while preserving all the advanced features of the API.
Key implementation requirements:
- **New endpoint support**: Handle the `POST /v1/responses` and `GET /v1/responses/{response_id}` endpoints
- **Request parsing**: Parse the Responses API request format, including:
  - `input` field (replaces `messages`)
  - `previous_response_id` for conversation chaining
  - `tools` with extended types (code_interpreter, image_generation, mcp)
  - `background` mode flag
  - `stream` parameter with sequence tracking
  - `store` parameter for stateful/stateless modes
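  As a rough sketch of this step (helper names here are invented, not the router's actual internals), the `input` field can be normalized whether the client sends a bare string or structured items, with the routing-relevant flags pulled alongside it:

  ```python
  from typing import Any

  def normalize_input(body: dict[str, Any]) -> list[dict[str, Any]]:
      """Normalize the Responses API `input` field to message dicts.

      `input` may be a plain string or a list of message-like items;
      a bare string is treated as a single user message.
      """
      raw = body.get("input", [])
      if isinstance(raw, str):
          return [{"role": "user", "content": raw}]
      return list(raw)

  def routing_flags(body: dict[str, Any]) -> dict[str, Any]:
      """Pull out the fields the router needs for a routing decision."""
      return {
          "previous_response_id": body.get("previous_response_id"),
          "background": body.get("background", False),
          "stream": body.get("stream", False),
          "store": body.get("store", True),  # assumed default; verify upstream
          "tool_types": [t.get("type") for t in body.get("tools", [])],
      }
  ```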
- **Semantic routing integration**: Apply intent classification to Responses API requests:
  - Extract user content from the `input` field (which can be text, messages, or mixed content)
  - Classify intent and route to appropriate models
  - Preserve conversation context when using `previous_response_id`
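  A minimal sketch of the extraction step (`classify` and `select_model` are placeholders for the router's existing classifier and model picker, not real function names):

  ```python
  def extract_user_text(messages: list[dict]) -> str:
      """Flatten user-authored text for intent classification.

      Structured content parts with a `text` key are kept; non-text
      parts such as images or files are skipped for classification.
      """
      chunks: list[str] = []
      for msg in messages:
          if msg.get("role") != "user":
              continue
          content = msg.get("content", "")
          if isinstance(content, str):
              chunks.append(content)
          else:
              chunks.extend(part["text"] for part in content
                            if isinstance(part, dict) and "text" in part)
      return "\n".join(chunks)

  # category = classify(extract_user_text(normalize_input(body)))  # placeholders
  # model = select_model(category)
  ```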
- **Response handling**: Process Responses API responses, including:
  - Response object format with the `output` array
  - Streaming events with `sequence_number` tracking
  - Background task status (`queued`, `in_progress`, `completed`)
  - VSR (vLLM Semantic Router) header injection for routing metadata
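  A hedged sketch of the streaming side (`sequence_number` comes from the API's events; the VSR header name shown is illustrative, not a settled spec):

  ```python
  import json
  from typing import Iterable, Iterator

  def relay_stream(upstream: Iterable[dict]) -> Iterator[str]:
      """Relay upstream streaming events as SSE while checking that
      `sequence_number` advances without gaps."""
      last_seq = -1
      for event in upstream:  # each event is an already-parsed JSON dict
          seq = event.get("sequence_number", last_seq + 1)
          if seq != last_seq + 1:
              pass  # gap detected: a resumable-stream client could re-sync here
          last_seq = seq
          yield f"data: {json.dumps(event)}\n\n"

  # For non-streaming responses, routing metadata would go into headers, e.g.:
  # headers["x-vsr-selected-model"] = selected_model  # illustrative header name
  ```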
- **Feature preservation**: Ensure all Responses API features work through the router:
  - Function calling and tool execution
  - Code interpreter with container management
  - Image generation and editing
  - MCP server integration
  - File uploads and processing
  - Reasoning model support with encrypted reasoning items
- **Backward compatibility**: Maintain full support for the existing Chat Completions API while adding Responses API support
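A sketch of how the two APIs could coexist behind a single dispatch point (handler names are placeholders, and the GET endpoint is elided for brevity):

```python
def handle_chat_completions(body: dict) -> dict:
    ...  # existing Chat Completions pipeline, unchanged

def handle_responses(body: dict) -> dict:
    ...  # new pipeline: normalize input, classify, route, proxy

def dispatch(path: str, body: dict) -> dict:
    """Route by request path so existing clients are unaffected."""
    if path == "/v1/chat/completions":
        return handle_chat_completions(body)
    if path.startswith("/v1/responses"):
        return handle_responses(body)
    raise ValueError(f"unsupported path: {path}")
```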
Example usage after implementation:
```python
# Client using Responses API through the semantic router
from openai import OpenAI

client = OpenAI(
    base_url="http://semantic-router:8801/openai/v1/",
    api_key="your-key",
)

# Router will classify intent and select the best model
response = client.responses.create(
    model="auto",  # Router's intelligent model selection
    input="Solve the equation 3x + 11 = 14 using code",
    tools=[{"type": "code_interpreter", "container": {"type": "auto"}}],
)

# Chain responses with context preservation
second_response = client.responses.create(
    model="auto",
    previous_response_id=response.id,
    input="Now explain the solution step by step",
)
```
**Additional context**
Related documentation:
Benefits for semantic router users:
- Enable routing for advanced LLM workflows (agents, code execution, multi-turn reasoning)
- Support modern AI applications that require stateful conversations
- Provide intelligent model selection for reasoning tasks and tool-heavy workloads
- Maintain semantic router's value proposition (cost optimization, latency reduction) for next-generation LLM APIs
Implementation considerations:
- The Responses API uses different request/response structures than Chat Completions
- Conversation state management may require additional caching strategies (see the sticky-routing sketch after this list)
- Background tasks introduce asynchronous processing patterns
- Tool execution (especially code interpreter and MCP) may need special handling in the routing layer
- The streaming format (typed events with `sequence_number` tracking) differs from the chunked SSE deltas used in Chat Completions
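For the caching consideration above, one illustrative strategy is pinning chained turns to the backend that owns the conversation state:

```python
# Illustrative sticky-routing cache: response_id -> backend holding its state
response_owner: dict[str, str] = {}

def pick_backend(body: dict, classified_backend: str) -> str:
    """Send chained turns to wherever `previous_response_id` lives;
    fresh conversations are free to follow intent classification."""
    prev = body.get("previous_response_id")
    if prev and prev in response_owner:
        return response_owner[prev]
    return classified_backend

def remember(response_id: str, backend: str) -> None:
    """Record ownership after proxying a stored response."""
    response_owner[response_id] = backend
```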
Priority justification:
As the Responses API becomes the recommended approach for building advanced LLM applications (especially with reasoning models and agents), supporting it in the semantic router is crucial for maintaining relevance and enabling users to leverage the router's intelligent routing capabilities with modern LLM workflows.