
feat: add Responses API endpoint type #2191

Open
tessaherself wants to merge 2 commits into huggingface:main from tessaherself:responses-api-endpoint

Conversation

@tessaherself

Summary

Adds support for the OpenAI Responses API (/v1/responses) as an opt-in endpoint type alongside the existing Chat Completions endpoint.

This aligns with HF's own Open Responses initiative and the broader ecosystem shift toward the Responses API standard:

  • HF Inference Providers already serve /v1/responses via router.huggingface.co
  • vLLM added Responses API support in Semantic Router v0.1
  • llama.cpp has an active PR for OpenResponses compliance

What this enables

| Feature | Chat Completions | Responses API |
| --- | --- | --- |
| Tool calling | Function calls (bolted on) | Native tool orchestration |
| Conversation state | Manual (full history each turn) | `previous_response_id` chaining |
| File attachments | Base64 `image_url` only | `input_file` + `input_image` + file IDs |
| Reasoning | `<think>` tag hacks | Native reasoning output items |
| Streaming | Raw text deltas | Semantic events |
| Cache performance | Baseline | 40-80% improvement (OpenAI benchmarks) |
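To illustrate the conversation-state row above: with `previous_response_id`, each turn references the server-held prior response instead of resending the full history. A minimal sketch of building such chained requests (the `buildTurnRequest` helper and the `ResponsesRequest` shape are illustrative, not part of this PR; the real request body has many more fields):

```typescript
// Simplified Responses API request shape (illustrative subset).
interface ResponsesRequest {
	model: string;
	input: string;
	previous_response_id?: string;
}

// Hypothetical helper: build the request for one chat turn.
// The first turn omits previous_response_id; later turns chain on the
// id of the previous response instead of resending the whole history.
function buildTurnRequest(
	model: string,
	userText: string,
	previousResponseId?: string
): ResponsesRequest {
	const req: ResponsesRequest = { model, input: userText };
	if (previousResponseId) req.previous_response_id = previousResponseId;
	return req;
}
```

With the OpenAI SDK, the same pattern would pass each response's `id` as `previous_response_id` on the next `client.responses.create()` call.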

How it works

  • New endpoint type "responses" — configure per-model:
    { "endpoints": [{ "type": "responses", "baseURL": "https://router.huggingface.co/v1" }] }
  • Chat Completions (type: "openai") remains the default — zero breaking changes
  • chatMessagesToResponsesInput() adapter converts existing message format → Responses API input
  • Stream adapter maps Responses API events → existing TextGenerationStreamOutput type
  • Uses the same OpenAI SDK v4 already in the project (openai@4.104.0)
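A rough sketch of what the `chatMessagesToResponsesInput()` adapter could look like (the message and input-item shapes here are simplified stand-ins; chat-ui's real types in `endpointResponses.ts` are richer and handle attachments):

```typescript
// Simplified chat-ui-style message (illustrative subset).
interface ChatMessage {
	from: "user" | "assistant" | "system";
	content: string;
}

// Responses API input items pair a role with typed content parts.
interface ResponsesInputItem {
	role: "user" | "assistant" | "system";
	content: { type: "input_text" | "output_text"; text: string }[];
}

// Sketch of the adapter: assistant turns are echoed back as
// output_text parts, everything else becomes input_text.
function chatMessagesToResponsesInput(messages: ChatMessage[]): ResponsesInputItem[] {
	return messages.map((m) => ({
		role: m.from,
		content: [
			{
				type: m.from === "assistant" ? "output_text" : "input_text",
				text: m.content,
			},
		],
	}));
}
```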

Changes

  • New: endpointResponses.ts — endpoint factory + message format adapter
  • New: openAIResponsesToTextGenerationStream.ts — streaming + non-streaming response adapters
  • New: endpointResponses.spec.ts — 6 unit tests for message conversion
  • Modified: endpoints.ts — register responses type in endpoint registry + schema
  • Modified: models.ts — dynamic endpoint type dispatch
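The stream-adapter file above maps the Responses API's semantic events onto the existing token-stream shape. A hedged sketch of that mapping (event and output types here are simplified; the real `openAIResponsesToTextGenerationStream.ts` handles many more event kinds):

```typescript
// Simplified stand-in for chat-ui's TextGenerationStreamOutput.
interface TextGenerationStreamOutput {
	token: { id: number; text: string; logprob: number; special: boolean };
	generated_text: string | null;
}

// Two of the semantic events the Responses API streams (subset).
type ResponsesStreamEvent =
	| { type: "response.output_text.delta"; delta: string }
	| { type: "response.completed" };

// Sketch: text deltas become tokens; the terminal event becomes a
// final "special" token carrying the accumulated generated_text.
async function* responsesToTextGenerationStream(
	events: AsyncIterable<ResponsesStreamEvent>
): AsyncGenerator<TextGenerationStreamOutput> {
	let tokenId = 0;
	let generatedText = "";
	for await (const event of events) {
		if (event.type === "response.output_text.delta") {
			generatedText += event.delta;
			yield {
				token: { id: tokenId++, text: event.delta, logprob: 0, special: false },
				generated_text: null,
			};
		} else if (event.type === "response.completed") {
			yield {
				token: { id: tokenId++, text: "", logprob: 0, special: true },
				generated_text: generatedText,
			};
		}
	}
}
```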

Test plan

  • npm run check — 0 type errors
  • npm run lint — passes prettier + eslint
  • npm run test — 161 tests passing (155 existing + 6 new)
  • Manual: configure model with type: "responses" pointing at router.huggingface.co/v1
  • Manual: verify streaming text generation
  • Manual: verify multimodal (image) input
  • Manual: verify existing type: "openai" endpoints unchanged

🤖 Generated with Claude Code

Add support for OpenAI's Responses API (`/v1/responses`) as an
opt-in endpoint type alongside the existing Chat Completions endpoint.

This aligns with HF's Open Responses initiative and the broader
ecosystem shift (vLLM, llama.cpp) toward the Responses API standard.

New endpoint type `"responses"` can be configured per-model via:
  endpoints: [{ type: "responses", baseURL: "..." }]

Chat Completions remains the default — zero breaking changes.

New files:
- endpointResponses.ts: factory + chatMessagesToResponsesInput() adapter
- openAIResponsesToTextGenerationStream.ts: stream adapter
- endpointResponses.spec.ts: 6 unit tests for message conversion

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a4a011542d


Comment on lines +124 to +128
```typescript
const isComplete = status === "completed";

yield {
	token: { id: tokenId++, text: "", logprob: 0, special: true },
	generated_text: isComplete ? generatedText : null,
```


P1: Emit final text for incomplete Responses completions

When the Responses API finishes with a terminal status other than "completed" (for example "incomplete" after hitting max_output_tokens), this code sets generated_text to null because it only accepts status === "completed". The downstream generators only finalize an answer when generated_text is present, so truncated-but-successful runs can stream tokens to the client yet never produce a final saved assistant message.


When the Responses API finishes with 'incomplete' status (e.g. hitting
max_output_tokens), still save the generated text so the assistant
message is preserved rather than silently dropped.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
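A sketch of the kind of change that commit describes: treat "incomplete" as a terminal status that still surfaces the accumulated text, instead of gating the final output on "completed" alone. Field names here are simplified stand-ins, not the PR's exact code:

```typescript
// Simplified stand-in for the stream's final output chunk.
interface FinalChunk {
	token: { id: number; text: string; logprob: number; special: boolean };
	generated_text: string | null;
}

// Before the fix, only status === "completed" produced a final
// generated_text. After it, "incomplete" (e.g. max_output_tokens hit)
// also preserves whatever text was generated.
function finalChunk(status: string, generatedText: string, tokenId: number): FinalChunk {
	const isTerminalWithText = status === "completed" || status === "incomplete";
	return {
		token: { id: tokenId, text: "", logprob: 0, special: true },
		generated_text: isTerminalWithText ? generatedText : null,
	};
}
```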
