feat: add Responses API endpoint type #2191
tessaherself wants to merge 2 commits into huggingface:main
Conversation
Add support for OpenAI's Responses API (`/v1/responses`) as an
opt-in endpoint type alongside the existing Chat Completions endpoint.
This aligns with HF's Open Responses initiative and the broader
ecosystem shift (vLLM, llama.cpp) toward the Responses API standard.
New endpoint type `"responses"` can be configured per-model via:
endpoints: [{ type: "responses", baseURL: "..." }]
Chat Completions remains the default — zero breaking changes.
New files:
- endpointResponses.ts: factory + chatMessagesToResponsesInput() adapter
- openAIResponsesToTextGenerationStream.ts: stream adapter
- endpointResponses.spec.ts: 6 unit tests for message conversion
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
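As a rough illustration of what a `chatMessagesToResponsesInput()` adapter does, here is a minimal sketch. The type names and field shapes below are assumptions for illustration, not chat-ui's actual `Message` type or the exact Responses API input schema:

```typescript
// Hypothetical simplified shapes — chat-ui's real types differ.
interface ChatMessage {
	from: "user" | "assistant" | "system";
	content: string;
}

interface ResponsesInputItem {
	role: "user" | "assistant" | "system";
	content: { type: "input_text" | "output_text"; text: string }[];
}

// Convert chat-style messages into Responses API input items.
function chatMessagesToResponsesInput(messages: ChatMessage[]): ResponsesInputItem[] {
	return messages.map((m) => ({
		role: m.from,
		content: [
			{
				// Assistant turns become output_text; user/system turns become
				// input_text, following the Responses API input-item convention.
				type: m.from === "assistant" ? "output_text" : "input_text",
				text: m.content,
			},
		],
	}));
}
```

The actual adapter in `endpointResponses.ts` also has to deal with multimodal content and tool messages; this sketch only covers the plain-text case.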
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a4a011542d
```ts
const isComplete = status === "completed";

yield {
	token: { id: tokenId++, text: "", logprob: 0, special: true },
	generated_text: isComplete ? generatedText : null,
```
Emit final text for incomplete Responses completions
When the Responses API finishes with a `response.completed` event whose status is `incomplete` (for example after hitting `max_output_tokens`), this code sets `generated_text` to `null` because it only accepts `status === "completed"`. The downstream generators only finalize an answer when `generated_text` is present, so truncated-but-successful runs can stream tokens to the client yet never produce a final saved assistant message.
When the Responses API finishes with 'incomplete' status (e.g. hitting max_output_tokens), still save the generated text so the assistant message is preserved rather than silently dropped. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
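The fix described above can be sketched as follows. This is a minimal sketch with assumed names (the real adapter works on streamed events, not a bare status string): the key change is that both terminal statuses count as final, so truncated runs keep their accumulated text.

```typescript
// Possible statuses a Responses API run can end (or pause) in.
type ResponseStatus = "completed" | "incomplete" | "in_progress" | "failed";

function finalGeneratedText(
	status: ResponseStatus,
	generatedText: string
): string | null {
	// "incomplete" means the model stopped early (e.g. max_output_tokens
	// was hit) but still produced usable output, so it must be saved too;
	// only failed or still-in-flight states return null.
	const isFinal = status === "completed" || status === "incomplete";
	return isFinal ? generatedText : null;
}
```

Compared to the original `status === "completed"` check, this keeps the truncated assistant message instead of silently dropping it.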
Summary
Adds support for the OpenAI Responses API (`/v1/responses`) as an opt-in endpoint type alongside the existing Chat Completions endpoint. This aligns with HF's own Open Responses initiative and the broader ecosystem shift toward the Responses API standard:
- HF's router serves `/v1/responses` via `router.huggingface.co`

What this enables
- `previous_response_id` chaining
- `input_file` + `input_image` + file IDs, not just `image_url`
- native reasoning output without `<think>` tag hacks

How it works
- New endpoint type `"responses"` — configure per-model: `{ "endpoints": [{ "type": "responses", "baseURL": "https://router.huggingface.co/v1" }] }`
- Chat Completions (`type: "openai"`) remains the default — zero breaking changes
- `chatMessagesToResponsesInput()` adapter converts the existing message format → Responses API input
- Stream adapter maps responses to the `TextGenerationStreamOutput` type
- No new dependencies (uses the existing `openai@4.104.0`)

Changes
- `endpointResponses.ts` — endpoint factory + message format adapter
- `openAIResponsesToTextGenerationStream.ts` — streaming + non-streaming response adapters
- `endpointResponses.spec.ts` — 6 unit tests for message conversion
- `endpoints.ts` — register `responses` type in endpoint registry + schema
- `models.ts` — dynamic endpoint type dispatch

Test plan
- `npm run check` — 0 type errors
- `npm run lint` — passes prettier + eslint
- `npm run test` — 161 tests passing (155 existing + 6 new)
- Manual: `type: "responses"` pointing at `router.huggingface.co/v1`
- Existing `type: "openai"` endpoints unchanged

🤖 Generated with Claude Code
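To make the streaming path concrete, here is a hedged sketch of how Responses API stream events could be mapped onto the `TextGenerationStreamOutput` shape the rest of chat-ui consumes. The event and field names below are simplified assumptions, not the exact shapes from the `openai` SDK:

```typescript
// Simplified stand-ins for the SDK's stream event and chat-ui's output type.
interface StreamEvent {
	type: "response.output_text.delta" | "response.completed";
	delta?: string;
	status?: "completed" | "incomplete";
}

interface TextGenerationStreamOutput {
	token: { id: number; text: string; logprob: number; special: boolean };
	generated_text: string | null;
}

// Accumulate text deltas, then emit a final special token carrying the
// full generated text when the response completes.
function* adaptEvents(events: StreamEvent[]): Generator<TextGenerationStreamOutput> {
	let tokenId = 0;
	let generatedText = "";
	for (const ev of events) {
		if (ev.type === "response.output_text.delta" && ev.delta) {
			generatedText += ev.delta;
			yield {
				token: { id: tokenId++, text: ev.delta, logprob: 0, special: false },
				generated_text: null,
			};
		} else if (ev.type === "response.completed") {
			yield {
				token: { id: tokenId++, text: "", logprob: 0, special: true },
				generated_text: generatedText,
			};
		}
	}
}
```

The real `openAIResponsesToTextGenerationStream.ts` consumes an async SDK stream and handles more event types; this sketch only shows the delta-accumulation and finalization logic.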