
feat: add Responses API endpoint type #2191

Open
tessaherself wants to merge 2 commits into huggingface:main from tessaherself:responses-api-endpoint

Conversation

@tessaherself

Summary

Adds support for the OpenAI Responses API (/v1/responses) as an opt-in endpoint type alongside the existing Chat Completions endpoint.

This aligns with HF's own Open Responses initiative and the broader ecosystem shift toward the Responses API standard:

  • HF Inference Providers already serve /v1/responses via router.huggingface.co
  • vLLM added Responses API support in Semantic Router v0.1
  • llama.cpp has an active PR for OpenResponses compliance

What this enables

| Feature | Chat Completions | Responses API |
| --- | --- | --- |
| Tool calling | Function calls (bolted on) | Native tool orchestration |
| Conversation state | Manual (full history each turn) | `previous_response_id` chaining |
| File attachments | Base64 `image_url` only | `input_file` + `input_image` + file IDs |
| Reasoning | `<think>` tag hacks | Native reasoning output items |
| Streaming | Raw text deltas | Semantic events |
| Cache performance | Baseline | 40-80% improvement (OpenAI benchmarks) |
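To illustrate the conversation-state row above: with `previous_response_id`, each turn references the server-held prior response instead of resending the full history. A minimal sketch of building such chained requests (the `buildTurnRequest` helper and the `ResponsesRequest` shape are illustrative, not part of this PR; the real request body has many more fields):

```typescript
// Simplified Responses API request shape (illustrative subset).
interface ResponsesRequest {
	model: string;
	input: string;
	previous_response_id?: string;
}

// Hypothetical helper: build the request for one chat turn.
// The first turn omits previous_response_id; later turns chain on the
// id of the previous response instead of resending the whole history.
function buildTurnRequest(
	model: string,
	userText: string,
	previousResponseId?: string
): ResponsesRequest {
	const req: ResponsesRequest = { model, input: userText };
	if (previousResponseId) req.previous_response_id = previousResponseId;
	return req;
}
```

With the OpenAI SDK, the same pattern would pass each response's `id` as `previous_response_id` on the next `client.responses.create()` call.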

How it works

  • New endpoint type "responses" — configure per-model:
    { "endpoints": [{ "type": "responses", "baseURL": "https://router.huggingface.co/v1" }] }
  • Chat Completions (type: "openai") remains the default — zero breaking changes
  • chatMessagesToResponsesInput() adapter converts existing message format → Responses API input
  • Stream adapter maps Responses API events → existing TextGenerationStreamOutput type
  • Uses the same OpenAI SDK v4 already in the project (openai@4.104.0)
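A rough sketch of what the `chatMessagesToResponsesInput()` adapter could look like (the message and input-item shapes here are simplified stand-ins; chat-ui's real types in `endpointResponses.ts` are richer and handle attachments):

```typescript
// Simplified chat-ui-style message (illustrative subset).
interface ChatMessage {
	from: "user" | "assistant" | "system";
	content: string;
}

// Responses API input items pair a role with typed content parts.
interface ResponsesInputItem {
	role: "user" | "assistant" | "system";
	content: { type: "input_text" | "output_text"; text: string }[];
}

// Sketch of the adapter: assistant turns are echoed back as
// output_text parts, everything else becomes input_text.
function chatMessagesToResponsesInput(messages: ChatMessage[]): ResponsesInputItem[] {
	return messages.map((m) => ({
		role: m.from,
		content: [
			{
				type: m.from === "assistant" ? "output_text" : "input_text",
				text: m.content,
			},
		],
	}));
}
```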

Changes

  • New: endpointResponses.ts — endpoint factory + message format adapter
  • New: openAIResponsesToTextGenerationStream.ts — streaming + non-streaming response adapters
  • New: endpointResponses.spec.ts — 6 unit tests for message conversion
  • Modified: endpoints.ts — register responses type in endpoint registry + schema
  • Modified: models.ts — dynamic endpoint type dispatch
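The stream-adapter file above maps the Responses API's semantic events onto the existing token-stream shape. A hedged sketch of that mapping (event and output types here are simplified; the real `openAIResponsesToTextGenerationStream.ts` handles many more event kinds):

```typescript
// Simplified stand-in for chat-ui's TextGenerationStreamOutput.
interface TextGenerationStreamOutput {
	token: { id: number; text: string; logprob: number; special: boolean };
	generated_text: string | null;
}

// Two of the semantic events the Responses API streams (subset).
type ResponsesStreamEvent =
	| { type: "response.output_text.delta"; delta: string }
	| { type: "response.completed" };

// Sketch: text deltas become tokens; the terminal event becomes a
// final "special" token carrying the accumulated generated_text.
async function* responsesToTextGenerationStream(
	events: AsyncIterable<ResponsesStreamEvent>
): AsyncGenerator<TextGenerationStreamOutput> {
	let tokenId = 0;
	let generatedText = "";
	for await (const event of events) {
		if (event.type === "response.output_text.delta") {
			generatedText += event.delta;
			yield {
				token: { id: tokenId++, text: event.delta, logprob: 0, special: false },
				generated_text: null,
			};
		} else if (event.type === "response.completed") {
			yield {
				token: { id: tokenId++, text: "", logprob: 0, special: true },
				generated_text: generatedText,
			};
		}
	}
}
```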

Test plan

  • npm run check — 0 type errors
  • npm run lint — passes prettier + eslint
  • npm run test — 161 tests passing (155 existing + 6 new)
  • Manual: configure model with type: "responses" pointing at router.huggingface.co/v1
  • Manual: verify streaming text generation
  • Manual: verify multimodal (image) input
  • Manual: verify existing type: "openai" endpoints unchanged

🤖 Generated with Claude Code

Add support for OpenAI's Responses API (`/v1/responses`) as an
opt-in endpoint type alongside the existing Chat Completions endpoint.

This aligns with HF's Open Responses initiative and the broader
ecosystem shift (vLLM, llama.cpp) toward the Responses API standard.

New endpoint type `"responses"` can be configured per-model via:
  endpoints: [{ type: "responses", baseURL: "..." }]

Chat Completions remains the default — zero breaking changes.

New files:
- endpointResponses.ts: factory + chatMessagesToResponsesInput() adapter
- openAIResponsesToTextGenerationStream.ts: stream adapter
- endpointResponses.spec.ts: 6 unit tests for message conversion

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a4a011542d


Comment on lines +124 to +128
```typescript
const isComplete = status === "completed";

yield {
	token: { id: tokenId++, text: "", logprob: 0, special: true },
	generated_text: isComplete ? generatedText : null,
```


P1: Emit final text for incomplete Responses completions

When the Responses API finishes with a terminal status other than "completed" (for example "incomplete" after hitting max_output_tokens), this code sets generated_text to null because it only accepts status === "completed". The downstream generators only finalize an answer when generated_text is present, so truncated-but-successful runs can stream tokens to the client yet never produce a final saved assistant message.


When the Responses API finishes with 'incomplete' status (e.g. hitting
max_output_tokens), still save the generated text so the assistant
message is preserved rather than silently dropped.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
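A sketch of the kind of change that commit describes: treat "incomplete" as a terminal status that still surfaces the accumulated text, instead of gating the final output on "completed" alone. Field names here are simplified stand-ins, not the PR's exact code:

```typescript
// Simplified stand-in for the stream's final output chunk.
interface FinalChunk {
	token: { id: number; text: string; logprob: number; special: boolean };
	generated_text: string | null;
}

// Before the fix, only status === "completed" produced a final
// generated_text. After it, "incomplete" (e.g. max_output_tokens hit)
// also preserves whatever text was generated.
function finalChunk(status: string, generatedText: string, tokenId: number): FinalChunk {
	const isTerminalWithText = status === "completed" || status === "incomplete";
	return {
		token: { id: tokenId, text: "", logprob: 0, special: true },
		generated_text: isTerminalWithText ? generatedText : null,
	};
}
```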
