diff --git a/docs/inference-providers/_toctree.yml b/docs/inference-providers/_toctree.yml
index d39278584..b5f22afb5 100644
--- a/docs/inference-providers/_toctree.yml
+++ b/docs/inference-providers/_toctree.yml
@@ -19,6 +19,8 @@
title: Structured Outputs with LLMs
- local: guides/function-calling
title: Function Calling
+ - local: guides/responses-api
+ title: Responses API (beta)
- local: guides/gpt-oss
title: How to use OpenAI gpt-oss
- local: guides/image-editor
diff --git a/docs/inference-providers/guides/responses-api.md b/docs/inference-providers/guides/responses-api.md
new file mode 100644
index 000000000..cb8422c21
--- /dev/null
+++ b/docs/inference-providers/guides/responses-api.md
@@ -0,0 +1,813 @@
+# Responses API (beta)
+
+The Responses API, originally introduced by OpenAI, provides a unified interface for interacting with models on Hugging Face Inference Providers. Use your existing OpenAI SDKs to access features like multi-provider routing, event streaming, structured outputs, and Remote MCP tools.
+
+> [!TIP]
+> This guide assumes you have a Hugging Face account and access token. You can create a free account at [huggingface.co](https://huggingface.co) and get your token from your [settings page](https://huggingface.co/settings/tokens).
+
+## Why build with the Responses API?
+
+The Responses API provides a unified interface built for agentic apps. With it, you get:
+
+- **Built-in tool orchestration.** Invoke functions, server-side MCP tools, and schema-validated outputs without changing endpoints.
+- **Event-driven streaming.** Receive semantic events such as `response.created`, `output_text.delta`, and `response.completed` to power incremental UIs.
+- **Reasoning controls and structured outputs.** Dial reasoning effort up or down, and require models to return schema-compliant JSON every time.
+
+## Prerequisites
+
+- A Hugging Face account with remaining Inference Providers credits (free tier available).
+- A fine-grained [Hugging Face token](https://huggingface.co/settings/tokens) with the “Make calls to Inference Providers” permission, stored in the `HF_TOKEN` environment variable.
+
+> [!TIP]
+> All Inference Providers chat completion models should be compatible with the Responses API. You can browse available models on the [Inference Models page](https://huggingface.co/inference/models).
+
+## Configure your Responses client
+
+Install the OpenAI SDK for your language of choice before running the snippets below (`pip install openai` for Python or `npm install openai` for Node.js). If you prefer issuing raw HTTP calls, any standard tool such as `curl` will work as well.
+
+<hfoptions id="configure-client">
+<hfoption id="python">
+
+```python
+import os
+from openai import OpenAI
+
+client = OpenAI(
+ base_url="https://router.huggingface.co/v1",
+ api_key=os.getenv("HF_TOKEN"),
+)
+
+response = client.responses.create(
+ model="openai/gpt-oss-120b:groq",
+ instructions="You are a helpful assistant.",
+ input="Tell me a three-sentence bedtime story about a unicorn.",
+)
+
+print(response.output_text)
+```
+
+</hfoption>
+<hfoption id="javascript">
+
+```ts
+import OpenAI from "openai";
+
+const client = new OpenAI({
+ baseURL: "https://router.huggingface.co/v1",
+ apiKey: process.env.HF_TOKEN,
+});
+
+const response = await client.responses.create({
+ model: "openai/gpt-oss-120b:groq",
+ instructions: "You are a helpful assistant.",
+ input: "Tell me a three-sentence bedtime story about a unicorn.",
+});
+
+console.log(response.output_text);
+```
+
+</hfoption>
+<hfoption id="curl">
+
+```bash
+curl https://router.huggingface.co/v1/responses \
+ -H "Authorization: Bearer $HF_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "openai/gpt-oss-120b:groq",
+ "instructions": "You are a helpful assistant.",
+ "input": "Tell me a three-sentence bedtime story about a unicorn."
+ }'
+```
+
+</hfoption>
+</hfoptions>
+
+> [!TIP]
+> If you plan to use a specific provider, append it to the model id after a colon, as in `moonshotai/Kimi-K2-Instruct-0905:groq`. Otherwise, omit the suffix and let routing fall back to the default provider.
+
+## Core Response patterns
+
+### Plain text output
+
+For a single-turn request, pass a plain string as `input`. The Responses API returns the full response object along with a convenience `output_text` property that contains the generated text.
+
+<hfoptions id="plain-text">
+<hfoption id="python">
+
+```python
+from openai import OpenAI
+import os
+
+client = OpenAI(
+ base_url="https://router.huggingface.co/v1",
+ api_key=os.getenv("HF_TOKEN"),
+)
+
+response = client.responses.create(
+ model="moonshotai/Kimi-K2-Instruct-0905:groq",
+ instructions="You are a helpful assistant.",
+ input="Tell me a three sentence bedtime story about a unicorn.",
+)
+
+print(response.output_text)
+```
+
+</hfoption>
+<hfoption id="javascript">
+
+```ts
+import OpenAI from "openai";
+
+const client = new OpenAI({
+ baseURL: "https://router.huggingface.co/v1",
+ apiKey: process.env.HF_TOKEN,
+});
+
+const response = await client.responses.create({
+ model: "moonshotai/Kimi-K2-Instruct-0905:groq",
+ instructions: "You are a helpful assistant.",
+ input: "Tell me a three sentence bedtime story about a unicorn.",
+});
+
+console.log(response.output_text);
+```
+
+</hfoption>
+<hfoption id="curl">
+
+```bash
+curl https://router.huggingface.co/v1/responses \
+ -H "Authorization: Bearer $HF_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "moonshotai/Kimi-K2-Instruct-0905:groq",
+ "instructions": "You are a helpful assistant.",
+ "input": "Tell me a three sentence bedtime story about a unicorn."
+ }'
+```
+
+</hfoption>
+</hfoptions>
+
+### Multimodal inputs
+
+Mix text and vision content by passing a list of content parts. The Responses API unifies text and images into a single `input` array.
+
+<hfoptions id="multimodal-inputs">
+<hfoption id="python">
+
+```python
+from openai import OpenAI
+import os
+
+client = OpenAI(
+ base_url="https://router.huggingface.co/v1",
+ api_key=os.getenv("HF_TOKEN"),
+)
+
+response = client.responses.create(
+ model="Qwen/Qwen2.5-VL-7B-Instruct",
+ input=[
+ {
+ "role": "user",
+ "content": [
+ {"type": "input_text", "text": "what is in this image?"},
+ {
+ "type": "input_image",
+ "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
+ },
+ ],
+ }
+ ],
+)
+
+print(response.output_text)
+```
+
+</hfoption>
+<hfoption id="javascript">
+
+```ts
+import OpenAI from "openai";
+
+const client = new OpenAI({
+ baseURL: "https://router.huggingface.co/v1",
+ apiKey: process.env.HF_TOKEN,
+});
+
+const response = await client.responses.create({
+ model: "Qwen/Qwen2.5-VL-7B-Instruct",
+ input: [
+ {
+ role: "user",
+ content: [
+ { type: "input_text", text: "what is in this image?" },
+ {
+ type: "input_image",
+ image_url:
+ "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
+ },
+ ],
+ },
+ ],
+});
+
+console.log(response.output_text);
+```
+
+</hfoption>
+<hfoption id="curl">
+
+```bash
+curl https://router.huggingface.co/v1/responses \
+ -H "Authorization: Bearer $HF_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "Qwen/Qwen2.5-VL-7B-Instruct",
+ "input": [
+ {
+ "role": "user",
+ "content": [
+ {"type": "input_text", "text": "what is in this image?"},
+ {
+ "type": "input_image",
+ "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
+ }
+ ]
+ }
+ ]
+ }'
+```
+
+</hfoption>
+</hfoptions>
+
+### Multi-turn conversations
+
+Responses requests accept full conversation history in the `input` array. Mix `developer`, `system`, and `user` messages to steer the assistant's behavior from turn to turn; a sketch of continuing a conversation follows the examples below.
+
+<hfoptions id="multi-turn">
+<hfoption id="python">
+
+```python
+from openai import OpenAI
+import os
+
+client = OpenAI(
+ base_url="https://router.huggingface.co/v1",
+ api_key=os.getenv("HF_TOKEN"),
+)
+
+response = client.responses.create(
+ model="moonshotai/Kimi-K2-Instruct-0905:groq",
+ input=[
+ {"role": "developer", "content": "Talk like a pirate."},
+ {"role": "user", "content": "Are semicolons optional in JavaScript?"},
+ ],
+)
+
+print(response.output_text)
+```
+
+</hfoption>
+<hfoption id="javascript">
+
+```ts
+import OpenAI from "openai";
+
+const client = new OpenAI({
+ baseURL: "https://router.huggingface.co/v1",
+ apiKey: process.env.HF_TOKEN,
+});
+
+const response = await client.responses.create({
+ model: "moonshotai/Kimi-K2-Instruct-0905:groq",
+ input: [
+ { role: "developer", content: "Talk like a pirate." },
+ { role: "user", content: "Are semicolons optional in JavaScript?" },
+ ],
+});
+
+console.log(response.output_text);
+```
+
+</hfoption>
+<hfoption id="curl">
+
+```bash
+curl https://router.huggingface.co/v1/responses \
+ -H "Authorization: Bearer $HF_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "moonshotai/Kimi-K2-Instruct-0905:groq",
+ "input": [
+ {"role": "developer", "content": "Talk like a pirate."},
+ {"role": "user", "content": "Are semicolons optional in JavaScript?"}
+ ]
+ }'
+```
+
+</hfoption>
+</hfoptions>
+
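+If you manage conversation history client-side, append each assistant reply to the history before the next turn. A minimal Python sketch, reusing the `client` configured earlier:
+
+```python
+history = [
+    {"role": "developer", "content": "Talk like a pirate."},
+    {"role": "user", "content": "Are semicolons optional in JavaScript?"},
+]
+
+first = client.responses.create(
+    model="moonshotai/Kimi-K2-Instruct-0905:groq",
+    input=history,
+)
+
+# Feed the reply back as an assistant message, then ask a follow-up question.
+history.append({"role": "assistant", "content": first.output_text})
+history.append({"role": "user", "content": "Show me a case where omitting one breaks."})
+
+follow_up = client.responses.create(
+    model="moonshotai/Kimi-K2-Instruct-0905:groq",
+    input=history,
+)
+print(follow_up.output_text)
+```
+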
+## Advanced features
+
+The advanced features below all use the same request format as the basic patterns above; only the parameters change.
+
+### Event-based streaming
+
+Set `stream` to `true` (`stream=True` in Python) to receive incremental `response.*` events. Each event is a typed payload with a `type` field such as `response.output_text.delta`, so you can render words as they stream in or monitor tool execution in real time.
+
+<hfoptions id="streaming">
+<hfoption id="python">
+
+```python
+from openai import OpenAI
+import os
+
+client = OpenAI(
+ base_url="https://router.huggingface.co/v1",
+ api_key=os.getenv("HF_TOKEN"),
+)
+
+stream = client.responses.create(
+ model="moonshotai/Kimi-K2-Instruct-0905:groq",
+ input=[{"role": "user", "content": "Say 'double bubble bath' ten times fast."}],
+ stream=True,
+)
+
+for event in stream:
+ print(event)
+```
+
+</hfoption>
+<hfoption id="javascript">
+
+```ts
+import OpenAI from "openai";
+
+const client = new OpenAI({
+ baseURL: "https://router.huggingface.co/v1",
+ apiKey: process.env.HF_TOKEN,
+});
+
+const stream = await client.responses.create({
+ model: "moonshotai/Kimi-K2-Instruct-0905:groq",
+ input: [{ role: "user", content: "Say 'double bubble bath' ten times fast." }],
+ stream: true,
+});
+
+for await (const event of stream) {
+ console.log(event);
+}
+```
+
+</hfoption>
+<hfoption id="curl">
+
+```bash
+curl -N https://router.huggingface.co/v1/responses \
+ -H "Authorization: Bearer $HF_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "moonshotai/Kimi-K2-Instruct-0905:groq",
+ "input": [
+ {"role": "user", "content": "Say \"double bubble bath\" ten times fast."}
+ ],
+ "stream": true
+ }'
+```
+
+</hfoption>
+</hfoptions>
+
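+Because every event carries a `type` field, you can filter for just the pieces you need. A minimal sketch, reusing the `client` configured earlier and assuming the OpenAI-style event names (`response.output_text.delta`, `response.completed`):
+
+```python
+stream = client.responses.create(
+    model="moonshotai/Kimi-K2-Instruct-0905:groq",
+    input=[{"role": "user", "content": "Say 'double bubble bath' ten times fast."}],
+    stream=True,
+)
+
+for event in stream:
+    if event.type == "response.output_text.delta":
+        # Print each text fragment as soon as it arrives.
+        print(event.delta, end="", flush=True)
+    elif event.type == "response.completed":
+        print()  # finish the line once the response is done
+```
+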
+### Tool calling and routing
+
+Add a `tools` array to let the model call your functions. The model decides when a function is needed and returns `function_call` items with the arguments it chose; executing the function is up to your application (see the round-trip sketch after the examples below).
+
+<hfoptions id="tool-calling">
+<hfoption id="python">
+
+```python
+from openai import OpenAI
+import os
+
+client = OpenAI(
+ base_url="https://router.huggingface.co/v1",
+ api_key=os.getenv("HF_TOKEN"),
+)
+
+tools = [
+ {
+ "type": "function",
+ "name": "get_current_weather",
+ "description": "Get the current weather in a given location",
+ "parameters": {
+ "type": "object",
+ "properties": {
+ "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
+ "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
+ },
+ "required": ["location", "unit"],
+ },
+ }
+]
+
+response = client.responses.create(
+ model="moonshotai/Kimi-K2-Instruct-0905:groq",
+ tools=tools,
+ input="What is the weather like in Boston today?",
+ tool_choice="auto",
+)
+
+print(response)
+```
+
+</hfoption>
+<hfoption id="javascript">
+
+```ts
+import OpenAI from "openai";
+
+const client = new OpenAI({
+ baseURL: "https://router.huggingface.co/v1",
+ apiKey: process.env.HF_TOKEN,
+});
+
+const tools = [
+ {
+ type: "function",
+ name: "get_current_weather",
+ description: "Get the current weather in a given location",
+ parameters: {
+ type: "object",
+ properties: {
+ location: { type: "string", description: "The city and state, e.g. San Francisco, CA" },
+ unit: { type: "string", enum: ["celsius", "fahrenheit"] },
+ },
+ required: ["location", "unit"],
+ },
+ },
+];
+
+const response = await client.responses.create({
+ model: "moonshotai/Kimi-K2-Instruct-0905:groq",
+ tools,
+ input: "What is the weather like in Boston today?",
+ tool_choice: "auto",
+});
+
+console.log(response);
+```
+
+</hfoption>
+<hfoption id="curl">
+
+```bash
+curl https://router.huggingface.co/v1/responses \
+ -H "Authorization: Bearer $HF_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "moonshotai/Kimi-K2-Instruct-0905:groq",
+ "input": "What is the weather like in Boston today?",
+ "tool_choice": "auto",
+ "tools": [
+ {
+ "type": "function",
+ "name": "get_current_weather",
+ "description": "Get the current weather in a given location",
+ "parameters": {
+ "type": "object",
+ "properties": {
+ "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
+ "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
+ },
+ "required": ["location", "unit"]
+ }
+ }
+ ]
+ }'
+```
+
+</hfoption>
+</hfoptions>
+
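+The model never runs `get_current_weather` itself; your application executes the call and returns the result. A minimal round-trip sketch, reusing the `client` and `tools` defined above with a stubbed weather lookup, and assuming the router follows OpenAI's function-calling flow:
+
+```python
+import json
+
+def get_current_weather(location: str, unit: str) -> str:
+    # Stand-in for a real weather service call.
+    return json.dumps({"location": location, "temperature": 22, "unit": unit})
+
+question = "What is the weather like in Boston today?"
+response = client.responses.create(
+    model="moonshotai/Kimi-K2-Instruct-0905:groq",
+    tools=tools,
+    input=question,
+    tool_choice="auto",
+)
+
+# Execute each requested call locally, then return the outputs to the model.
+input_items = [{"role": "user", "content": question}]
+for item in response.output:
+    if item.type == "function_call":
+        result = get_current_weather(**json.loads(item.arguments))
+        input_items.append(item)  # echo the call so the model can match the output
+        input_items.append(
+            {"type": "function_call_output", "call_id": item.call_id, "output": result}
+        )
+
+final = client.responses.create(
+    model="moonshotai/Kimi-K2-Instruct-0905:groq",
+    tools=tools,
+    input=input_items,
+)
+print(final.output_text)
+```
+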
+### Structured outputs
+
+Force the model to return JSON matching a schema. The Python SDK's `responses.parse` helper accepts a Pydantic model via `text_format` and converts the response directly into that type; from JavaScript or raw HTTP, supply a JSON-schema `response_format` instead.
+
+> [!NOTE]
+> When calling `openai/gpt-oss-120b:groq` from JavaScript or raw HTTP, include a brief instruction to return JSON. Without it the model may emit markdown even when a schema is provided.
+
+<hfoptions id="structured-outputs">
+<hfoption id="python">
+
+```python
+from openai import OpenAI
+from pydantic import BaseModel
+import os
+
+client = OpenAI(
+ base_url="https://router.huggingface.co/v1",
+ api_key=os.getenv("HF_TOKEN"),
+)
+
+class CalendarEvent(BaseModel):
+ name: str
+ date: str
+ participants: list[str]
+
+response = client.responses.parse(
+ model="openai/gpt-oss-120b:groq",
+ input=[
+ {"role": "system", "content": "Extract the event information."},
+ {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
+ ],
+ text_format=CalendarEvent,
+)
+
+print(response.output_parsed)
+```
+
+</hfoption>
+<hfoption id="javascript">
+
+```ts
+import OpenAI from "openai";
+
+const client = new OpenAI({
+ baseURL: "https://router.huggingface.co/v1",
+ apiKey: process.env.HF_TOKEN,
+});
+
+const response = await client.responses.create({
+ model: "openai/gpt-oss-120b:groq",
+ instructions: "Return JSON that matches the CalendarEvent schema (fields name, date, participants).",
+ input: [
+ { role: "system", content: "Extract the event information." },
+ { role: "user", content: "Alice and Bob are going to a science fair on Friday." },
+ ],
+ response_format: {
+ type: "json_schema",
+ json_schema: {
+ name: "CalendarEvent",
+ schema: {
+ type: "object",
+ properties: {
+ name: { type: "string" },
+ date: { type: "string" },
+ participants: {
+ type: "array",
+ items: { type: "string" },
+ },
+ },
+ required: ["name", "date", "participants"],
+ additionalProperties: false,
+ },
+ strict: true,
+ },
+ },
+});
+
+const parsed = JSON.parse(response.output_text);
+console.log(parsed);
+```
+
+</hfoption>
+<hfoption id="curl">
+
+```bash
+curl https://router.huggingface.co/v1/responses \
+ -H "Authorization: Bearer $HF_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "openai/gpt-oss-120b:groq",
+ "instructions": "Return JSON that matches the CalendarEvent schema (fields name, date, participants).",
+ "input": [
+ {"role": "system", "content": "Extract the event information."},
+ {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."}
+ ],
+ "response_format": {
+ "type": "json_schema",
+ "json_schema": {
+ "name": "CalendarEvent",
+ "schema": {
+ "type": "object",
+ "properties": {
+ "name": {"type": "string"},
+ "date": {"type": "string"},
+ "participants": {
+ "type": "array",
+ "items": {"type": "string"}
+ }
+ },
+ "required": ["name", "date", "participants"],
+ "additionalProperties": false
+ },
+ "strict": true
+ }
+ }
+ }'
+```
+
+</hfoption>
+</hfoptions>
+
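+If you call `responses.create` with a raw `response_format` instead of the `.parse` helper, you can validate the returned text yourself. A minimal sketch with Pydantic v2, reusing the `CalendarEvent` model and the `response` from the Python example:
+
+```python
+from pydantic import ValidationError
+
+try:
+    # output_text holds the raw JSON string regardless of which helper you used.
+    event = CalendarEvent.model_validate_json(response.output_text)
+    print(event.name, event.date, event.participants)
+except ValidationError as err:
+    # The model returned JSON that does not satisfy the schema; retry or repair here.
+    print("Schema validation failed:", err)
+```
+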
+### Remote MCP execution
+
+Remote MCP lets you call server-hosted tools that implement the Model Context Protocol. Provide the MCP server URL and allowed tools, and the Responses API handles the calls for you.
+
+<hfoptions id="remote-mcp">
+<hfoption id="python">
+
+```python
+from openai import OpenAI
+import os
+
+client = OpenAI(
+ base_url="https://router.huggingface.co/v1",
+ api_key=os.getenv("HF_TOKEN"),
+)
+
+response = client.responses.create(
+ model="moonshotai/Kimi-K2-Instruct-0905:groq",
+ input="how does tiktoken work?",
+ tools=[
+ {
+ "type": "mcp",
+ "server_label": "gitmcp",
+ "server_url": "https://gitmcp.io/openai/tiktoken",
+ "allowed_tools": ["search_tiktoken_documentation", "fetch_tiktoken_documentation"],
+ "require_approval": "never",
+ },
+ ],
+)
+
+for output in response.output:
+ print(output)
+```
+
+</hfoption>
+<hfoption id="javascript">
+
+```ts
+import OpenAI from "openai";
+
+const client = new OpenAI({
+ baseURL: "https://router.huggingface.co/v1",
+ apiKey: process.env.HF_TOKEN,
+});
+
+const response = await client.responses.create({
+ model: "moonshotai/Kimi-K2-Instruct-0905:groq",
+ input: "how does tiktoken work?",
+ tools: [
+ {
+ type: "mcp",
+ server_label: "gitmcp",
+ server_url: "https://gitmcp.io/openai/tiktoken",
+ allowed_tools: ["search_tiktoken_documentation", "fetch_tiktoken_documentation"],
+ require_approval: "never",
+ },
+ ],
+});
+
+for (const output of response.output) {
+ console.log(output);
+}
+```
+
+</hfoption>
+<hfoption id="curl">
+
+```bash
+curl https://router.huggingface.co/v1/responses \
+ -H "Authorization: Bearer $HF_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "moonshotai/Kimi-K2-Instruct-0905:groq",
+ "input": "how does tiktoken work?",
+ "tools": [
+ {
+ "type": "mcp",
+ "server_label": "gitmcp",
+ "server_url": "https://gitmcp.io/openai/tiktoken",
+ "allowed_tools": ["search_tiktoken_documentation", "fetch_tiktoken_documentation"],
+ "require_approval": "never"
+ }
+ ]
+ }'
+```
+
+</hfoption>
+</hfoptions>
+
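+The output interleaves tool bookkeeping with the final answer. A minimal sketch that separates them, reusing the `response` from the Python example and assuming the OpenAI-style `mcp_call` and `message` item types (providers may label items differently):
+
+```python
+for item in response.output:
+    if item.type == "mcp_call":
+        # One entry per remote tool invocation the model made.
+        print(f"Called {item.name} on {item.server_label}")
+    elif item.type == "message":
+        # The final assistant answer, produced after the tool results came back.
+        print(response.output_text)
+```
+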
+### Reasoning effort controls
+
+Some open-source reasoning models expose effort tiers. Pass a `reasoning` object with `effort` set to `"low"`, `"medium"`, or `"high"` to trade latency against reasoning depth.
+
+<hfoptions id="reasoning-effort">
+<hfoption id="python">
+
+```python
+from openai import OpenAI
+import os
+
+client = OpenAI(
+ base_url="https://router.huggingface.co/v1",
+ api_key=os.getenv("HF_TOKEN"),
+)
+
+response = client.responses.create(
+ model="deepseek-ai/DeepSeek-R1",
+ instructions="You are a helpful assistant.",
+ input="Say hello to the world.",
+ reasoning={"effort": "low"},
+)
+
+for i, item in enumerate(response.output):
+ print(f"Output #{i}: {item.type}", item.content)
+```
+
+</hfoption>
+<hfoption id="javascript">
+
+```ts
+import OpenAI from "openai";
+
+const client = new OpenAI({
+ baseURL: "https://router.huggingface.co/v1",
+ apiKey: process.env.HF_TOKEN,
+});
+
+const response = await client.responses.create({
+ model: "deepseek-ai/DeepSeek-R1",
+ instructions: "You are a helpful assistant.",
+ input: "Say hello to the world.",
+ reasoning: { effort: "low" },
+});
+
+response.output.forEach((item, index) => {
+ console.log(`Output #${index}: ${item.type}`, item.content);
+});
+```
+
+</hfoption>
+<hfoption id="curl">
+
+```bash
+curl https://router.huggingface.co/v1/responses \
+ -H "Authorization: Bearer $HF_TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "deepseek-ai/DeepSeek-R1",
+ "instructions": "You are a helpful assistant.",
+ "input": "Say hello to the world.",
+ "reasoning": {"effort": "low"}
+ }'
+```
+
+</hfoption>
+</hfoptions>
+
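+Reasoning models usually return their intermediate thinking and the final answer as separate output items. A minimal sketch that keeps them apart, reusing the `response` from the Python example and assuming `reasoning` and `message` item types (individual providers may emit these differently):
+
+```python
+for item in response.output:
+    if item.type == "reasoning":
+        # Intermediate chain of thought; useful for debugging, not for end users.
+        print("[reasoning]", item.content)
+    elif item.type == "message":
+        # The user-facing answer.
+        print("[answer]", response.output_text)
+```
+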
+## API reference
+
+For the full list of request parameters and streaming event types, read the official [OpenAI Responses reference](https://platform.openai.com/docs/api-reference/responses).