diff --git a/docs/inference-providers/_toctree.yml b/docs/inference-providers/_toctree.yml
index d39278584..b5f22afb5 100644
--- a/docs/inference-providers/_toctree.yml
+++ b/docs/inference-providers/_toctree.yml
@@ -19,6 +19,8 @@
       title: Structured Outputs with LLMs
     - local: guides/function-calling
       title: Function Calling
+    - local: guides/responses-api
+      title: Responses API (beta)
     - local: guides/gpt-oss
       title: How to use OpenAI gpt-oss
     - local: guides/image-editor
diff --git a/docs/inference-providers/guides/responses-api.md b/docs/inference-providers/guides/responses-api.md
new file mode 100644
index 000000000..cb8422c21
--- /dev/null
+++ b/docs/inference-providers/guides/responses-api.md
@@ -0,0 +1,813 @@
# Responses API (beta)

The Responses API (from OpenAI) provides a unified interface for model interactions with Hugging Face Inference Providers. Use your existing OpenAI SDKs to access features like multi-provider routing, event streaming, structured outputs, and Remote MCP tools.

> [!TIP]
> This guide assumes you have a Hugging Face account and access token. You can create a free account at [huggingface.co](https://huggingface.co) and get your token from your [settings page](https://huggingface.co/settings/tokens).

## Why build with the Responses API?

The Responses API provides a unified interface built for agentic apps. With it, you get:

- **Built-in tool orchestration.** Call functions and server-side MCP tools, and get schema-validated outputs, without changing endpoints.
- **Event-driven streaming.** Receive semantic events such as `response.created`, `output_text.delta`, and `response.completed` to power incremental UIs.
- **Reasoning controls and structured outputs.** Dial reasoning effort up or down and require models to return schema-compliant JSON every time.

## Prerequisites

- A Hugging Face account with remaining Inference Providers credits (free tier available).
- A fine-grained [Hugging Face token](https://huggingface.co/settings/tokens) with "Make calls to Inference Providers" permission, stored in `HF_TOKEN`.

> [!TIP]
> All Inference Providers chat completion models should be compatible with the Responses API. You can browse available models on the [Inference Models page](https://huggingface.co/inference/models).

## Configure your Responses client

Install the OpenAI SDK for your language of choice before running the snippets below (`pip install openai` for Python or `npm install openai` for Node.js). If you prefer issuing raw HTTP calls, any standard tool such as `curl` works as well.
```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.getenv("HF_TOKEN"),
)

response = client.responses.create(
    model="openai/gpt-oss-120b:groq",
    instructions="You are a helpful assistant.",
    input="Tell me a three-sentence bedtime story about a unicorn.",
)

print(response.output_text)
```

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

const response = await client.responses.create({
  model: "openai/gpt-oss-120b:groq",
  instructions: "You are a helpful assistant.",
  input: "Tell me a three-sentence bedtime story about a unicorn.",
});

console.log(response.output_text);
```

```bash
curl https://router.huggingface.co/v1/responses \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b:groq",
    "instructions": "You are a helpful assistant.",
    "input": "Tell me a three-sentence bedtime story about a unicorn."
  }'
```

> [!TIP]
> If you plan to use a specific provider, append it to the model id after a colon, as in `moonshotai/Kimi-K2-Instruct-0905:groq`. Otherwise, omit the suffix and let routing fall back to the default provider.

## Core Response patterns

### Plain text output

For a single response message, pass a string as input. The Responses API returns both the full `response` object and a convenience `output_text` helper.

```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.getenv("HF_TOKEN"),
)

response = client.responses.create(
    model="moonshotai/Kimi-K2-Instruct-0905:groq",
    instructions="You are a helpful assistant.",
    input="Tell me a three-sentence bedtime story about a unicorn.",
)

print(response.output_text)
```

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

const response = await client.responses.create({
  model: "moonshotai/Kimi-K2-Instruct-0905:groq",
  instructions: "You are a helpful assistant.",
  input: "Tell me a three-sentence bedtime story about a unicorn.",
});

console.log(response.output_text);
```

```bash
curl https://router.huggingface.co/v1/responses \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K2-Instruct-0905:groq",
    "instructions": "You are a helpful assistant.",
    "input": "Tell me a three-sentence bedtime story about a unicorn."
  }'
```
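Beyond `output_text`, the full `response` object carries a list of typed output items. Here is a minimal sketch of walking that list in Python; the field names follow the OpenAI SDK's typed response objects, and the exact items returned can vary by model and provider:

```python
# Assumes `client` is configured as in the snippets above.
response = client.responses.create(
    model="moonshotai/Kimi-K2-Instruct-0905:groq",
    input="Give me one fun fact about unicorns.",
)

# `response.output` is a list of typed items (messages, tool calls, reasoning, ...).
for item in response.output:
    if item.type == "message":
        # Each message holds content parts; text parts expose a `.text` field.
        for part in item.content:
            if part.type == "output_text":
                print(part.text)
```

The `output_text` helper simply concatenates these text parts for you, so reach for `response.output` when you need per-item metadata.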
### Multimodal inputs

Mix text and vision content by passing a list of content parts. The Responses API unifies text and images into a single `input` array.

```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.getenv("HF_TOKEN"),
)

response = client.responses.create(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "what is in this image?"},
                {
                    "type": "input_image",
                    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            ],
        }
    ],
)

print(response.output_text)
```

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

const response = await client.responses.create({
  model: "Qwen/Qwen2.5-VL-7B-Instruct",
  input: [
    {
      role: "user",
      content: [
        { type: "input_text", text: "what is in this image?" },
        {
          type: "input_image",
          image_url:
            "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
        },
      ],
    },
  ],
});

console.log(response.output_text);
```

```bash
curl https://router.huggingface.co/v1/responses \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-VL-7B-Instruct",
    "input": [
      {
        "role": "user",
        "content": [
          {"type": "input_text", "text": "what is in this image?"},
          {
            "type": "input_image",
            "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
          }
        ]
      }
    ]
  }'
```

### Multi-turn conversations

Responses requests accept full conversation history directly in the `input` array. Mix `developer`, `system`, and `user` messages to steer the assistant's behavior on every turn.

```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.getenv("HF_TOKEN"),
)

response = client.responses.create(
    model="moonshotai/Kimi-K2-Instruct-0905:groq",
    input=[
        {"role": "developer", "content": "Talk like a pirate."},
        {"role": "user", "content": "Are semicolons optional in JavaScript?"},
    ],
)

print(response.output_text)
```

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

const response = await client.responses.create({
  model: "moonshotai/Kimi-K2-Instruct-0905:groq",
  input: [
    { role: "developer", content: "Talk like a pirate." },
    { role: "user", content: "Are semicolons optional in JavaScript?" },
  ],
});

console.log(response.output_text);
```

```bash
curl https://router.huggingface.co/v1/responses \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K2-Instruct-0905:groq",
    "input": [
      {"role": "developer", "content": "Talk like a pirate."},
      {"role": "user", "content": "Are semicolons optional in JavaScript?"}
    ]
  }'
```

## Advanced features

Advanced features use the same request format.

### Event-based streaming

Set `stream=True` to receive incremental `response.*` events. Each event arrives as JSON, so you can render words as they stream in or monitor tool execution in real time.

```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.getenv("HF_TOKEN"),
)

stream = client.responses.create(
    model="moonshotai/Kimi-K2-Instruct-0905:groq",
    input=[{"role": "user", "content": "Say 'double bubble bath' ten times fast."}],
    stream=True,
)

for event in stream:
    print(event)
```

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

const stream = await client.responses.create({
  model: "moonshotai/Kimi-K2-Instruct-0905:groq",
  input: [{ role: "user", content: "Say 'double bubble bath' ten times fast." }],
  stream: true,
});

for await (const event of stream) {
  console.log(event);
}
```

```bash
curl -N https://router.huggingface.co/v1/responses \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K2-Instruct-0905:groq",
    "input": [
      {"role": "user", "content": "Say \"double bubble bath\" ten times fast."}
    ],
    "stream": true
  }'
```
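For a simple chat UI you usually only need the text deltas rather than the full event stream. A minimal sketch of filtering on event types, assuming the router relays the standard OpenAI event names (`response.output_text.delta` and `response.completed`):

```python
# Assumes `client` is configured as in the snippets above.
stream = client.responses.create(
    model="moonshotai/Kimi-K2-Instruct-0905:groq",
    input=[{"role": "user", "content": "Write a haiku about streaming APIs."}],
    stream=True,
)

for event in stream:
    # Print text fragments as they arrive, without waiting for the full answer.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
    # The final event carries the completed response object.
    elif event.type == "response.completed":
        print()
```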
### Tool calling and routing

Add a `tools` array to let the model call your functions. When the model decides a function is needed, the response includes `function_call` items with the generated arguments, ready for your code to execute; a round-trip sketch follows the examples below.

```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.getenv("HF_TOKEN"),
)

tools = [
    {
        "type": "function",
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "unit"],
        },
    }
]

response = client.responses.create(
    model="moonshotai/Kimi-K2-Instruct-0905:groq",
    tools=tools,
    input="What is the weather like in Boston today?",
    tool_choice="auto",
)

print(response)
```

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

const tools = [
  {
    type: "function",
    name: "get_current_weather",
    description: "Get the current weather in a given location",
    parameters: {
      type: "object",
      properties: {
        location: { type: "string", description: "The city and state, e.g. San Francisco, CA" },
        unit: { type: "string", enum: ["celsius", "fahrenheit"] },
      },
      required: ["location", "unit"],
    },
  },
];

const response = await client.responses.create({
  model: "moonshotai/Kimi-K2-Instruct-0905:groq",
  tools,
  input: "What is the weather like in Boston today?",
  tool_choice: "auto",
});

console.log(response);
```

```bash
curl https://router.huggingface.co/v1/responses \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K2-Instruct-0905:groq",
    "input": "What is the weather like in Boston today?",
    "tool_choice": "auto",
    "tools": [
      {
        "type": "function",
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
          },
          "required": ["location", "unit"]
        }
      }
    ]
  }'
```
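The calls above stop at the function call itself; your code runs the function and sends the result back. A hedged sketch of that round trip in Python: it follows the OpenAI SDK's `function_call` / `function_call_output` item types (which individual models and providers may not fully support), reuses the `tools` array defined above, and the `get_current_weather` implementation is a hypothetical stand-in.

```python
import json

# Hypothetical local implementation of the tool declared in `tools`.
def get_current_weather(location: str, unit: str) -> str:
    return json.dumps({"location": location, "temperature": 22, "unit": unit})

input_items = [{"role": "user", "content": "What is the weather like in Boston today?"}]

response = client.responses.create(
    model="moonshotai/Kimi-K2-Instruct-0905:groq",
    tools=tools,
    input=input_items,
    tool_choice="auto",
)

for item in response.output:
    if item.type == "function_call":
        # Execute the requested function with the model-generated arguments...
        args = json.loads(item.arguments)
        result = get_current_weather(**args)
        # ...then append the call and its output to the conversation history.
        input_items.append(item)
        input_items.append({"type": "function_call_output", "call_id": item.call_id, "output": result})

# Ask the model to produce a final answer that incorporates the tool result.
final = client.responses.create(
    model="moonshotai/Kimi-K2-Instruct-0905:groq",
    tools=tools,
    input=input_items,
)
print(final.output_text)
```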
### Structured outputs

Force the model to return JSON matching a schema by supplying a `response_format`. The Python SDK exposes a `.parse` helper that converts the response directly into your target type.

> [!NOTE]
> When calling `openai/gpt-oss-120b:groq` from JavaScript or raw HTTP, include a brief instruction to return JSON. Without it the model may emit markdown even when a schema is provided.

```python
from openai import OpenAI
from pydantic import BaseModel
import os

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.getenv("HF_TOKEN"),
)

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

response = client.responses.parse(
    model="openai/gpt-oss-120b:groq",
    input=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    text_format=CalendarEvent,
)

print(response.output_parsed)
```

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

const response = await client.responses.create({
  model: "openai/gpt-oss-120b:groq",
  instructions: "Return JSON that matches the CalendarEvent schema (fields name, date, participants).",
  input: [
    { role: "system", content: "Extract the event information." },
    { role: "user", content: "Alice and Bob are going to a science fair on Friday." },
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "CalendarEvent",
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          date: { type: "string" },
          participants: {
            type: "array",
            items: { type: "string" },
          },
        },
        required: ["name", "date", "participants"],
        additionalProperties: false,
      },
      strict: true,
    },
  },
});

const parsed = JSON.parse(response.output_text);
console.log(parsed);
```

```bash
curl https://router.huggingface.co/v1/responses \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b:groq",
    "instructions": "Return JSON that matches the CalendarEvent schema (fields name, date, participants).",
    "input": [
      {"role": "system", "content": "Extract the event information."},
      {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."}
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "CalendarEvent",
        "schema": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "date": {"type": "string"},
            "participants": {
              "type": "array",
              "items": {"type": "string"}
            }
          },
          "required": ["name", "date", "participants"],
          "additionalProperties": false
        },
        "strict": true
      }
    }
  }'
```
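When you parse `output_text` yourself (as in the JavaScript and `curl` examples), it is worth parsing defensively: per the note above, some models wrap the JSON in markdown fences despite the schema. A small Python sketch that validates against the same pydantic model; the fence-stripping heuristic is our own, not part of the API:

```python
import json
import re

from pydantic import BaseModel, ValidationError

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

def parse_event(output_text: str) -> CalendarEvent | None:
    # Strip leading/trailing markdown code fences if the model added them.
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", output_text.strip())
    try:
        return CalendarEvent.model_validate(json.loads(cleaned))
    except (json.JSONDecodeError, ValidationError):
        return None

# Assuming `response` comes from one of the `response_format` calls above:
event = parse_event(response.output_text)
print(event)
```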
### Remote MCP execution

Remote MCP lets you call server-hosted tools that implement the Model Context Protocol. Provide the MCP server URL and allowed tools, and the Responses API handles the calls for you.

```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.getenv("HF_TOKEN"),
)

response = client.responses.create(
    model="moonshotai/Kimi-K2-Instruct-0905:groq",
    input="how does tiktoken work?",
    tools=[
        {
            "type": "mcp",
            "server_label": "gitmcp",
            "server_url": "https://gitmcp.io/openai/tiktoken",
            "allowed_tools": ["search_tiktoken_documentation", "fetch_tiktoken_documentation"],
            "require_approval": "never",
        },
    ],
)

for output in response.output:
    print(output)
```

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

const response = await client.responses.create({
  model: "moonshotai/Kimi-K2-Instruct-0905:groq",
  input: "how does tiktoken work?",
  tools: [
    {
      type: "mcp",
      server_label: "gitmcp",
      server_url: "https://gitmcp.io/openai/tiktoken",
      allowed_tools: ["search_tiktoken_documentation", "fetch_tiktoken_documentation"],
      require_approval: "never",
    },
  ],
});

for (const output of response.output) {
  console.log(output);
}
```

```bash
curl https://router.huggingface.co/v1/responses \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K2-Instruct-0905:groq",
    "input": "how does tiktoken work?",
    "tools": [
      {
        "type": "mcp",
        "server_label": "gitmcp",
        "server_url": "https://gitmcp.io/openai/tiktoken",
        "allowed_tools": ["search_tiktoken_documentation", "fetch_tiktoken_documentation"],
        "require_approval": "never"
      }
    ]
  }'
```

### Reasoning effort controls

Some open-source reasoning models expose effort tiers. Pass a `reasoning` object with `"effort"` set to `"low"`, `"medium"`, or `"high"` to trade off latency against reasoning depth.

```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.getenv("HF_TOKEN"),
)

response = client.responses.create(
    model="deepseek-ai/DeepSeek-R1",
    instructions="You are a helpful assistant.",
    input="Say hello to the world.",
    reasoning={"effort": "low"},
)

for i, item in enumerate(response.output):
    print(f"Output #{i}: {item.type}", item.content)
```

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

const response = await client.responses.create({
  model: "deepseek-ai/DeepSeek-R1",
  instructions: "You are a helpful assistant.",
  input: "Say hello to the world.",
  reasoning: { effort: "low" },
});

response.output.forEach((item, index) => {
  console.log(`Output #${index}: ${item.type}`, item.content);
});
```

```bash
curl https://router.huggingface.co/v1/responses \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1",
    "instructions": "You are a helpful assistant.",
    "input": "Say hello to the world.",
    "reasoning": {"effort": "low"}
  }'
```

## API reference

For the complete set of request and response fields, read the official [OpenAI Responses API reference](https://platform.openai.com/docs/api-reference/responses).