Commit 6bb0bc0

feat(openai-native): background mode + auto-resume and poll fallback
Enable OpenAI Responses background mode with resilient streaming for GPT‑5 Pro and any model flagged via metadata.

Key changes:

- Background mode enablement
  • Auto-enable for models with `info.backgroundMode === true` (e.g., gpt-5-pro-2025-10-06) defined in [packages/types/src/providers/openai.ts](packages/types/src/providers/openai.ts).
  • Also respects the manual override (`openAiNativeBackgroundMode`) from ProviderSettings/ApiHandlerOptions.
- Request shape (Responses API)
  • `background: true`, `stream: true`, `store: true` set in [OpenAiNativeHandler.buildRequestBody()](src/api/providers/openai-native.ts:224).
- Streaming UX and status events
  • New `ApiStreamStatusChunk` in [src/api/transform/stream.ts](src/api/transform/stream.ts) with statuses: queued, in_progress, completed, failed, canceled, reconnecting, polling.
  • Provider emits status chunks in SDK + SSE paths via [OpenAiNativeHandler.processEvent()](src/api/providers/openai-native.ts:1100) and [OpenAiNativeHandler.handleStreamResponse()](src/api/providers/openai-native.ts:651).
  • UI spinner shows background lifecycle labels in [webview-ui/src/components/chat/ChatRow.tsx](webview-ui/src/components/chat/ChatRow.tsx) using [webview-ui/src/utils/backgroundStatus.ts](webview-ui/src/utils/backgroundStatus.ts).
- Resilience: auto-resume + poll fallback
  • On stream drop for background tasks, attempt SSE resume using `response.id` and the last `sequence_number` with exponential backoff in [OpenAiNativeHandler.attemptResumeOrPoll()](src/api/providers/openai-native.ts:1215).
  • If resume fails, poll GET /v1/responses/{id} every 2s until terminal and synthesize the final output/usage.
  • Deduplicate resumed events via `resumeCutoffSequence` in [handleStreamResponse()](src/api/providers/openai-native.ts:737).
- Settings (no new UI switch)
  • Added optional provider settings and ApiHandlerOptions: autoResume, resumeMaxRetries, resumeBaseDelayMs, pollIntervalMs, pollMaxMinutes in [packages/types/src/provider-settings.ts](packages/types/src/provider-settings.ts) and [src/shared/api.ts](src/shared/api.ts).
- Cleanup
  • Removed the VS Code contributes toggle for background mode; behavior is now model-driven plus a programmatic override.
- Tests
  • Provider: coverage for background status emission, auto-resume success, resume→poll fallback, and the non-background negative case in [src/api/providers/__tests__/openai-native.spec.ts](src/api/providers/__tests__/openai-native.spec.ts).
  • Usage parity validated as unchanged in [src/api/providers/__tests__/openai-native-usage.spec.ts](src/api/providers/__tests__/openai-native-usage.spec.ts).
  • UI: label mapping tests for background statuses in [webview-ui/src/utils/__tests__/backgroundStatus.spec.ts](webview-ui/src/utils/__tests__/backgroundStatus.spec.ts).

Notes:

- Aligns with TEMP_OPENAI_BACKGROUND_TASK_DOCS.MD: background requires store=true; supports streaming resume via response.id + sequence_number.
- Default behavior unchanged for non-background models; no breaking changes.
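A minimal sketch of the resume-then-poll flow described above, assuming Node 18+ (global `fetch`). Helper names are illustrative, not the actual provider internals; the real logic lives in `OpenAiNativeHandler.attemptResumeOrPoll()`. Only the documented endpoints are used: `GET /v1/responses/{id}?stream=true&starting_after={n}` for resume and `GET /v1/responses/{id}` for polling.

```typescript
const BASE = "https://api.openai.com/v1";

async function resumeOrPoll(
  apiKey: string,
  responseId: string,
  lastSeq: number,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<void> {
  // 1) Try to resume the SSE stream from the last seen sequence_number.
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const res = await fetch(
        `${BASE}/responses/${responseId}?stream=true&starting_after=${lastSeq}`,
        { headers: { Authorization: `Bearer ${apiKey}` } },
      );
      if (res.ok && res.body) {
        // Consume resumed events here; events at or below lastSeq are dropped
        // (the commit calls this cutoff resumeCutoffSequence).
        return;
      }
    } catch {
      // Network error: fall through to backoff and retry.
    }
    // Exponential backoff between resume attempts.
    await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
  }

  // 2) Resume failed: fall back to polling the response until terminal.
  while (true) {
    const res = await fetch(`${BASE}/responses/${responseId}`, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    const body = await res.json();
    if (body.status !== "queued" && body.status !== "in_progress") {
      // Terminal state: synthesize final output/usage from `body` here.
      return;
    }
    await new Promise((r) => setTimeout(r, 2000)); // poll every 2s
  }
}
```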
1 parent 957b8d9 commit 6bb0bc0

File tree

14 files changed: +1350 −19 lines
Lines changed: 212 additions & 0 deletions
@@ -0,0 +1,212 @@
Background mode
===============

Run long running tasks asynchronously in the background.

Agents like [Codex](https://openai.com/index/introducing-codex/) and [Deep Research](https://openai.com/index/introducing-deep-research/) show that reasoning models can take several minutes to solve complex problems. Background mode enables you to execute long-running tasks on models like o3 and o1-pro reliably, without having to worry about timeouts or other connectivity issues.

Background mode kicks off these tasks asynchronously, and developers can poll response objects to check status over time. To start response generation in the background, make an API request with `background` set to `true`:

Generate a response in the background

```bash
curl https://api.openai.com/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "o3",
    "input": "Write a very long novel about otters in space.",
    "background": true
  }'
```

```javascript
import OpenAI from "openai";
const client = new OpenAI();

const resp = await client.responses.create({
  model: "o3",
  input: "Write a very long novel about otters in space.",
  background: true,
});

console.log(resp.status);
```

```python
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="o3",
    input="Write a very long novel about otters in space.",
    background=True,
)

print(resp.status)
```
Polling background responses
----------------------------

To check the status of background requests, use the GET endpoint for Responses. Keep polling while the request is in the queued or in_progress state. When it leaves these states, it has reached a final (terminal) state.

Retrieve a response executing in the background

```bash
curl https://api.openai.com/v1/responses/resp_123 \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```

```javascript
import OpenAI from "openai";
const client = new OpenAI();

let resp = await client.responses.create({
  model: "o3",
  input: "Write a very long novel about otters in space.",
  background: true,
});

while (resp.status === "queued" || resp.status === "in_progress") {
  console.log("Current status: " + resp.status);
  await new Promise(resolve => setTimeout(resolve, 2000)); // wait 2 seconds
  resp = await client.responses.retrieve(resp.id);
}

console.log("Final status: " + resp.status + "\nOutput:\n" + resp.output_text);
```

```python
from openai import OpenAI
from time import sleep

client = OpenAI()

resp = client.responses.create(
    model="o3",
    input="Write a very long novel about otters in space.",
    background=True,
)

while resp.status in {"queued", "in_progress"}:
    print(f"Current status: {resp.status}")
    sleep(2)
    resp = client.responses.retrieve(resp.id)

print(f"Final status: {resp.status}\nOutput:\n{resp.output_text}")
```
Cancelling a background response
--------------------------------

You can also cancel an in-flight response like this:

Cancel an ongoing response

```bash
curl -X POST https://api.openai.com/v1/responses/resp_123/cancel \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```

```javascript
import OpenAI from "openai";
const client = new OpenAI();

const resp = await client.responses.cancel("resp_123");

console.log(resp.status);
```

```python
from openai import OpenAI
client = OpenAI()

resp = client.responses.cancel("resp_123")

print(resp.status)
```

Cancelling twice is idempotent - subsequent calls simply return the final `Response` object.
Streaming a background response
-------------------------------

You can create a background Response and start streaming events from it right away. This may be helpful if you expect the client to drop the stream and want the option of picking it back up later. To do this, create a Response with both `background` and `stream` set to `true`. You will want to keep track of a "cursor" corresponding to the `sequence_number` you receive in each streaming event.

Currently, the time to first token you receive from a background response is higher than what you receive from a synchronous one. We are working to reduce this latency gap in the coming weeks.

Generate and stream a background response

```bash
curl https://api.openai.com/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "o3",
    "input": "Write a very long novel about otters in space.",
    "background": true,
    "stream": true
  }'

# To resume:
curl "https://api.openai.com/v1/responses/resp_123?stream=true&starting_after=42" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```

```javascript
import OpenAI from "openai";
const client = new OpenAI();

const stream = await client.responses.create({
  model: "o3",
  input: "Write a very long novel about otters in space.",
  background: true,
  stream: true,
});

let cursor = null;
for await (const event of stream) {
  console.log(event);
  cursor = event.sequence_number;
}

// If the connection drops, you can resume streaming from the last cursor (SDK support coming soon):
// const resumedStream = await client.responses.stream(resp.id, { starting_after: cursor });
// for await (const event of resumedStream) { ... }
```
```python
from openai import OpenAI

client = OpenAI()

# Fire off an async response but also start streaming immediately
stream = client.responses.create(
    model="o3",
    input="Write a very long novel about otters in space.",
    background=True,
    stream=True,
)

cursor = None
for event in stream:
    print(event)
    cursor = event.sequence_number

# If your connection drops, the response continues running and you can reconnect:
# SDK support for resuming the stream is coming soon.
# for event in client.responses.stream(resp.id, starting_after=cursor):
#     print(event)
```
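Until SDK resume support lands, a plain HTTP client can already reconnect using only the documented `GET /v1/responses/{id}?stream=true&starting_after={cursor}` endpoint. A minimal TypeScript sketch (Node 18+; the SSE parsing below is deliberately simplified and assumes one `data:` line per event):

```typescript
// Sketch: resume a dropped background stream from the last seen cursor.
async function resumeStream(apiKey: string, responseId: string, cursor: number) {
  const res = await fetch(
    `https://api.openai.com/v1/responses/${responseId}?stream=true&starting_after=${cursor}`,
    { headers: { Authorization: `Bearer ${apiKey}` } },
  );
  if (!res.ok || !res.body) throw new Error(`resume failed: ${res.status}`);

  const decoder = new TextDecoder();
  let buffer = "";
  for await (const chunk of res.body as any) {
    buffer += decoder.decode(chunk, { stream: true });
    // Each SSE event ends with a blank line; keep the trailing partial event.
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? "";
    for (const raw of events) {
      const dataLine = raw.split("\n").find((l) => l.startsWith("data:"));
      if (!dataLine) continue;
      const event = JSON.parse(dataLine.slice(5));
      console.log(event.type, event.sequence_number);
    }
  }
}
```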
Limits
------

1. Background sampling requires `store=true`; stateless requests are rejected (see the sketch below).
2. To cancel a synchronous response, terminate the connection.
3. You can only start a new stream from a background response if you created it with `stream=true`.
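A hedged illustration of limits 1 and 3 as a client-side pre-flight guard (a hypothetical helper, not part of any SDK):

```typescript
// Hypothetical pre-flight guard for the limits above; not an SDK API.
interface ResponsesRequest {
  model: string;
  input: string;
  background?: boolean;
  stream?: boolean;
  store?: boolean;
}

function assertBackgroundRequest(req: ResponsesRequest): void {
  if (!req.background) return;
  // Limit 1: background sampling requires store=true.
  if (req.store === false) {
    throw new Error("background mode requires store=true (stateless requests are rejected)");
  }
  // Limit 3: a stream can only be (re)started if the response was created with stream=true.
  if (!req.stream) {
    console.warn("background response created without stream=true cannot be streamed later");
  }
}

assertBackgroundRequest({
  model: "o3",
  input: "Write a very long novel about otters in space.",
  background: true,
  stream: true,
});
```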

packages/types/src/model.ts

Lines changed: 3 additions & 0 deletions
```diff
@@ -63,6 +63,9 @@ export const modelInfoSchema = z.object({
 	supportsReasoningBudget: z.boolean().optional(),
 	// Capability flag to indicate whether the model supports temperature parameter
 	supportsTemperature: z.boolean().optional(),
+	// When true, this model must be invoked using Responses background mode.
+	// Providers should auto-enable background:true, stream:true, and store:true.
+	backgroundMode: z.boolean().optional(),
 	requiredReasoningBudget: z.boolean().optional(),
 	supportsReasoningEffort: z.boolean().optional(),
 	supportedParameters: z.array(modelParametersSchema).optional(),
```
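How a provider might honor this flag, per the schema comment and the commit description (a sketch only; the real logic lives in `OpenAiNativeHandler.buildRequestBody()`, and the names here are illustrative):

```typescript
// Sketch: resolve background mode from model metadata or the manual override,
// then force the request shape the flag requires.
interface ModelInfo {
  backgroundMode?: boolean;
}

interface HandlerOptions {
  openAiNativeBackgroundMode?: boolean;
}

function resolveBackground(info: ModelInfo, opts: HandlerOptions): boolean {
  // Model-driven default, with a programmatic opt-in override.
  return info.backgroundMode === true || opts.openAiNativeBackgroundMode === true;
}

function applyBackgroundShape(body: Record<string, unknown>, background: boolean) {
  if (background) {
    body.background = true;
    body.stream = true; // streaming enables resume after connection drops
    body.store = true; // background requires store=true
  }
  return body;
}
```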

packages/types/src/provider-settings.ts

Lines changed: 9 additions & 0 deletions
```diff
@@ -297,6 +297,15 @@ const openAiNativeSchema = apiModelIdProviderModelSchema.extend({
 	// OpenAI Responses API service tier for openai-native provider only.
 	// UI should only expose this when the selected model supports flex/priority.
 	openAiNativeServiceTier: serviceTierSchema.optional(),
+	// Enable OpenAI Responses background mode when using Responses API.
+	// Opt-in; defaults to false when omitted.
+	openAiNativeBackgroundMode: z.boolean().optional(),
+	// Background auto-resume/poll settings (no UI; plumbed via options)
+	openAiNativeBackgroundAutoResume: z.boolean().optional(),
+	openAiNativeBackgroundResumeMaxRetries: z.number().int().min(0).optional(),
+	openAiNativeBackgroundResumeBaseDelayMs: z.number().int().min(0).optional(),
+	openAiNativeBackgroundPollIntervalMs: z.number().int().min(0).optional(),
+	openAiNativeBackgroundPollMaxMinutes: z.number().int().min(1).optional(),
 })

 const mistralSchema = apiModelIdProviderModelSchema.extend({
```
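A plausible settings fragment exercising the new fields (values are examples only; every field is optional and provider defaults apply when omitted):

```typescript
// Example ProviderSettings fragment using the new optional background fields.
const settings = {
  apiModelId: "gpt-5-pro-2025-10-06",
  openAiNativeBackgroundMode: true, // manual override; auto-enabled anyway for this model
  openAiNativeBackgroundAutoResume: true,
  openAiNativeBackgroundResumeMaxRetries: 3,
  openAiNativeBackgroundResumeBaseDelayMs: 500,
  openAiNativeBackgroundPollIntervalMs: 2_000, // matches the documented 2s poll cadence
  openAiNativeBackgroundPollMaxMinutes: 30,
};
```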

packages/types/src/providers/openai.ts

Lines changed: 1 addition & 0 deletions
```diff
@@ -50,6 +50,7 @@ export const openAiNativeModels = {
 		"GPT-5 Pro: a slow, reasoning-focused model built to tackle tough problems. Requests can take several minutes to finish. Responses API only; no streaming, so it may appear stuck until the reply is ready.",
 	supportsVerbosity: true,
 	supportsTemperature: false,
+	backgroundMode: true,
 },
 "gpt-5-mini-2025-08-07": {
 	maxTokens: 128000,
```

src/api/providers/__tests__/openai-native-usage.spec.ts

Lines changed: 32 additions & 0 deletions
```diff
@@ -344,6 +344,38 @@ describe("OpenAiNativeHandler - normalizeUsage", () => {
 		})
 	})

+	it("should produce identical usage chunk when background mode is enabled", () => {
+		const usage = {
+			input_tokens: 120,
+			output_tokens: 60,
+			cache_creation_input_tokens: 10,
+			cache_read_input_tokens: 30,
+		}
+
+		const baselineHandler = new OpenAiNativeHandler({
+			openAiNativeApiKey: "test-key",
+			apiModelId: "gpt-5-pro-2025-10-06",
+		})
+		const backgroundHandler = new OpenAiNativeHandler({
+			openAiNativeApiKey: "test-key",
+			apiModelId: "gpt-5-pro-2025-10-06",
+			openAiNativeBackgroundMode: true,
+		})
+
+		const baselineUsage = (baselineHandler as any).normalizeUsage(usage, baselineHandler.getModel())
+		const backgroundUsage = (backgroundHandler as any).normalizeUsage(usage, backgroundHandler.getModel())
+
+		expect(baselineUsage).toMatchObject({
+			type: "usage",
+			inputTokens: 120,
+			outputTokens: 60,
+			cacheWriteTokens: 10,
+			cacheReadTokens: 30,
+			totalCost: expect.any(Number),
+		})
+		expect(backgroundUsage).toEqual(baselineUsage)
+	})
+
 	describe("cost calculation", () => {
 		it("should pass total input tokens to calculateApiCostOpenAI", () => {
 			const usage = {
```
