Skip to content

Use a detached timeout for pause completion in API#2238

Draft
matthewlouisbrockman wants to merge 7 commits intomainfrom
pause-timeouts
Draft

Use a detached timeout for pause completion in API#2238
matthewlouisbrockman wants to merge 7 commits intomainfrom
pause-timeouts

Conversation

@matthewlouisbrockman
Copy link
Copy Markdown
Contributor

@matthewlouisbrockman matthewlouisbrockman commented Mar 27, 2026

Use a separate timeout-bounded context for sandbox pause completion in the API using a 60s timeout so it fits within the existing timeouts . This allows the pause RPC and the follow-up build status update to finish even if the original request context is canceled by e.g. 70/80 second api or lb timeouts.

Since we give a 204 already for paused, when it times out but is still going we give a 202 (start) to indicate it's still pausing along with a message to check the sandbox's info.

@cursor
Copy link
Copy Markdown

cursor bot commented Mar 27, 2026

PR Summary

Medium Risk
Changes the POST /sandboxes/{sandboxID}/pause control flow to run pause work asynchronously and return 202 after 60s, which can affect client expectations and increase background work during load or deploy shutdown.

Overview
Updates sandbox pause to run the orchestrator pause flow in a detached, timeout-bounded background context (15m) while only waiting ~60s on the request; if pause isn’t finished (or the client/request context ends), the endpoint now returns 202 Accepted with a small JSON InProgress payload instead of blocking. The orchestrator side also applies a 15m timeout around the pause RPC/build-status completion, and the OpenAPI + generated clients are updated to model/parse the new 202 response.

Written by Cursor Bugbot for commit 023fd3c. This will update automatically on new commits. Configure here.

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Error helper misused for HTTP 202 success response
    • Replaced the two HTTP 202 pause-in-progress responses with direct c.JSON success payloads so they no longer add Gin context errors to logs.
  • ✅ Fixed: Removed WithoutCancel breaks cache invalidation for non-API callers
    • Restored context.WithoutCancel(ctx) for snapshotCache.Invalidate in pauseSandbox so Redis cache invalidation still runs even if the parent context is canceled.

Create PR

Or push these changes by commenting:

@cursor push aa12661f89
Preview (aa12661f89)
diff --git a/packages/api/internal/handlers/sandbox_pause.go b/packages/api/internal/handlers/sandbox_pause.go
--- a/packages/api/internal/handlers/sandbox_pause.go
+++ b/packages/api/internal/handlers/sandbox_pause.go
@@ -27,6 +27,7 @@
 const (
 	pauseRequestTimeout     = 15 * time.Minute
 	pauseRequestWaitTimeout = 70 * time.Second
+	pauseInProgressMessage  = "Pause is still in progress. Check the sandbox info endpoint for the latest status."
 )
 
 type pauseRequestResult struct {
@@ -79,12 +80,18 @@
 
 		return
 	case <-waitTimer.C:
-		a.sendAPIStoreError(c, http.StatusAccepted, "Pause is still in progress. Check the sandbox info endpoint for the latest status.")
+		c.JSON(http.StatusAccepted, gin.H{
+			"code":    int32(http.StatusAccepted),
+			"message": pauseInProgressMessage,
+		})
 
 		return
 	case <-ctx.Done():
 		if errors.Is(context.Cause(ctx), sharedmiddleware.ErrRequestTimeout) {
-			a.sendAPIStoreError(c, http.StatusAccepted, "Pause is still in progress. Check the sandbox info endpoint for the latest status.")
+			c.JSON(http.StatusAccepted, gin.H{
+				"code":    int32(http.StatusAccepted),
+				"message": pauseInProgressMessage,
+			})
 		}
 
 		return

diff --git a/packages/api/internal/orchestrator/pause_instance.go b/packages/api/internal/orchestrator/pause_instance.go
--- a/packages/api/internal/orchestrator/pause_instance.go
+++ b/packages/api/internal/orchestrator/pause_instance.go
@@ -69,7 +69,7 @@
 		return fmt.Errorf("error pausing sandbox: %w", err)
 	}
 
-	o.snapshotCache.Invalidate(ctx, sbx.SandboxID)
+	o.snapshotCache.Invalidate(context.WithoutCancel(ctx), sbx.SandboxID)
 
 	return nil
 }

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5f17d2610e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

}

func sendPauseInProgressResponse(c *gin.Context) {
c.JSON(http.StatusAccepted, gin.H{
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Update pause API contract for 202 in-progress responses

Returning 202 Accepted here introduces a new success path that is not represented in the generated API contract/client (PostSandboxesSandboxIDPauseResponse and ParsePostSandboxesSandboxIDPauseResponse only model 401/404/409/500, with callers commonly expecting 204). When pause runs longer than 60s or the request context is canceled, clients generated from the current spec will treat this as an unexpected status and may surface false failures even though the pause continues in the background.

Useful? React with 👍 / 👎.

Comment on lines +35 to +40

func sendPauseInProgressResponse(c *gin.Context) {
c.JSON(http.StatusAccepted, gin.H{
"code": int32(http.StatusAccepted),
"message": "Pause is still in progress. Check the sandbox info endpoint for the latest status.",
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The new 202 Accepted response from sendPauseInProgressResponse is not defined in the OpenAPI spec for POST /sandboxes/{sandboxID}/pause, which only declares 204, 409, 404, 401, and 500. SDK clients generated from the spec (Go, Python, JS) have no typed handling for 202 and may treat it as an unexpected response or assertion failure. The spec needs a 202 response entry added, and integration tests that assert StatusCode == 204 will fail whenever pause takes longer than 60 seconds.

Extended reasoning...

What the bug is and how it manifests

The PR introduces sendPauseInProgressResponse which returns HTTP 202 Accepted with a JSON body {"code": 202, "message": "..."} when pause takes longer than 60 seconds or the client disconnects. However, the OpenAPI spec at spec/openapi.yml only defines the following responses for POST /sandboxes/{sandboxID}/pause: 204 (success), 409 (conflict), 404 (not found), 401 (unauthorized), and 500 (internal error). A 202 response is completely absent from the spec.

The specific code path that triggers it

In PostSandboxesSandboxIDPause (sandbox_pause.go:35-40), a goroutine runs pauseSandboxRequest with a 15-minute timeout. The outer handler then selects on a 60-second timer (pauseRequestWaitTimeout) or ctx.Done(). If pause takes longer than 60 seconds or the HTTP client disconnects early, sendPauseInProgressResponse is called and writes HTTP 202. This is a new code path introduced by this PR.

Why existing code doesn't prevent it

The auto-generated server code (api.gen.go) is derived from the OpenAPI spec. Because the spec was not updated to include 202, the generated client type PostSandboxesSandboxIDPauseResponse has no JSON202 field — only JSON401, JSON404, JSON409, and JSON500. The Go client's ParsePostSandboxesSandboxIDPauseResponse switch statement falls through to a default case for 202, leaving only raw bytes stored. Other SDKs generated from the same spec (Python, JS) may throw exceptions or return error states when encountering the undocumented 202.

What the impact would be

Any caller that checks response.StatusCode == 204 to determine success will incorrectly treat a valid in-progress pause (202) as a failure. The integration test sandbox_pause_test.go:29 does exactly this with require.Equal(t, http.StatusNoContent, resp.StatusCode()) — it will fail whenever a pause crosses the 60-second threshold. External SDK users who handle only documented response codes will get unexpected behavior on 202.

How to fix it

Add a 202 response entry to the POST /sandboxes/{sandboxID}/pause operation in spec/openapi.yml, defining the response body shape (with code and message fields). Then regenerate api.gen.go so the Go client gains a JSON202 field and the parser handles it correctly. Integration tests should also be updated to accept either 202 or 204 as valid success responses.

Step-by-step proof

  1. Client calls POST /sandboxes/{id}/pause on a sandbox whose snapshot takes 90 seconds.
  2. The handler goroutine starts and runs pauseSandboxRequest in the background with a 15-minute context.
  3. After exactly 60 seconds (pauseRequestWaitTimeout), the time.After case fires in the select.
  4. sendPauseInProgressResponse writes HTTP 202 with body {"code":202,"message":"Pause is still in progress..."}.
  5. A Python SDK client generated from the spec receives 202 — not in its known responses — and raises an ApiException or similar, treating the pause as failed.
  6. The integration test at sandbox_pause_test.go:29 calls require.Equal(t, http.StatusNoContent, resp.StatusCode()) which fails because resp.StatusCode() == 202, not 204.

@matthewlouisbrockman matthewlouisbrockman marked this pull request as draft March 27, 2026 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants