Use a detached timeout for pause completion in API#2238
Use a detached timeout for pause completion in API#2238matthewlouisbrockman wants to merge 7 commits intomainfrom
Conversation
PR SummaryMedium Risk Overview Written by Cursor Bugbot for commit 023fd3c. This will update automatically on new commits. Configure here. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Error helper misused for HTTP 202 success response
- Replaced the two HTTP 202 pause-in-progress responses with direct
c.JSONsuccess payloads so they no longer add Gin context errors to logs.
- Replaced the two HTTP 202 pause-in-progress responses with direct
- ✅ Fixed: Removed
WithoutCancelbreaks cache invalidation for non-API callers- Restored
context.WithoutCancel(ctx)forsnapshotCache.InvalidateinpauseSandboxso Redis cache invalidation still runs even if the parent context is canceled.
- Restored
Or push these changes by commenting:
@cursor push aa12661f89
Preview (aa12661f89)
diff --git a/packages/api/internal/handlers/sandbox_pause.go b/packages/api/internal/handlers/sandbox_pause.go
--- a/packages/api/internal/handlers/sandbox_pause.go
+++ b/packages/api/internal/handlers/sandbox_pause.go
@@ -27,6 +27,7 @@
const (
pauseRequestTimeout = 15 * time.Minute
pauseRequestWaitTimeout = 70 * time.Second
+ pauseInProgressMessage = "Pause is still in progress. Check the sandbox info endpoint for the latest status."
)
type pauseRequestResult struct {
@@ -79,12 +80,18 @@
return
case <-waitTimer.C:
- a.sendAPIStoreError(c, http.StatusAccepted, "Pause is still in progress. Check the sandbox info endpoint for the latest status.")
+ c.JSON(http.StatusAccepted, gin.H{
+ "code": int32(http.StatusAccepted),
+ "message": pauseInProgressMessage,
+ })
return
case <-ctx.Done():
if errors.Is(context.Cause(ctx), sharedmiddleware.ErrRequestTimeout) {
- a.sendAPIStoreError(c, http.StatusAccepted, "Pause is still in progress. Check the sandbox info endpoint for the latest status.")
+ c.JSON(http.StatusAccepted, gin.H{
+ "code": int32(http.StatusAccepted),
+ "message": pauseInProgressMessage,
+ })
}
return
diff --git a/packages/api/internal/orchestrator/pause_instance.go b/packages/api/internal/orchestrator/pause_instance.go
--- a/packages/api/internal/orchestrator/pause_instance.go
+++ b/packages/api/internal/orchestrator/pause_instance.go
@@ -69,7 +69,7 @@
return fmt.Errorf("error pausing sandbox: %w", err)
}
- o.snapshotCache.Invalidate(ctx, sbx.SandboxID)
+ o.snapshotCache.Invalidate(context.WithoutCancel(ctx), sbx.SandboxID)
return nil
}This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5f17d2610e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
|
|
||
| func sendPauseInProgressResponse(c *gin.Context) { | ||
| c.JSON(http.StatusAccepted, gin.H{ |
There was a problem hiding this comment.
Update pause API contract for 202 in-progress responses
Returning 202 Accepted here introduces a new success path that is not represented in the generated API contract/client (PostSandboxesSandboxIDPauseResponse and ParsePostSandboxesSandboxIDPauseResponse only model 401/404/409/500, with callers commonly expecting 204). When pause runs longer than 60s or the request context is canceled, clients generated from the current spec will treat this as an unexpected status and may surface false failures even though the pause continues in the background.
Useful? React with 👍 / 👎.
|
|
||
| func sendPauseInProgressResponse(c *gin.Context) { | ||
| c.JSON(http.StatusAccepted, gin.H{ | ||
| "code": int32(http.StatusAccepted), | ||
| "message": "Pause is still in progress. Check the sandbox info endpoint for the latest status.", | ||
| }) |
There was a problem hiding this comment.
🔴 The new 202 Accepted response from sendPauseInProgressResponse is not defined in the OpenAPI spec for POST /sandboxes/{sandboxID}/pause, which only declares 204, 409, 404, 401, and 500. SDK clients generated from the spec (Go, Python, JS) have no typed handling for 202 and may treat it as an unexpected response or assertion failure. The spec needs a 202 response entry added, and integration tests that assert StatusCode == 204 will fail whenever pause takes longer than 60 seconds.
Extended reasoning...
What the bug is and how it manifests
The PR introduces sendPauseInProgressResponse which returns HTTP 202 Accepted with a JSON body {"code": 202, "message": "..."} when pause takes longer than 60 seconds or the client disconnects. However, the OpenAPI spec at spec/openapi.yml only defines the following responses for POST /sandboxes/{sandboxID}/pause: 204 (success), 409 (conflict), 404 (not found), 401 (unauthorized), and 500 (internal error). A 202 response is completely absent from the spec.
The specific code path that triggers it
In PostSandboxesSandboxIDPause (sandbox_pause.go:35-40), a goroutine runs pauseSandboxRequest with a 15-minute timeout. The outer handler then selects on a 60-second timer (pauseRequestWaitTimeout) or ctx.Done(). If pause takes longer than 60 seconds or the HTTP client disconnects early, sendPauseInProgressResponse is called and writes HTTP 202. This is a new code path introduced by this PR.
Why existing code doesn't prevent it
The auto-generated server code (api.gen.go) is derived from the OpenAPI spec. Because the spec was not updated to include 202, the generated client type PostSandboxesSandboxIDPauseResponse has no JSON202 field — only JSON401, JSON404, JSON409, and JSON500. The Go client's ParsePostSandboxesSandboxIDPauseResponse switch statement falls through to a default case for 202, leaving only raw bytes stored. Other SDKs generated from the same spec (Python, JS) may throw exceptions or return error states when encountering the undocumented 202.
What the impact would be
Any caller that checks response.StatusCode == 204 to determine success will incorrectly treat a valid in-progress pause (202) as a failure. The integration test sandbox_pause_test.go:29 does exactly this with require.Equal(t, http.StatusNoContent, resp.StatusCode()) — it will fail whenever a pause crosses the 60-second threshold. External SDK users who handle only documented response codes will get unexpected behavior on 202.
How to fix it
Add a 202 response entry to the POST /sandboxes/{sandboxID}/pause operation in spec/openapi.yml, defining the response body shape (with code and message fields). Then regenerate api.gen.go so the Go client gains a JSON202 field and the parser handles it correctly. Integration tests should also be updated to accept either 202 or 204 as valid success responses.
Step-by-step proof
- Client calls
POST /sandboxes/{id}/pauseon a sandbox whose snapshot takes 90 seconds. - The handler goroutine starts and runs
pauseSandboxRequestin the background with a 15-minute context. - After exactly 60 seconds (
pauseRequestWaitTimeout), thetime.Aftercase fires in the select. sendPauseInProgressResponsewritesHTTP 202with body{"code":202,"message":"Pause is still in progress..."}.- A Python SDK client generated from the spec receives 202 — not in its known responses — and raises an
ApiExceptionor similar, treating the pause as failed. - The integration test at
sandbox_pause_test.go:29callsrequire.Equal(t, http.StatusNoContent, resp.StatusCode())which fails becauseresp.StatusCode() == 202, not 204.


Use a separate timeout-bounded context for sandbox pause completion in the API using a 60s timeout so it fits within the existing timeouts . This allows the pause RPC and the follow-up build status update to finish even if the original request context is canceled by e.g. 70/80 second api or lb timeouts.
Since we give a 204 already for paused, when it times out but is still going we give a 202 (start) to indicate it's still pausing along with a message to check the sandbox's info.