Cancel button in sandbox; fixes #566 by andrewmusselman · Pull Request #568 · The-AI-Alliance/gofannon

andrewmusselman · 2026-02-07T05:30:23Z

Sandbox Cancel Button with Elapsed Timer

Fixes #566

Summary

Adds the ability to cancel a running agent from the Sandbox screen. Users can click a Cancel button to immediately stop waiting for results, and an elapsed timer shows how long the agent has been running.

Changes

Backend — `routes.py`

- Thread-based execution: Agent code now runs in a separate thread via run_in_executor(), keeping the main asyncio event loop free to handle other HTTP requests (including cancel) while the agent runs.
- Cancel endpoint: New POST /agents/cancel-run endpoint. It sets a threading.Event that the main event loop polls every 500ms. Once the event is detected, the pending run request returns HTTP 499 without waiting for the agent thread.
- Disconnect detection: The main event loop also polls req.is_disconnected() every 500ms, so navigating away from the page triggers cancellation automatically.
- Previous run cancellation: If a user starts a new run while one is already in progress, the previous run is cancelled automatically.
- Module-level state: _cancel_events: Dict[str, threading.Event], keyed by user ID, tracks active runs.
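The module-level state and previous-run cancellation described above could look roughly like the following. This is a minimal sketch, not the code in `routes.py`: the helper names `begin_run`, `cancel_run`, and `end_run` and the `_registry_lock` are assumptions made for illustration; only `_cancel_events` and its type come from the description.

```python
import threading
from typing import Dict

# Mirrors the _cancel_events registry named in the description.
_cancel_events: Dict[str, threading.Event] = {}
# Assumption: a lock guards the dict; the PR description does not mention one.
_registry_lock = threading.Lock()


def begin_run(user_id: str) -> threading.Event:
    """Register a new run and signal any run already active for this user."""
    with _registry_lock:
        previous = _cancel_events.get(user_id)
        if previous is not None:
            previous.set()  # previous run is cancelled automatically
        event = threading.Event()
        _cancel_events[user_id] = event
        return event


def cancel_run(user_id: str) -> bool:
    """What a cancel endpoint might do: set the event if a run is active."""
    with _registry_lock:
        event = _cancel_events.get(user_id)
    if event is None:
        return False
    event.set()
    return True


def end_run(user_id: str, event: threading.Event) -> None:
    """Drop the registry entry only if it still belongs to this run."""
    with _registry_lock:
        if _cancel_events.get(user_id) is event:
            del _cancel_events[user_id]
```

Comparing identities in `end_run` keeps a finishing old run from deleting the entry of a newer run that replaced it.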

Frontend — `agentService.js`

- Signal parameter: runCodeInSandbox() now accepts an AbortController signal, passed to the underlying fetch() call.
- Cancel method: New cancelRun() fire-and-forget method calls the cancel endpoint with keepalive: true so the request survives page navigation and component unmount.

Frontend — `SandboxScreen.jsx`

- Cancel button: Red outlined button with StopIcon, visible only while the agent is running.
- Elapsed timer: Displays running time in M:SS format, updated every second.
- AbortController lifecycle: Created on run, aborted on cancel. Passed to agentService.runCodeInSandbox().
- Unmount cleanup: Component teardown aborts the fetch, calls cancelRun(), and clears the timer interval.
- Error handling: AbortError caught and displayed as "Agent run was cancelled."
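The M:SS arithmetic behind the elapsed timer is small enough to show in isolation. The component itself is JavaScript; the Python function below is only an illustration of the formatting rule, and the name `format_elapsed` is hypothetical.

```python
def format_elapsed(total_seconds: float) -> str:
    """Format a duration as M:SS, with seconds zero-padded to two digits."""
    seconds = max(0, int(total_seconds))  # clamp negatives, drop fractions
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"
```

For example, `format_elapsed(75)` gives `"1:15"` and `format_elapsed(5)` gives `"0:05"`.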

Architecture

User clicks Cancel
    │
    ├─► Frontend: AbortController.abort()  →  kills client-side fetch instantly
    │
    └─► Frontend: agentService.cancelRun()  →  POST /agents/cancel-run
                                                       │
                                                       ▼
                                              Sets threading.Event
                                                       │
                                                       ▼
                                              Main event loop detects it
                                              within 500ms, returns 499
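The server side of this flow can be sketched as a polling loop around `run_in_executor()`. This is an illustrative reconstruction under assumptions, not the PR's code: `run_with_cancel` and its parameters are hypothetical names, and `is_disconnected` stands in for an async callable such as `req.is_disconnected`.

```python
import asyncio
import threading
import time

POLL_SECONDS = 0.5  # polling interval stated in the description


async def run_with_cancel(work, cancel_event, is_disconnected=None,
                          poll=POLL_SECONDS):
    """Run blocking `work` in a thread; return (status, result)."""
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(None, work)  # default thread pool
    while True:
        # Wait up to `poll` seconds; a timeout does not cancel the future.
        done, _ = await asyncio.wait({future}, timeout=poll)
        if done:
            return 200, future.result()
        disconnected = is_disconnected is not None and await is_disconnected()
        if cancel_event.is_set() or disconnected:
            cancel_event.set()
            # The worker thread keeps running; its result is simply dropped.
            return 499, None
```

Because the loop only awaits, the event loop stays free to serve the cancel request that sets the event.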

Files Modified

| File | Changes |
| --- | --- |
| `routes.py` | Thread-based execution, cancel endpoint, disconnect detection |
| `agentService.js` | Signal parameter, `cancelRun()` method with `keepalive: true` |
| `SandboxScreen.jsx` | Cancel button, elapsed timer, AbortController lifecycle |

Testing

1. Run an agent, click Cancel → see "Agent run was cancelled", timer stops.
2. Run an agent, navigate away → agent cancels via unmount cleanup.
3. Run an agent, click Run again → previous run auto-cancelled.
4. Cancel, then immediately start a new run → new run works normally.
5. Verify `docker compose logs -f api` shows cancel log lines:
   - >>> cancel-run hit for user local-dev-user
   - Cancel detected for user local-dev-user, returning 499

Type of Change

New feature (non-breaking change that adds functionality)

Test Execution

All existing tests pass locally

Checklist

My code follows the project's coding style
I have performed a self-review of my code
I have commented my code where necessary (particularly complex areas)
My changes generate no new warnings
I have checked that there are no merge conflicts

Screenshots (if applicable)

Known Limitation: Background Thread Continues

Cancellation is cooperative, not preemptive. When a user cancels:

- The server responds with HTTP 499 immediately, so the user is unblocked.
- The event loop is freed, so other requests are handled normally.
- However, the background thread running the agent continues to completion. Its result is discarded.

This means that in-flight LLM API calls and database operations will finish even after cancellation. The orphaned thread runs to completion and then dies naturally. This is a Python limitation — threads cannot be forcibly terminated.
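The behaviour is easy to reproduce with plain threads. In this small sketch (all names are hypothetical), the caller stops waiting early, yet the worker still runs to the end:

```python
import threading
import time


def orphan_demo(work_seconds=0.2, give_up_after=0.05):
    """Return (still_running_when_caller_gave_up, worker_eventually_finished)."""
    finished = threading.Event()

    def agent_stub():
        time.sleep(work_seconds)  # stands in for LLM and database calls
        finished.set()

    worker = threading.Thread(target=agent_stub)
    worker.start()
    worker.join(timeout=give_up_after)  # the caller stops waiting here
    still_running = worker.is_alive()   # the thread itself was not stopped
    worker.join()                       # demo only: wait to observe completion
    return still_running, finished.is_set()
```

`threading.Thread` offers no terminate or kill method, which is why the orphaned thread can only be abandoned.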

Practical impact is low for typical usage:

- The LLM API call has already been sent; cancelling can't reclaim that cost.
- Database reads (namespace scanning) are lightweight.
- The thread holds no locks and writes no results after the cancel event is set.
- Starting a new run works immediately regardless of the orphaned thread.

Future Work: True Process Termination via Multiprocessing

To actually halt in-flight work, the agent would need to run in a child process instead of a thread, since processes can be killed via process.terminate() / process.kill().

What would change

1. Replace threading + run_in_executor with multiprocessing.Process
   - Spawn a child process for each agent run.
   - Cancel via process.terminate() (sends SIGTERM on POSIX) or process.kill() (sends SIGKILL on POSIX).
   - Return results via multiprocessing.Queue.
2. Bootstrap connections in the child process
   - Under the spawn and forkserver start methods, multiprocessing serializes (pickles) arguments to send to the child. Live objects like DatabaseService, HTTP clients, and closures cannot be pickled.
   - Instead of passing db, pass primitive config values (CouchDB URL, credentials) and have the child process create its own DatabaseService instance.
   - Do the same for httpx.AsyncClient, LLM API clients, etc.
3. Make _execute_agent_code self-contained
   - It must import and initialize all dependencies from scratch, with no shared state with the parent.
   - exec_globals would be built inside the child process.
4. Cleanup and resource management
   - Call process.terminate() + process.join(timeout) on cancel, with process.kill() as fallback.
   - Handle zombie process cleanup.
   - Consider a process pool to limit concurrent agent runs.
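A minimal sketch of the terminate-then-kill flow, under stated assumptions: it targets a POSIX host and uses the "fork" start method so it runs without a `__main__` guard, which also means arguments are inherited rather than pickled. `_agent_stub` and `run_agent_in_process` are hypothetical names, and the stub only sleeps instead of running agent code.

```python
import multiprocessing as mp
import queue
import threading
import time

# Assumption: POSIX host. With "spawn", arguments would be pickled and the
# caller would need an `if __name__ == "__main__":` guard.
_ctx = mp.get_context("fork")


def _agent_stub(seconds, out):
    """Hypothetical stand-in for the agent body running in the child."""
    time.sleep(seconds)
    out.put("done")


def run_agent_in_process(seconds, cancel_event, poll=0.05, grace=1.0):
    """Return the child's result, or None if the run was cancelled."""
    out = _ctx.Queue()
    proc = _ctx.Process(target=_agent_stub, args=(seconds, out))
    proc.start()
    while True:
        try:
            result = out.get(timeout=poll)  # read before join
            proc.join()                     # reap the child (no zombie)
            return result
        except queue.Empty:
            pass
        if cancel_event.is_set():
            proc.terminate()                # SIGTERM
            proc.join(grace)
            if proc.is_alive():
                proc.kill()                 # SIGKILL fallback
                proc.join()
            return None
        if not proc.is_alive() and out.empty():
            raise RuntimeError("child exited without producing a result")
```

Unlike the thread-based approach, a cancelled run here stops consuming CPU and I/O as soon as the signal is delivered, at the cost of rebuilding connections inside each child.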

Estimated scope

Medium effort (half-day to one day). The main risk is subtle bugs from having independent database connections in the child process, or missing dependencies that exec_globals injects. Testing should cover:

- Normal run completes and returns results correctly.
- Cancel terminates the process and returns 499.
- Rapid cancel-and-rerun doesn't leak processes.
- Database state is consistent after cancellation mid-write.

When to prioritize this

The current threading approach is sufficient for single-user / low-concurrency usage. Multiprocessing becomes important when:

- Multiple users run agents concurrently and orphaned threads consume significant resources.
- LLM API costs are high enough that reclaiming cancelled calls matters (would require upstream API cancellation support as well).
- Agent runs involve long-running write operations that should be rolled back on cancel.

By submitting this PR, I confirm that I have read and agree to follow the project's Code of Conduct and Contributing Guidelines.

Signed-off-by: Andrew Musselman <akm@apache.org>

rawkintrevo · 2026-02-08T18:55:02Z

Thanks for the contrib @andrewmusselman,

> This means that in-flight LLM API calls and database operations will finish even after cancellation. The orphaned thread runs to completion and then dies naturally. This is a Python limitation — threads cannot be forcibly terminated.

The description makes it sound like a request can't be cancelled, and this is untrue:

https://docs.litellm.ai/docs/response_api#cancel-a-response

The cleaner implementation would be to get a response ID for any query, save it, then fire the cancellation request, only falling back to orphaning if that doesn't work. I disagree with whoever said, 'meh, doesn't really matter anyway'.

rawkintrevo · 2026-02-08T18:55:46Z

This should also have tests with it.

andrewmusselman · 2026-02-10T17:16:28Z

> Thanks for the contrib @andrewmusselman,
>
> > This means that in-flight LLM API calls and database operations will finish even after cancellation. The orphaned thread runs to completion and then dies naturally. This is a Python limitation — threads cannot be forcibly terminated.
>
> The description makes it sound like a request can't be cancelled, and this is untrue:
>
> https://docs.litellm.ai/docs/response_api#cancel-a-response
>
> The cleaner implementation would be to get a response ID for any query, save it, then fire the cancellation request, only falling back to orphaning if that doesn't work. I disagree with whoever said, 'meh, doesn't really matter anyway'.

Okay thanks, I'll see if we can get that implemented

Cancel button in sandbox; fixes #566

510ece5

Signed-off-by: Andrew Musselman <akm@apache.org>

andrewmusselman wants to merge 1 commit into main from cancel-button-566
