Signed-off-by: Andrew Musselman <akm@apache.org>
Thanks for the contrib @andrewmusselman ,
The description makes it sound like a request can't be cancelled, and this is untrue: https://docs.litellm.ai/docs/response_api#cancel-a-response The cleaner implementation would be to get a response ID for any query, save it, then fire the cancellation request, only falling back to orphaning if that doesn't work. I disagree with whatever said, 'meh, doesn't really matter anyway'.
this should also have tests with it
Okay thanks, I'll see if we can get that implemented |
Sandbox Cancel Button with Elapsed Timer
Fixes #566
Summary
Adds the ability to cancel a running agent from the Sandbox screen. Users can click a Cancel button to immediately stop waiting for results, and an elapsed timer shows how long the agent has been running.
Changes
Backend: routes.py
- Agent execution runs in a thread via `run_in_executor()`, keeping the main asyncio event loop free to handle other HTTP requests (including cancel) while the agent runs.
- `POST /agents/cancel-run` endpoint. Sets a `threading.Event` that the main event loop polls every 500ms. When detected, returns HTTP 499 immediately.
- `req.is_disconnected()` is checked every 500ms, so navigating away from the page triggers cancellation automatically.
- `_cancel_events: Dict[str, threading.Event]` keyed by user ID tracks active runs.

Frontend: agentService.js
- `runCodeInSandbox()` now accepts an `AbortController` signal, passed to the underlying `fetch()` call.
- `cancelRun()` fire-and-forget method calls the cancel endpoint with `keepalive: true` so the request survives page navigation and component unmount.

Frontend: SandboxScreen.jsx
- Cancel button with `StopIcon`, visible only while the agent is running.
- Elapsed timer in `M:SS` format, updated every second.
- `AbortController` signal passed to `agentService.runCodeInSandbox()`.
- Cancel handler aborts the in-flight request, calls `agentService.cancelRun()`, and clears the timer interval.
- `AbortError` caught and displayed as "Agent run was cancelled."

Architecture
Files Modified
- routes.py
- agentService.js: `cancelRun()` method with `keepalive: true`
- SandboxScreen.jsx

Testing
- `docker compose logs -f api` shows cancel log lines:
  >>> cancel-run hit for user local-dev-user
  Cancel detected for user local-dev-user, returning 499

Type of Change
Test Execution
Checklist
Screenshots (if applicable)
Known Limitation: Background Thread Continues
Cancellation is cooperative, not preemptive. When a user cancels:
This means that in-flight LLM API calls and database operations will finish even after cancellation. The orphaned thread runs to completion and then dies naturally. This is a Python limitation — threads cannot be forcibly terminated.
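To illustrate what cooperative means here: a thread stops early only if its own code checks the cancel flag. The sketch below is a toy under stated assumptions, not the PR's agent code. `agent_steps` and `run_and_cancel` are hypothetical names, `time.sleep` stands in for an LLM call or database operation, and the flag is set before the worker starts purely to keep the demo deterministic. The `check_cancel=True` variant is a possible refinement, not something this PR implements.

```python
import threading
import time
from typing import List


def agent_steps(cancel_event: threading.Event, steps: int,
                log: List[str], check_cancel: bool) -> None:
    """Hypothetical multi-step agent; optionally checks the flag between steps."""
    for i in range(steps):
        if check_cancel and cancel_event.is_set():
            log.append("stopped early")
            return
        time.sleep(0.01)  # stand-in for an LLM call or DB operation
        log.append(f"step {i}")
    log.append("finished")


def run_and_cancel(check_cancel: bool) -> List[str]:
    """Request cancellation, run the worker, and return what it did."""
    cancel_event = threading.Event()
    log: List[str] = []
    cancel_event.set()  # cancellation is requested up front
    worker = threading.Thread(
        target=agent_steps, args=(cancel_event, 5, log, check_cancel)
    )
    worker.start()
    worker.join()  # there is no supported call to kill the thread instead
    return log
```

With `check_cancel=False` the worker ignores the flag and logs all five steps, which matches the orphaned-thread behaviour described above. With `check_cancel=True` it exits at the first checkpoint.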
Practical impact is low for typical usage:
Future Work: True Process Termination via Multiprocessing
To actually halt in-flight work, the agent would need to run in a child process instead of a thread, since processes can be killed via `process.terminate()` / `process.kill()`.

What would change

Replace `threading` + `run_in_executor` with `multiprocessing.Process`
- Cancellation would call `process.terminate()` (sends SIGTERM) or `process.kill()` (sends SIGKILL).
- Communication with the parent would go through `multiprocessing.Queue`.

Bootstrap connections in the child process
- `multiprocessing` serializes (pickles) arguments to send to the child. Live objects like `DatabaseService`, HTTP clients, and closures cannot be pickled.
- Instead of passing `db`, pass primitive config values (CouchDB URL, credentials) and have the child process create its own `DatabaseService` instance.
- The same applies to `httpx.AsyncClient`, LLM API clients, etc.

Make `_execute_agent_code` self-contained
- `exec_globals` would be built inside the child process.

Cleanup and resource management
- `process.terminate()` + `process.join(timeout)` on cancel, with `process.kill()` as fallback.

Estimated scope
Medium effort (half-day to one day). The main risk is subtle bugs from having independent database connections in the child process, or missing dependencies that `exec_globals` injects. Testing should cover:

When to prioritize this
The current threading approach is sufficient for single-user / low-concurrency usage. Multiprocessing becomes important when:
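For reference, the process-based approach proposed under Future Work (primitive config passed to the child, results returned over a queue, and terminate, then join with a timeout, then kill as fallback) might look roughly like the sketch below. It is an illustration under assumptions, not a proposed implementation: `agent_worker`, `stop_process`, and `run_agent_process` are hypothetical names, the `db_url` value is a placeholder, and `time.sleep` stands in for the agent's work. The sketch uses the `fork` start method for brevity, which is POSIX-only; under `spawn` the entry point would need an `if __name__ == "__main__":` guard.

```python
import multiprocessing as mp
import time
from typing import Any, Dict, Optional, Tuple


def agent_worker(config: Dict[str, Any], result_queue) -> None:
    """Hypothetical child entry point.

    Only primitive, picklable values arrive here. A real worker would build
    its own database and HTTP clients from them inside the child.
    """
    time.sleep(config["duration"])  # stand-in for the agent's work
    result_queue.put({"status": "ok", "db_url": config["db_url"]})


def stop_process(process: mp.process.BaseProcess, timeout: float = 2.0) -> str:
    """Terminate politely first; escalate to kill if the child lingers."""
    process.terminate()       # SIGTERM on POSIX
    process.join(timeout)
    if process.is_alive():
        process.kill()        # SIGKILL on POSIX
        process.join()
        return "killed"
    return "terminated"


def run_agent_process(
    duration: float, cancel: bool
) -> Tuple[str, Optional[Dict[str, Any]], Optional[int]]:
    """Run the worker in a child process; return (outcome, result, exitcode)."""
    ctx = mp.get_context("fork")  # assumption: POSIX host
    result_queue = ctx.Queue()
    config = {"db_url": "http://localhost:5984", "duration": duration}
    process = ctx.Process(target=agent_worker, args=(config, result_queue))
    process.start()
    if cancel:
        outcome = stop_process(process)
        return outcome, None, process.exitcode
    result = result_queue.get(timeout=10)  # read before join to avoid blocking
    process.join()
    return "completed", result, process.exitcode
```

Unlike the thread-based version, a cancelled run here actually stops: the child's sleep is interrupted by the signal and no result is ever put on the queue.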
By submitting this PR, I confirm that I have read and agree to follow the project's Code of Conduct and Contributing Guidelines.