Conversation
Pull Request Overview
This PR adds client-side retry logic with idempotency keys and server-side caching of responses to prevent duplicate executions.
- Introduces a `ResponseManager` and a `handle_request_id` middleware in `server.py` to replay cached responses for repeated requests.
- Refactors the remote runtime in `remote.py` to use `aiohttp` with exponential-backoff retry, UUID-based idempotency headers, and async I/O.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/swerex/server.py | Added ResponseManager and handle_request_id middleware to cache and replay request responses. |
| src/swerex/runtime/remote.py | Switched to aiohttp, implemented retry logic with backoff, idempotency headers, and async methods. |
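The replay behavior summarized in the table can be sketched framework-agnostically. This is a hypothetical sketch, not the PR's actual middleware: the `handle_request_id` name follows the PR, but the header name `X-Request-ID` and the dict/callable shapes of `request` and `call_next` are assumptions.

```python
async def handle_request_id(request, call_next, cache):
    """Replay a cached response for a repeated idempotency key;
    otherwise run the handler once and cache its result.

    Sketch only: `request` is a plain dict and `call_next` an async
    callable here, standing in for the real server's middleware API.
    """
    request_id = request.get("headers", {}).get("X-Request-ID")
    if request_id is None:
        # No idempotency key supplied: execute normally, cache nothing.
        return await call_next(request)
    cached = cache.get(request_id)
    if cached is not None:
        return cached  # replay without re-executing the command
    response = await call_next(request)
    cache[request_id] = response
    return response
```

The key point is that the expensive, side-effecting handler call happens at most once per idempotency key; every retry with the same key gets the stored response.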
Comments suppressed due to low confidence (2)
src/swerex/server.py:54
- Accessing and modifying shared state in `ResponseManager` without synchronization can cause race conditions under concurrent requests. Consider using an `asyncio.Lock` or a thread-safe structure.
def get_response(self, request_id):
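A lock-guarded version along the lines this review comment suggests could look like the following. This is a minimal sketch, not the PR's actual `ResponseManager`; the `set_response` name and the dict-based cache are assumptions for illustration.

```python
import asyncio


class ResponseManager:
    """Hypothetical sketch: caches one response per idempotency key,
    with an asyncio.Lock serializing access as the review suggests."""

    def __init__(self):
        self._lock = asyncio.Lock()
        self._cache = {}  # request_id -> cached response

    async def get_response(self, request_id):
        async with self._lock:
            return self._cache.get(request_id)

    async def set_response(self, request_id, response):
        async with self._lock:
            self._cache[request_id] = response
```

Note the trade-off: making the accessors `async` changes the call sites, which is why a single-client deployment may reasonably skip the lock entirely.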
src/swerex/runtime/remote.py:5
- The `sys` import is unused and can be removed to clean up the module.
import sys
This all looks good to me! Generally, we start one deployment per swe-agent instance, so the requests should be synchronous, so I don't think we necessarily need to generalize the caching (though it seems easy enough to do with what you've built so far!). I've looked through the changes and couldn't find an immediate problem, but this definitely doesn't run for me at the moment: `pytest tests/test_execution.py::test_server_alive` passes for me elsewhere, but on this branch it first fails, and then cleanup takes half a minute. I can continue to investigate that later, but would also greatly appreciate it if you could take a look.
I added two commits here (#212) that seem to make this work, but it's very slow on the CI runner. Not sure what's going on there...
for more information, see https://pre-commit.ci
This mimics the prior behavior of the requests library and makes it more resilient to the server closing client connections.
This seems to have auto-closed because it had the
Oh, and I just realized that we don't actually use
#234 should bring the retry logic into the normal
Closing this because all the commits should now be included :) Thank you so much again ❤️ And sorry for all the delays!
Overview
This PR adds HTTP retry logic which users often had to implement themselves.
It is built on top of #197, which makes the remote runtime fully async with `aiohttp`.
Retry logic for server failures (server side and remote side)
Transient network errors currently cause requests to fail because they aren't retried.
The naive solution is to retry requests to the SWE-ReX server. However, simply adding retries can lead to idempotency errors: for example, if a command executes successfully but its response is lost in transit, a blind retry would execute the command a second time.
We implemented retry in this PR as follows: each request has an associated UUID (idempotency key) generated on the client side, which is sent in the request headers. Key + response pairs are cached on the server side by a `ResponseManager`, and retries of the same request reuse the same UUID. This allows retries to succeed while commands only run once.
Edge case: Concurrent Clients
This fix assumes there is only one client issuing requests to the server at a time. As such, our `ResponseManager` only caches the latest executed key + response pair. If multiple concurrent clients turn out to be a common use case for SWE-ReX, we can add a follow-up with a more sophisticated `ResponseManager` that handles more complex caching.
Testing
To test this behavior, we manually injected random network errors and verified that retried requests succeed without re-executing commands.
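The injected-error testing described above can be sketched as a self-contained toy that simulates lost responses. All names here are hypothetical, not the actual test code; the point is the invariant being checked: retries succeed, yet the command executes exactly once.

```python
import random
import uuid


class FlakyServer:
    """Toy server: executes a command once per idempotency key and
    randomly "loses" responses to simulate transient network errors."""

    def __init__(self, failure_rate=0.5, seed=0):
        self.cache = {}       # idempotency key -> response
        self.executions = 0   # how often the command actually ran
        self.failure_rate = failure_rate
        self._rng = random.Random(seed)

    async def handle(self, command, request_id):
        if request_id not in self.cache:
            self.executions += 1  # the command actually runs here
            self.cache[request_id] = f"ran: {command}"
        if self._rng.random() < self.failure_rate:
            # The command may have run, but the client never hears back.
            raise ConnectionError("response lost")
        return self.cache[request_id]


async def run_with_retries(server, command, retries=20):
    request_id = str(uuid.uuid4())  # fixed key across all retries
    for _ in range(retries):
        try:
            return await server.handle(command, request_id)
        except ConnectionError:
            continue
    raise RuntimeError("gave up after repeated network errors")
```

A test then asserts both that `run_with_retries` eventually returns the response and that `server.executions == 1`, which is exactly the duplicate-execution hazard this PR guards against.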