-
Couldn't load subscription status.
- Fork 6
Fixed that dispatches are never retried, but forever logged. #162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM just one comment
| if not dispatch.started: | ||
| _logger.info( | ||
| "Giving up on dispatch %s, duration was exceeded.", | ||
| dispatch.id, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What duration?
From the code you try once, there is no duration. But maybe I don't see something?
15163ec to
597c18f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes an issue where failed dispatches were never retried and instead caused an infinite logging loop.
Key changes:
- Removed the
FailedDispatchesRetrierin favor of aTimer-based retry loop. - Introduced a
pending_dispatchesmap to track and retry failed dispatches. - Updated the main loop to trigger retries on timer events and cleaned up retry state in
_handle_dispatch.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/frequenz/dispatch/_actor_dispatcher.py | Replaced retrier class with Timer, added pending_dispatches, updated retry flow. |
| RELEASE_NOTES.md | Updated bug-fix entry to describe the retry fix. |
Comments suppressed due to low confidence (2)
src/frequenz/dispatch/_actor_dispatcher.py:176
- [nitpick] The description of
retry_intervalis vague and still refers to "actors". Update it to clarify that this interval controls how often failed dispatches are retried via the timer.
retry_interval: How long to wait until trying to start failed actors again.
src/frequenz/dispatch/_actor_dispatcher.py:187
- [nitpick] The new retry logic around
pending_dispatchesand the timer-based loop lacks direct unit tests. Consider adding tests that simulate a dispatch failure and verify it gets retried after the configured interval and removed frompending_dispatches.
self._pending_dispatches: dict[int, Dispatch] = {}
597c18f to
d973fb6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that you retry dispatch that just started?
Is that correct approach?
| keys = list(self._pending_dispatches.keys()) | ||
| for identity in keys: | ||
| dispatch = self._pending_dispatches[identity] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please remove dispatch from self._pending_dispatches here, not in self._handle_dispatch.
- It will be easier to read and more straightforward
- Removing dispatch from
_pending_dispatches, at the beginning of method calledhandle_dispatchis confusing :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can also remove it here, but it is part of the design that handle dispatch removes it because if an updated dispatch instruction comes in, we need to erase the old pending one to not try to repeat with old outdated dispatches
Yes of course, I want to only retry dispatches that started. |
| self._actors: dict[int, ActorDispatcher.ActorAndChannel] = {} | ||
|
|
||
| self._retrier = ActorDispatcher.FailedDispatchesRetrier(retry_interval) | ||
| self._pending_dispatches: dict[int, Dispatch] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oww I god it!
Than Maybe rename it to _failed_dispatches? pending feels like they just started and didn't finished, yet.
d973fb6 to
28211d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also LGTM.
| ``` | ||
| """ | ||
|
|
||
| class FailedDispatchesRetrier(BackgroundService): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So satisfying to see this class go away... 🤣
| Args: | ||
| dispatch: The dispatch to handle. | ||
| """ | ||
| identity = self._dispatch_identity(dispatch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be good to put a comment here about what you replied to Ela (or in the docstring), I agree at first sight it might look not super obvious wht removing dispatches from the pending list here.
31805be to
1532d1c
Compare
Signed-off-by: Mathias L. Baumann <[email protected]>
1532d1c to
4894cfc
Compare
No description provided.