@Marenz Marenz commented Jun 5, 2025

No description provided.

Copilot AI review requested due to automatic review settings June 5, 2025 11:12
@Marenz Marenz requested a review from a team as a code owner June 5, 2025 11:12
@github-actions github-actions bot added the part:docs (Affects the documentation) and part:dispatcher (Affects the high-level dispatcher interface) labels Jun 5, 2025

This comment was marked as outdated.

@ela-kotulska-frequenz ela-kotulska-frequenz left a comment

LGTM just one comment

Comment on lines 327 to 331
if not dispatch.started:
_logger.info(
"Giving up on dispatch %s, duration was exceeded.",
dispatch.id,
)

What duration?
From the code, you only try once; there is no duration. But maybe I'm missing something?

@Marenz Marenz force-pushed the fix_forever_retrying branch from 15163ec to 597c18f Compare June 5, 2025 12:20
@Marenz Marenz requested a review from Copilot June 5, 2025 12:27
Copilot AI left a comment

Pull Request Overview

This PR fixes an issue where failed dispatches were never retried and instead caused an infinite logging loop.
Key changes:

  • Removed the FailedDispatchesRetrier in favor of a Timer-based retry loop.
  • Introduced a pending_dispatches map to track and retry failed dispatches.
  • Updated the main loop to trigger retries on timer events and cleaned up retry state in _handle_dispatch.
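The flow described in the key changes could look roughly like the sketch below. It is not the code from `_actor_dispatcher.py`: the `Dispatch` stand-in, the `RetryingDispatcher` class, the `start_actor` callback and `on_timer_tick` are hypothetical names, and the plain `id` is assumed as the map key where the real code computes a dispatch identity. The timer itself is left out; `on_timer_tick` stands for whatever the main loop does when the retry timer fires.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Dispatch:
    """Hypothetical stand-in for the real Dispatch type."""

    id: int
    started: bool


class RetryingDispatcher:
    """Toy dispatcher: failed dispatches wait in a map until the next tick."""

    def __init__(self, start_actor: Callable[[Dispatch], None]) -> None:
        # start_actor is a hypothetical callback that raises on failure.
        self._start_actor = start_actor
        self._pending_dispatches: dict[int, Dispatch] = {}

    def handle_dispatch(self, dispatch: Dispatch) -> None:
        # Any new instruction supersedes a pending retry for the same key.
        self._pending_dispatches.pop(dispatch.id, None)
        if not dispatch.started:
            return
        try:
            self._start_actor(dispatch)
        except Exception:  # broad on purpose: this is only a sketch
            # Remember the failure so the next timer tick retries it.
            self._pending_dispatches[dispatch.id] = dispatch

    def on_timer_tick(self) -> None:
        # Copy the values first because handle_dispatch mutates the map.
        for dispatch in list(self._pending_dispatches.values()):
            self.handle_dispatch(dispatch)

    @property
    def pending_ids(self) -> list[int]:
        return sorted(self._pending_dispatches)
```

With a start callback that fails twice and then succeeds, the dispatch stays in the map across two ticks and disappears once the start goes through, so nothing is retried forever.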

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Files reviewed:

  • src/frequenz/dispatch/_actor_dispatcher.py: Replaced retrier class with Timer, added pending_dispatches, updated retry flow.
  • RELEASE_NOTES.md: Updated bug-fix entry to describe the retry fix.

Comments suppressed due to low confidence (2)

src/frequenz/dispatch/_actor_dispatcher.py:176

  • [nitpick] The description of retry_interval is vague and still refers to "actors". Update it to clarify that this interval controls how often failed dispatches are retried via the timer.
retry_interval: How long to wait until trying to start failed actors again.

src/frequenz/dispatch/_actor_dispatcher.py:187

  • [nitpick] The new retry logic around pending_dispatches and the timer-based loop lacks direct unit tests. Consider adding tests that simulate a dispatch failure and verify it gets retried after the configured interval and removed from pending_dispatches.
self._pending_dispatches: dict[int, Dispatch] = {}

@Marenz Marenz force-pushed the fix_forever_retrying branch from 597c18f to d973fb6 Compare June 5, 2025 12:55
@ela-kotulska-frequenz ela-kotulska-frequenz left a comment

Is it possible that you retry a dispatch that just started?
Is that the correct approach?

Comment on lines 278 to 280
keys = list(self._pending_dispatches.keys())
for identity in keys:
dispatch = self._pending_dispatches[identity]
@ela-kotulska-frequenz ela-kotulska-frequenz Jun 5, 2025

Could you please remove the dispatch from self._pending_dispatches here, not in self._handle_dispatch?

  1. It will be easier to read and more straightforward.
  2. Removing the dispatch from _pending_dispatches at the beginning of a method called handle_dispatch is confusing :D

Contributor Author

I can also remove it here, but it is part of the design that handle dispatch removes it: if an updated dispatch instruction comes in, we need to erase the old pending one so that we don't retry with an outdated dispatch.
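A small sketch may help show the difference between the two placements. Everything in it is hypothetical (the dict-based `Dispatch`, the `version` field, the `clear_on_entry` switch and the helper names); it only models the scenario described in this reply, where an update arrives while an older dispatch is still waiting for a retry.

```python
from typing import Callable

# Hypothetical stand-in: {"id": int, "started": bool, "version": int}
Dispatch = dict


def handle_dispatch(
    pending: dict[int, Dispatch],
    dispatch: Dispatch,
    start: Callable[[Dispatch], None],
    clear_on_entry: bool,
) -> None:
    """Handle one dispatch, optionally dropping a stale pending entry first."""
    if clear_on_entry:
        pending.pop(dispatch["id"], None)
    if not dispatch["started"]:
        return
    try:
        start(dispatch)
    except RuntimeError:
        # Queue the failed dispatch for the next retry tick.
        pending[dispatch["id"]] = dispatch


def always_fail(dispatch: Dispatch) -> None:
    raise RuntimeError("actor could not be started")


def stale_after_update(clear_on_entry: bool) -> list[int]:
    """Return the versions still queued for retry after an update arrives."""
    pending: dict[int, Dispatch] = {}
    old = {"id": 7, "started": True, "version": 1}
    new = {"id": 7, "started": False, "version": 2}  # update: should not run
    handle_dispatch(pending, old, always_fail, clear_on_entry)
    handle_dispatch(pending, new, always_fail, clear_on_entry)
    return [d["version"] for d in pending.values()]
```

In this model, `stale_after_update(True)` leaves nothing queued, while `stale_after_update(False)` still holds version 1, which the next retry tick would try to start even though the update said to stop.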

Marenz commented Jun 5, 2025

Is it possible that you retry a dispatch that just started?

Yes, of course: I only want to retry dispatches that have started.
Dispatch.started does not mean the actor has also started, only that the dispatch wants it to be started.
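To illustrate the distinction, here is a tiny sketch that keeps the two pieces of state apart. The `Dispatch` stand-in, the `needs_retry` helper and the `running_actor_ids` set are assumptions made for illustration and do not appear in the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dispatch:
    """Hypothetical stand-in for the real Dispatch type."""

    id: int
    started: bool  # what the dispatch wants, not what the actor is doing


def needs_retry(dispatch: Dispatch, running_actor_ids: set[int]) -> bool:
    """A retry only makes sense if the dispatch wants an actor that is not running."""
    return dispatch.started and dispatch.id not in running_actor_ids
```

Under these assumptions, a started dispatch whose actor is missing is the only case worth retrying; a dispatch that is no longer started can be dropped, which matches the "Giving up" branch quoted earlier in the review.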

self._actors: dict[int, ActorDispatcher.ActorAndChannel] = {}

self._retrier = ActorDispatcher.FailedDispatchesRetrier(retry_interval)
self._pending_dispatches: dict[int, Dispatch] = {}
@ela-kotulska-frequenz ela-kotulska-frequenz Jun 5, 2025

Oh, I got it!
Then maybe rename it to _failed_dispatches? "pending" feels like they just started and haven't finished yet.

@Marenz Marenz force-pushed the fix_forever_retrying branch from d973fb6 to 28211d7 Compare June 5, 2025 13:46
@llucax llucax left a comment

Also LGTM.

"""

class FailedDispatchesRetrier(BackgroundService):

So satisfying to see this class go away... 🤣

Args:
dispatch: The dispatch to handle.
"""
identity = self._dispatch_identity(dispatch)

Maybe it would be good to put a comment here (or in the docstring) about what you replied to Ela. I agree that at first sight it is not obvious why dispatches are removed from the pending list here.

@Marenz Marenz force-pushed the fix_forever_retrying branch 2 times, most recently from 31805be to 1532d1c Compare June 11, 2025 09:08
@Marenz Marenz force-pushed the fix_forever_retrying branch from 1532d1c to 4894cfc Compare June 11, 2025 09:10
@Marenz Marenz merged commit 76da703 into frequenz-floss:v0.x.x Jun 11, 2025
5 checks passed
@Marenz Marenz deleted the fix_forever_retrying branch June 11, 2025 09:13
Labels

  • part:dispatcher: Affects the high-level dispatcher interface
  • part:docs: Affects the documentation

3 participants