Conversation

@florian-wagner-frequenz
Contributor

Previously, there was a test that violated the type hints. Since we cannot keep both the test and the type hints on it, the test itself was rewritten to cover a semantically slightly different but similar edge case.
The original test case is of course already prevented by the type hints (i.e., not passing None to a function that doesn't accept None).
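
As a rough illustration of the change described above, the sketch below uses a hypothetical stand-in, `load_logger_levels`, rather than the SDK's real `load_config`, whose signature is not shown in this conversation. It assumes the function is typed to accept a mapping, so a type checker would reject `None`, while an empty dict remains a valid edge case to test.

```python
from typing import Any, Mapping


def load_logger_levels(config: Mapping[str, Any]) -> dict[str, str]:
    """Hypothetical stand-in for a config loader; not the SDK's load_config."""
    loggers = config.get("loggers", {})
    return {name: str(level) for name, level in loggers.items()}


def test_load_empty() -> None:
    """An empty config is valid input and yields no logger settings."""
    # load_logger_levels(None)  # would be rejected by a type checker
    assert load_logger_levels({}) == {}
```

The commented-out call is the kind of case the type hints already rule out, which is why testing it at runtime adds little.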

@Copilot Copilot AI review requested due to automatic review settings June 17, 2025 11:30
@florian-wagner-frequenz florian-wagner-frequenz requested a review from a team as a code owner June 17, 2025 11:30
@florian-wagner-frequenz florian-wagner-frequenz requested review from Marenz and removed request for a team June 17, 2025 11:30
@github-actions github-actions bot added the part:tests Affects the unit, integration and performance (benchmarks) tests label Jun 17, 2025

Copilot AI left a comment

Pull Request Overview

This PR updates a test to avoid passing None and instead validates behavior with an empty config dict, aligning with current type hints.

  • Renames test from test_load_config_load_None to test_load_config_load_empty
  • Updates the docstring to describe the empty-config edge case
  • Changes the load_config call to pass the entire config dict instead of config.get("loggers", None)
Comments suppressed due to low confidence (1)

tests/config/test_util.py:57

  • [nitpick] The test name test_load_config_load_empty repeats 'load'. Consider renaming it to test_load_config_empty for clarity.
def test_load_config_load_empty(

@florian-wagner-frequenz
Contributor Author

Fixes the underlying problem preventing #1234 from proceeding

@github-actions github-actions bot added the part:docs Affects the documentation label Jun 17, 2025
Contributor

@llucax llucax left a comment

The suggested improvements to the tests are completely optional. I'm not approving only because I think we should not add things that users don't care about to the release notes.

@florian-wagner-frequenz florian-wagner-frequenz added the cmd:skip-release-notes It is not necessary to update release notes for this PR label Jun 18, 2025
@florian-wagner-frequenz florian-wagner-frequenz force-pushed the fix_test_types branch 2 times, most recently from 3e55f21 to e28a775 Compare June 18, 2025 08:53
Marenz previously approved these changes Jun 18, 2025
@florian-wagner-frequenz florian-wagner-frequenz added this pull request to the merge queue Jun 18, 2025
@github-project-automation github-project-automation bot moved this from To do to Review approved in Python SDK Roadmap Jun 18, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Jun 18, 2025
llucax previously approved these changes Jun 18, 2025
@Marenz
Contributor

Marenz commented Jun 18, 2025

removed this pull request from the merge queue due to no response for status checks 1 hour ago

Hmm, maybe my fix 45c4ad7 wasn't a fix?

@florian-wagner-frequenz florian-wagner-frequenz added this pull request to the merge queue Jun 18, 2025
github-merge-queue bot pushed a commit that referenced this pull request Jun 18, 2025
Previously there was a test which violated type-hints. As we can not
keep having the test _and_ type hints on that test, the test itself was
rewritten to test a semantically slightly different but similar edge
case.
The original test case is of course prevented when using type-hints (i.e
not passing `None` to a function that doesn't accept `None`)
@florian-wagner-frequenz
Contributor Author

Trying once more in the hope that this is spurious. Otherwise I would leave it to @Marenz to figure out what is going on

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Jun 18, 2025
@florian-wagner-frequenz
Contributor Author

Yup, your problem now @Marenz

@github-actions github-actions bot added the part:tooling Affects the development tooling (CI, deployment, dependency management, etc.) label Jun 19, 2025
@Marenz Marenz removed this pull request from the merge queue due to a manual request Jun 19, 2025
@Marenz Marenz force-pushed the fix_test_types branch 2 times, most recently from 3242588 to 46026b2 Compare June 19, 2025 15:35
@github-actions github-actions bot added the part:microgrid Affects the interactions with the microgrid label Jun 19, 2025
@github-actions github-actions bot added the part:actor Affects an actor ot the actors utilities (decorator, etc.) label Jun 19, 2025
@Marenz Marenz requested review from llucax and shsms June 19, 2025 16:09
@Marenz Marenz enabled auto-merge June 23, 2025 10:44
Contributor

@llucax llucax left a comment

It looks to me that we have a more fundamental problem if this fixes the infinite loop issue. Both the actor and the battery status tracker loops should exit when there is a CancelledError, so it looks like we are forgetting to cancel at some point, or worse, someone is eating up the CancelledError in the middle.

This hack will cover up this more fundamental issue, which could come back to bite us in other ways, so I would try to find the real root cause instead of patching it just to pass the tests.

@llucax
Contributor

llucax commented Jun 23, 2025

Also probably not the best idea to hijack this PR to fix the CI, why didn't you create a new PR like last time?

@Marenz
Contributor

Marenz commented Jun 23, 2025

Also probably not the best idea to hijack this PR to fix the CI, why didn't you create a new PR like last time?

Because this was easier to test, and well, it's working now. It wasn't reliably reproducible with the other PR, which is how we ended up in this situation in the first place ;)

so I would try to find the real root cause for this

This is fixing the real root cause.
I wrote it on slack last week, but here is the analysis:

I think I got it. The run loop in _battery_status_tracker never quits except on a cancelled exception. On shutdown, it gets the "event loop is closed" error and tries again... and again... and again. Because the event loop is closed, it never gives up control to let other tasks clean up properly, including the task that would trigger the cancel exception for it (componentpool_status_tracker.py, the _run method with the async exit stack contextlib thing).

I didn't write it explicitly, but the exact same thing is also happening in the actor-restart part. The actor stopping because of the RuntimeError: Event loop is closed error makes it try to restart before any cancel exception can be delivered.
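
The starvation described in this analysis can be sketched in isolation. The code below is not the SDK's code: `failing_work` and `retry_loop` are hypothetical names, the retry count is bounded so the demo terminates (the real loop would be `while True`), and the error is raised by hand. It shows that a loop which catches `Exception` and retries immediately, around a call that fails without ever suspending, never yields to other tasks.

```python
import asyncio


async def failing_work() -> None:
    # Stands in for a call that fails right away, e.g. with
    # "RuntimeError: Event loop is closed". It never suspends.
    raise RuntimeError("Event loop is closed")


async def retry_loop(max_retries: int) -> int:
    retries = 0
    while retries < max_retries:  # bounded here only so the demo ends
        try:
            await failing_work()
        except Exception:  # CancelledError is a BaseException, not caught here
            retries += 1  # retry immediately, without yielding control
    return retries


async def demo() -> tuple[int, bool]:
    other_ran = False

    async def other_task() -> None:
        nonlocal other_ran
        other_ran = True

    task = asyncio.create_task(other_task())
    retries = await retry_loop(10_000)
    ran_during_loop = other_ran  # still False: the loop never yielded
    await task
    return retries, ran_during_loop
```

In this sketch the other task, which stands in for whatever would deliver the cancellation, only gets to run once the retry loop is over.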

@github-actions github-actions bot added the part:core Affects the SDK core components (data structures, etc.) label Jun 23, 2025
@llucax
Contributor

llucax commented Jun 23, 2025

Because this was easier to test, and well, it's working now. It wasn't reliably reproducible with the other PR, which is how we ended up in this situation in the first place ;)

You could have created another PR with the same commits as this PR for testing. Now it's like the original PR got completely lost in the noise, and we are also spamming @florian-wagner-frequenz unnecessarily :P

Also discussing via slack, but for the record, what I mean by root cause is that these loops should finish, before the event loop is closed, via a CancelledError. If this is not happening, it means that one of these is happening:

  • We are missing a cancel()
  • We are missing an await on the cancelled task
  • We are doing the above, but somehow after the loop was closed (in __del__ for example).

We should find and fix when the loop is being closed without every task being properly stopped and awaited, instead of just hiding when it happens pretending that nothing is wrong... 😱
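
For reference, the first two points in the list above correspond to the usual cancel-then-await shutdown pattern. This is a minimal sketch with hypothetical names (`worker`, `shutdown`), not code from the SDK:

```python
import asyncio
import contextlib


async def worker(log: list[str]) -> None:
    try:
        while True:
            await asyncio.sleep(3600)  # suspends, so cancellation can arrive
    finally:
        log.append("worker cleaned up")


async def shutdown(task: "asyncio.Task[None]") -> None:
    task.cancel()  # 1. request cancellation
    with contextlib.suppress(asyncio.CancelledError):
        await task  # 2. wait until the task has finished unwinding


async def main() -> list[str]:
    log: list[str] = []
    task = asyncio.create_task(worker(log))
    await asyncio.sleep(0)  # let the worker start and suspend
    await shutdown(task)
    log.append(f"cancelled={task.cancelled()}")
    return log
```

Both steps have to happen while the event loop is still running. Note that suppressing `CancelledError` this way would also hide a cancellation of the caller itself, so real code may need to be more careful.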

@github-actions github-actions bot added the part:data-pipeline Affects the data pipeline label Jun 23, 2025
@Marenz
Contributor

Marenz commented Jun 23, 2025

and we are also spamming florian-wagner-frequenz unnecessarily :P

Nah, he announced that he unsubscribed and gave over ownership to me :) (that is, until you highlighted him again now :P )
His part is reviewed and approved, nothing to get lost.

But yes, I suppose for future historians this is a less optimal conflation of things that get fixed.

@Marenz Marenz requested review from llucax and shsms June 23, 2025 16:58
Contributor

@llucax llucax left a comment

Looks good to me in general, but:

  1. After you have verified it works, please open a new PR with these fixes. They are too important to be sneaked into a PR that was trying to fix something much more trivial.
  2. It would be good to verify that this works even without handling the case where the loop is gone in run_forever() and Actor._run_loop(), but I guess if we make them crash instead of continuing as if nothing happened, that also works, as pytest will fail/error the tests that crashed.

Comment on lines 58 to 57
except Exception:  # pylint: disable=broad-except
    if not asyncio.get_event_loop().is_running():
Contributor

This is actually wrong: get_event_loop() is deprecated in Python 3.12. get_running_loop() is recommended, but that function raises if there is no running event loop. Apparently there is no reliable way to query whether an event loop is running in the current context other than calling get_running_loop() and catching the RuntimeError, so we should probably do that here too, or unify it with except RuntimeError, as catching RuntimeError in particular doesn't help in any way if we are not inspecting the message (which I agree is less reliable).
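
The check suggested here could look roughly like the following. The helper name `is_event_loop_running` is made up for illustration; only `asyncio.get_running_loop()` and the `RuntimeError` it raises outside a running loop are standard library behavior.

```python
import asyncio


def is_event_loop_running() -> bool:
    """Return whether an event loop is running in the current thread."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # get_running_loop() raises RuntimeError when no loop is running
        return False
    return True


async def check_inside_loop() -> bool:
    # Called from a coroutine, so a loop is running by construction
    return is_event_loop_running()
```

Called from plain synchronous code the helper returns `False`, while `asyncio.run(check_inside_loop())` returns `True`.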

All this said, and since this is just a safety net to avoid an infinite loop in the presence of a HUGE bug (async code should never run outside of the context of an event loop), instead of returning I would try to crash the application with a backtrace. Maybe log the exception and then simply assert False, or even sys.exit(-1), to ensure this assert is not simply swallowed by some dead task that didn't propagate the exception because it wasn't awaited. For me, crashing the complete thing makes sense in this case: once there is no more event loop, we can't really do any work, so it doesn't make sense to try to continue after knowing there is no loop anymore.

I think this could help us figure out exactly which part of the code is entering an infinite loop.
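
The concern about an assert being swallowed by a task that nobody awaits can be reproduced in a few lines. This is only an illustrative sketch; `broken_loop` is a hypothetical stand-in for the safety-net branch, not the SDK's code.

```python
import asyncio


async def broken_loop() -> None:
    # Hypothetical safety net: fail loudly instead of retrying forever
    assert False, "no event loop, cannot continue"


async def main() -> tuple[bool, bool]:
    task = asyncio.create_task(broken_loop())
    await asyncio.sleep(0)  # let the task run and fail
    # Nothing awaited the task, so main() keeps going as if nothing happened
    still_running = True
    # The error only surfaces when someone explicitly asks the task for it
    failed = isinstance(task.exception(), AssertionError)
    return still_running, failed
```

Here the AssertionError stays stored on the task object and the rest of the program carries on, which is the failure mode the comment wants to rule out.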

" not trying to restart %s again.",
self,
)
break
Contributor

Same here, assert False instead of break.

@Marenz Marenz disabled auto-merge June 24, 2025 10:23
@Marenz Marenz merged commit d118f42 into frequenz-floss:v1.x.x Jun 24, 2025
9 checks passed
@github-project-automation github-project-automation bot moved this from Review approved to Done in Python SDK Roadmap Jun 24, 2025
@llucax llucax added this to the v1.0.0-rc2100 milestone Jun 27, 2025

Labels

cmd:skip-release-notes It is not necessary to update release notes for this PR part:actor Affects an actor ot the actors utilities (decorator, etc.) part:core Affects the SDK core components (data structures, etc.) part:data-pipeline Affects the data pipeline part:docs Affects the documentation part:microgrid Affects the interactions with the microgrid part:tests Affects the unit, integration and performance (benchmarks) tests part:tooling Affects the development tooling (CI, deployment, dependency management, etc.)

4 participants