Fix test typing #1235

florian-wagner-frequenz · 2025-06-17T11:30:06Z

Previously there was a test which violated type hints. Since we cannot keep both the test and the type hints on it, the test was rewritten to cover a semantically slightly different but similar edge case.
The original test case is, of course, already prevented by the type hints (i.e. not passing `None` to a function that doesn't accept `None`).

Copilot

Pull Request Overview

This PR updates a test to avoid passing None and instead validates behavior with an empty config dict, aligning with current type hints.

Renames test from test_load_config_load_None to test_load_config_load_empty
Updates the docstring to describe the empty-config edge case
Changes the load_config call to pass the entire config dict instead of config.get("loggers", None)
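A minimal sketch of the kind of test described above, assuming a loader whose config parameter is typed `Mapping[str, Any]`. `load_config_sketch` and `LoggingConfigSketch` are hypothetical stand-ins, not the SDK's `load_config` or its config classes:

```python
from collections.abc import Mapping
from dataclasses import dataclass, field
from typing import Any


@dataclass
class LoggingConfigSketch:
    """Hypothetical config class in which every field has a default."""

    root_level: str = "INFO"
    loggers: dict[str, str] = field(default_factory=dict)


def load_config_sketch(config: Mapping[str, Any]) -> LoggingConfigSketch:
    """Hypothetical loader that builds the config from a mapping.

    The parameter is typed as a Mapping (not Optional), so passing None
    is a type error that a type checker rejects before any test runs.
    """
    return LoggingConfigSketch(**config)


def test_load_config_empty() -> None:
    """An empty mapping is valid input and yields all the defaults."""
    assert load_config_sketch({}) == LoggingConfigSketch()
```

Because the parameter is not `Optional`, a type checker such as mypy rejects `load_config_sketch(None)`, which is why the `None` case cannot stay in a type-checked test suite; the empty mapping is the closest valid edge case.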

Comments suppressed due to low confidence (1)

tests/config/test_util.py:57

[nitpick] The test name test_load_config_load_empty repeats 'load'. Consider renaming it to test_load_config_empty for clarity.

def test_load_config_load_empty(

florian-wagner-frequenz · 2025-06-17T11:30:44Z

Fixes the underlying problem preventing #1234 from proceeding

llucax

The suggested improvements to the tests are completely optional, not approving only because I think we should not add stuff that users don't care about in the release notes.

tests/config/test_util.py

RELEASE_NOTES.md

Marenz · 2025-06-18T11:41:03Z

removed this pull request from the merge queue due to no response for status checks 1 hour ago

hmm, maybe my fix 45c4ad7 wasn't a fix?

florian-wagner-frequenz · 2025-06-18T14:17:03Z

Trying once more in the hope that this is spurious. Otherwise I would leave it to @Marenz to figure out what is going on

florian-wagner-frequenz · 2025-06-18T16:07:30Z

Yup, your problem now @Marenz

src/frequenz/sdk/actor/_actor.py

llucax

It looks to me like we have a more fundamental problem if this fixes the infinite loop issue. Both the actor and the battery status tracker loops should exit when there is a `CancelledError`, so it looks like we are forgetting to cancel at some point or, worse, someone is swallowing the `CancelledError` in the middle.

This hack would cover up the more fundamental issue, which could come back to bite us in other ways, so I would try to find the real root cause instead of patching it just to pass the tests.

llucax · 2025-06-23T11:33:08Z

Also probably not the best idea to hijack this PR to fix the CI, why didn't you create a new PR like last time?

Marenz · 2025-06-23T11:53:15Z

Also probably not the best idea to hijack this PR to fix the CI, why didn't you create a new PR like last time?

Because this was easier to test, and, well, it's working now. It wasn't reliably reproducible with the other PR, which is how we ended up in this situation in the first place ;)

so I would try to find the real root cause for this

This is fixing the real root cause.
I wrote this on Slack last week, but here is the analysis:

I think I got it. The run loop in `_battery_status_tracker` never quits except on a `CancelledError`. On shutdown it gets the "event loop is closed" error and tries again... and again... and again. Because the event loop is closed, it never gives up control to let other tasks clean up properly, including the task that would raise the `CancelledError` in it (the `_run` method in componentpool_status_tracker.py, the one using `contextlib.AsyncExitStack`).

I didn't write it explicitly, but exactly the same thing also happens in the actor-restart logic: the actor stops because of the `RuntimeError: Event loop is closed` error, which makes it try to restart before any cancellation can take place.
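The loop shape described above can be reproduced with a stripped-down, hypothetical sketch (not the SDK's battery status tracker or actor code). It shows why `CancelledError` is the only way out of such a loop: since Python 3.8 it derives from `BaseException`, so `except Exception` does not catch it, while every other error is simply retried:

```python
import asyncio


async def retry_forever_sketch(attempts: list[int]) -> None:
    """Hypothetical tracker-style loop that restarts after any Exception.

    CancelledError is a BaseException, so `except Exception` does not catch
    it: cancellation is the only exit. Any other error, for example
    RuntimeError("Event loop is closed"), is just retried.
    """
    while True:
        try:
            attempts.append(1)
            raise RuntimeError("simulated failure")
        except Exception:  # pylint: disable=broad-except
            # Suspend before retrying. A pending cancel() is delivered at a
            # suspension point like this one, which requires a running loop
            # that keeps scheduling this task.
            await asyncio.sleep(0)


async def main() -> tuple[int, bool]:
    attempts: list[int] = []
    task = asyncio.create_task(retry_forever_sketch(attempts))
    await asyncio.sleep(0.01)  # let it fail and retry a number of times
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: the loop exited through cancellation
    return len(attempts), task.cancelled()
```

With a healthy loop, `asyncio.run(main())` ends with the task cancelled after many retries; if nothing ever delivers the cancellation, the same loop never ends.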

llucax · 2025-06-23T12:24:58Z

Because this was easier to test, and well, it's working now.. it wasn't reliably reproducable with the other PR which is how we ended up in this situation in the first place ;)

You could have created another PR with the same commits as this one for testing. Now it's like the original PR got completely lost in the noise, and we are also spamming @florian-wagner-frequenz unnecessarily :P

Also discussing via Slack, but for the record: what I mean by root cause is that these loops should finish via a `CancelledError` before the event loop is closed. If this is not happening, it means one of these is happening:

We are missing a cancel()
We are missing an await on the cancelled task
We are doing the above, but somehow after the loop was closed (in __del__ for example).
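The first two items correspond to a common shutdown pattern: request cancellation, then await the task so it actually finishes while the loop is still alive. A minimal sketch; `cancel_and_await_sketch` and `worker` are illustrative names, not SDK APIs:

```python
import asyncio
import contextlib


async def cancel_and_await_sketch(task: asyncio.Task[None]) -> None:
    """Cancel `task` and wait until it has actually finished.

    cancel() only requests cancellation; the CancelledError is delivered
    at the task's next suspension point. Awaiting the task ensures that has
    happened (and its cleanup ran) before the caller moves on, e.g. before
    the event loop is closed.
    """
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task


async def worker(stopped: list[bool]) -> None:
    """Illustrative long-running task with cleanup code."""
    try:
        await asyncio.Event().wait()  # blocks until cancelled
    finally:
        stopped.append(True)  # cleanup that must run before shutdown


async def main() -> tuple[list[bool], bool]:
    stopped: list[bool] = []
    task = asyncio.create_task(worker(stopped))
    await asyncio.sleep(0)  # let the worker start and block
    await cancel_and_await_sketch(task)
    return stopped, task.cancelled()
```

One caveat of this pattern: suppressing `CancelledError` also swallows a cancellation aimed at the caller itself while it waits; code that must propagate it can check `asyncio.current_task().cancelling()` (Python 3.11+).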

We should find and fix the place where the loop is closed without every task being properly stopped and awaited, instead of just hiding it when it happens and pretending that nothing is wrong... 😱

Marenz · 2025-06-23T14:29:47Z

and we are also spamming florian-wagner-frequenz unnecessarily :P

Nah, he announced that he unsubscribed and gave over ownership to me :) (that is, until you highlighted him again now :P )
His part is reviewed and approved, nothing to get lost.

But yes, I suppose for future historians this is a less optimal conflation of things that get fixed.

llucax

Looks good to me in general, but:

After you verify it works, please open a new PR with these fixes; they are too important to be sneaked into a PR that was trying to fix something much more trivial.
It would be good to verify that this works even without handling the case where the loop is gone in `run_forever()` and `Actor._run_loop()`. But I guess if we make them crash instead of continuing as if nothing happened, that also works, as pytest will fail/error the tests that crashed.

llucax · 2025-06-24T07:16:18Z

src/frequenz/sdk/_internal/_asyncio.py

        except Exception:  # pylint: disable=broad-except
            if not asyncio.get_event_loop().is_running():


This is actually wrong: `get_event_loop()` is deprecated in Python 3.12, and `get_running_loop()` is recommended instead, but that function raises if there is no running event loop. Apparently there is no reliable way to query whether an event loop is running in the current context other than calling `get_running_loop()` and catching the `RuntimeError`, so we should probably do that here too, or unify it with the `except RuntimeError` branch, since catching `RuntimeError` in particular doesn't help in any way if we are not inspecting the message (which I agree is less reliable).
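The check suggested above, as a minimal sketch: call `asyncio.get_running_loop()` and treat `RuntimeError` as "no running loop". The helper name `is_event_loop_running` is hypothetical:

```python
import asyncio


def is_event_loop_running() -> bool:
    """Return True if called while an event loop is running in this thread.

    asyncio.get_running_loop() raises RuntimeError when no loop is running,
    so catching that exception is how the question can be asked without
    relying on the deprecated behavior of get_event_loop().
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True
```

Called from plain synchronous code it returns `False`; called from inside a coroutine driven by `asyncio.run()` it returns `True`.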

All this said, since this is just a safety net to avoid an infinite loop in the presence of a HUGE bug (async code should never run outside the context of an event loop), instead of returning I would try to crash the application with a backtrace. Maybe log the exception and then simply `assert False`, or even `sys.exit(-1)`, to ensure the error is not simply swallowed by some dead task that didn't propagate the exception because it wasn't awaited. In this case crashing the whole thing makes sense to me: once there is no event loop anymore, we can't really do any work, so it doesn't make sense to try to continue.

I think this could help us figuring out exactly which part of the code is entering an infinite loop.

llucax · 2025-06-24T07:19:21Z

src/frequenz/sdk/actor/_actor.py

+                            " not trying to restart %s again.",
+                            self,
+                        )
+                        break


Same here, assert False instead of break.
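The same idea applied to an actor's restart loop, sketched with a hypothetical `RestartingActorSketch` class (not the SDK's `Actor`). Instead of `break`, it fails loudly when the loop is gone. It raises `AssertionError` explicitly rather than writing `assert False`, because `python -O` strips `assert` statements; whether to prefer that or `sys.exit()` is a judgment call:

```python
import asyncio
import logging
from collections.abc import Awaitable, Callable

_logger = logging.getLogger(__name__)


class RestartingActorSketch:
    """Hypothetical actor that restarts its `_run()` after failures."""

    def __init__(self, run: Callable[[], Awaitable[None]]) -> None:
        self._run = run
        self.restarts = 0

    async def run_loop(self) -> None:
        """Run `_run()`, restarting it after any Exception."""
        while True:
            try:
                await self._run()
                return  # _run() finished normally, so the actor stops
            except Exception as exc:  # pylint: disable=broad-except
                try:
                    asyncio.get_running_loop()
                except RuntimeError:
                    # Fail loudly instead of `break`: silently stopping would
                    # hide the bug that left this code running without a loop.
                    raise AssertionError(
                        f"event loop is gone, cannot restart {self!r}"
                    ) from exc
                self.restarts += 1
                _logger.warning("%r crashed (%s), restarting", self, exc)
                await asyncio.sleep(0)  # yield to other tasks before restarting
```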

Copilot AI review requested due to automatic review settings June 17, 2025 11:30

florian-wagner-frequenz requested a review from a team as a code owner June 17, 2025 11:30

florian-wagner-frequenz requested review from Marenz and removed request for a team June 17, 2025 11:30

github-project-automation bot added this to Python SDK Roadmap Jun 17, 2025

github-project-automation bot moved this to To do in Python SDK Roadmap Jun 17, 2025

github-actions bot added the part:tests Affects the unit, integration and performance (benchmarks) tests label Jun 17, 2025

Copilot AI reviewed Jun 17, 2025

florian-wagner-frequenz requested a review from llucax June 17, 2025 11:30

florian-wagner-frequenz force-pushed the fix_test_types branch from a524089 to 2e24f61 June 17, 2025 11:32

github-actions bot added the part:docs Affects the documentation label Jun 17, 2025

llucax reviewed Jun 17, 2025

tests/config/test_util.py (outdated, resolved)

RELEASE_NOTES.md (outdated, resolved)

florian-wagner-frequenz force-pushed the fix_test_types branch from 2e24f61 to abeae86 June 18, 2025 08:44

florian-wagner-frequenz requested a review from llucax June 18, 2025 08:44

florian-wagner-frequenz added the cmd:skip-release-notes It is not necessary to update release notes for this PR label Jun 18, 2025

florian-wagner-frequenz enabled auto-merge June 18, 2025 08:52

florian-wagner-frequenz force-pushed the fix_test_types branch 2 times, most recently from 3e55f21 to e28a775 June 18, 2025 08:53

Marenz previously approved these changes Jun 18, 2025

florian-wagner-frequenz added this pull request to the merge queue Jun 18, 2025

github-project-automation bot moved this from To do to Review approved in Python SDK Roadmap Jun 18, 2025

github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Jun 18, 2025

llucax previously approved these changes Jun 18, 2025

florian-wagner-frequenz added this pull request to the merge queue Jun 18, 2025

github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Jun 18, 2025

Marenz dismissed llucax’s stale review via 3b45bbd June 19, 2025 14:06

github-actions bot added the part:tooling Affects the development tooling (CI, deployment, dependency management, etc.) label Jun 19, 2025

Marenz removed this pull request from the merge queue due to a manual request Jun 19, 2025

Marenz force-pushed the fix_test_types branch 2 times, most recently from 3242588 to 46026b2 June 19, 2025 15:35

github-actions bot added the part:microgrid Affects the interactions with the microgrid label Jun 19, 2025

Marenz force-pushed the fix_test_types branch from 46026b2 to 5f92868 June 19, 2025 16:00

github-actions bot added the part:actor Affects an actor or the actors utilities (decorator, etc.) label Jun 19, 2025

Marenz force-pushed the fix_test_types branch from 5f92868 to 82e131b June 19, 2025 16:09

Marenz requested review from llucax and shsms June 19, 2025 16:09

shsms reviewed Jun 19, 2025

src/frequenz/sdk/actor/_actor.py (outdated, resolved)

Marenz enabled auto-merge June 23, 2025 10:44

llucax reviewed Jun 23, 2025

github-actions bot added the part:core Affects the SDK core components (data structures, etc.) label Jun 23, 2025

github-actions bot added the part:data-pipeline Affects the data pipeline label Jun 23, 2025

Marenz force-pushed the fix_test_types branch from b77d29f to 058a0f1 June 23, 2025 14:27

Marenz requested review from llucax and shsms June 23, 2025 16:58

llucax reviewed Jun 24, 2025

Marenz force-pushed the fix_test_types branch from 058a0f1 to 9665f64 June 24, 2025 10:13

Marenz approved these changes Jun 24, 2025

Marenz disabled auto-merge June 24, 2025 10:23

Marenz merged commit d118f42 into frequenz-floss:v1.x.x Jun 24, 2025
9 checks passed

github-project-automation bot moved this from Review approved to Done in Python SDK Roadmap Jun 24, 2025

llucax added this to the v1.0.0-rc2100 milestone Jun 27, 2025

4 participants