-
Notifications
You must be signed in to change notification settings - Fork 20
Fix test typing #1235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix test typing #1235
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates a test to avoid passing None and instead validates behavior with an empty config dict, aligning with current type hints.
- Renames test from
test_load_config_load_Nonetotest_load_config_load_empty - Updates the docstring to describe the empty-config edge case
- Changes the
load_configcall to pass the entireconfigdict instead ofconfig.get("loggers", None)
Comments suppressed due to low confidence (1)
tests/config/test_util.py:57
- [nitpick] The test name
test_load_config_load_emptyrepeats 'load'. Consider renaming it totest_load_config_emptyfor clarity.
def test_load_config_load_empty(
|
Fixes the underlying problem preventing #1234 from proceeding |
a524089 to
2e24f61
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suggested improvements to the tests are completely optional, not approving only because I think we should not add stuff that users don't care about in the release notes.
2e24f61 to
abeae86
Compare
3e55f21 to
e28a775
Compare
hmm maybe my fix 45c4ad7 |
Previously there was a test which violated type-hints. As we can not keep having the test _and_ type hints on that test, the test itself was rewritten to test a semantically slightly different but similar edge case. The original test case is of course prevented when using type-hints (i.e not passing `None` to a function that doesn't accept `None`)
|
Trying once more in the hope that this is spurious. Otherwise I would leave it to @Marenz to figure out what is going on |
|
Yup, your problem now @Marenz |
3242588 to
46026b2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks to me that we have a more fundamental problem if this fixes the infinite loop issue. Both the actor and the battery status tracker loops should exit when there is a CancelledError, so it looks like we are forgetting to cancel at some point, or worse, someone is eating up the CancelledError in the middle.
This hack will uncover this more fundamental issue, which could come to bite us in other ways, so I would try to find the real root cause for this instead of patching it just to pass the tests.
|
Also probably not the best idea to hijack this PR to fix the CI, why didn't you create a new PR like last time? |
Because this was easier to test, and well, it's working now.. it wasn't reliably reproducable with the other PR which is how we ended up in this situation in the first place ;)
This is fixing the real root cause.
I didn't write it explicitly, but the exact same thing is also happening in the actor-restart part. The actor stops because of the RuntimeError: Eventloop is closed error makes it try to restart before any chancel exception can be done. |
You could have created another PR with the same commits as this PR for testing. Now is like the original PR got completely lost in the noise, and we are also spamming @florian-wagner-frequenz unnecessarily :P Also discussing via slack, but for the records, what I mean about root cause is that these loops should finish, before the event loop is closed, via a
We should find and fix when the loop is being closed without every task being properly stopped and awaited, instead of just hiding when it happens pretending that nothing is wrong... 😱 |
Nah, he announced that he unsubscribed and gave over ownership to me :) (that is, until you highlighted him again now :P ) But yes, I suppose for future historians this is a less optimal conflation of things that get fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me in general, but:
- After you verified it works, please open a new PR with these fixes, these fixes are too important to be sneaked in a PR that was trying to fix something much more trivial.
- It would be good to verify that this works even without handling the case where the loop is gone in
run_forever()andActor._run_loop(), but I guess if we make them crash instead of continuing as if nothing happened, that also works, aspytestwill fail/error the tests that crashed.
| except Exception: # pylint: disable=broad-except | ||
| if not asyncio.get_event_loop().is_running(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually wrong, get_event_loop() is deprecated in Python 3.12, get_running_loop() is recommended but then this function will raise if there is no running event loop. Apparently there is no reliable way to query if an event loop is running in the current context that is not calling get_running_loop() and catch the RuntimeError, so we should probably do that here too, or unify it with except RuntimeError, as catching `RuntimeError in particular doesn't help in any way if we are not inspecting the message (which I agree is less reliable).
All this said, and since this is just a safety net to avoid infinite loop in the presence of a HUGE bug (async code should never run outside of the context of an event loop), instead of returning I would try to crash the application with a backtrace. Maybe log the exception and then just simply assert False, or even sys.exit(-1) to ensure this assert is not simply swallowed by some dead task that didn't propagated the exception because it wasn't awaited. For me in this case crashing the complete thing makes sense because once there is no more event loop, we can't really do any work, it doesn't make sense to try to continue after knowing there is no loop anymore.
I think this could help us figuring out exactly which part of the code is entering an infinite loop.
src/frequenz/sdk/actor/_actor.py
Outdated
| " not trying to restart %s again.", | ||
| self, | ||
| ) | ||
| break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, assert False instead of break.
Previously there was a test which violated type-hints. As we can not keep having the test and type hints on that test, the test itself was rewritten to test a semantically slightly different but similar edge case.
The original test case is of course prevented when using type-hints (i.e not passing
Noneto a function that doesn't acceptNone)