Skip to content

Conversation

@llucax
Copy link
Contributor

@llucax llucax commented May 16, 2025

No description provided.

@github-actions github-actions bot added part:docs Affects the documentation part:tooling Affects the development tooling (CI, deployment, dependency management, etc.) labels May 16, 2025
@llucax
Copy link
Contributor Author

llucax commented May 16, 2025

Sadly the new time-machine has issues with async-solipsism, as it also mocks the monotonic clock. We first need to get this sorted out before being able to merge this:

time-machine 2.12 doesn't support Python 3.13.

@llucax llucax force-pushed the py3.13 branch 2 times, most recently from 9e74107 to 18f6a68 Compare May 16, 2025 15:35
@llucax llucax marked this pull request as ready for review May 16, 2025 15:35
@Copilot Copilot AI review requested due to automatic review settings May 16, 2025 15:35
@llucax llucax requested a review from a team as a code owner May 16, 2025 15:35
@llucax llucax requested review from daniel-zullo-frequenz and removed request for a team May 16, 2025 15:35
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for Python 3.13 across the project by updating documentation and CI configuration.

  • Updated RELEASE_NOTES.md to announce Python 3.13 support
  • Updated README.md to list Python 3.13 as a supported version
  • Updated CI workflows to include Python 3.13 in the testing matrix

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
RELEASE_NOTES.md Updated text announcing Python 3.13 support
README.md Updated supported Python versions list
.github/workflows/ci.yaml Added Python 3.13 to the CI testing configuration

@llucax
Copy link
Contributor Author

llucax commented May 16, 2025

Ready for review, thanks @shsms for #1221! 🎉

@llucax llucax self-assigned this May 16, 2025
@llucax llucax enabled auto-merge May 16, 2025 15:39
@llucax llucax requested a review from shsms May 16, 2025 15:39
@llucax llucax added the type:enhancement New feature or enhancement visitble to users label May 16, 2025
@llucax llucax moved this from To do to Review in progress in Python SDK Roadmap May 16, 2025
@llucax llucax added this to the v1.0.0-rc2000 milestone May 16, 2025
shsms
shsms previously approved these changes May 17, 2025
@llucax llucax added this pull request to the merge queue May 17, 2025
@github-project-automation github-project-automation bot moved this from Review in progress to Review approved in Python SDK Roadmap May 17, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks May 17, 2025
@llucax
Copy link
Contributor Author

llucax commented May 19, 2025

@shsms the CI says no (in the queue)

@llucax
Copy link
Contributor Author

llucax commented May 19, 2025

OK, this is a new one... It is failing for all 3.13 runs except arm, ci_checks_max (this one passes), and not really failing:

2025-05-17T06:01:06.3319377Z ================== 424 passed, 9 warnings in 93.42s (0:01:33) ==================

And it seems that after the tests are done, it gets stuck in some infinite loop. I'm attaching an extract of the logs because the logs are so big, it would be easier for other people to look at the extract.

The repeated log is:

2025-05-17T06:01:49.5822344Z Actor PowerManagingActor[140302124270608]: Raised an unhandled exception.
2025-05-17T06:01:49.5823182Z Traceback (most recent call last):
2025-05-17T06:01:49.5824189Z   File "/home/runner/work/frequenz-sdk-python/frequenz-sdk-python/src/frequenz/sdk/actor/_actor.py", line 90, in _run_loop
2025-05-17T06:01:49.5825214Z     await self._delay_if_restart(n_restarts)
2025-05-17T06:01:49.5826325Z   File "/home/runner/work/frequenz-sdk-python/frequenz-sdk-python/src/frequenz/sdk/actor/_actor.py", line 73, in _delay_if_restart
2025-05-17T06:01:49.5827531Z     await asyncio.sleep(delay)
2025-05-17T06:01:49.5828384Z   File "/opt/hostedtoolcache/Python/3.13.3/x64/lib/python3.13/asyncio/tasks.py", line 712, in sleep
2025-05-17T06:01:49.5829201Z     loop = events.get_running_loop()
2025-05-17T06:01:49.5829648Z RuntimeError: no running event loop

I wonder if maybe we are trying to do some cleanup in some __del__ and Python3.13 retries it if it fails or something, but that doesn't explain why it works with one Python3.13. Since this involved the power manager, and you are very keen on getting Python3.13, maybe @shsms you can have a look? Otherwise I will when I have some time.

@shsms
Copy link
Contributor

shsms commented May 19, 2025

Interesting, I'm able to reproduce it locally, but only with pytest_max. I'll see if I can figure it out.

@shsms
Copy link
Contributor

shsms commented May 19, 2025

I wonder if this is a new python bug. It is running the async Actor._run_loop method infinitely. If the event loop is closed, that function should stop running as well, but it keeps running and tries to call await sleep and that's failing with RuntimeError: no running event loop, but the _run_loop function is not going away.

Longterm solution is to implement reference counting for shared actor lifetimes. But for now, we could do one of:

  1. don't restart actor if no loop (this works):
diff --git a/src/frequenz/sdk/actor/_actor.py b/src/frequenz/sdk/actor/_actor.py
index e62c2b63..26b994ea 100644
--- a/src/frequenz/sdk/actor/_actor.py
+++ b/src/frequenz/sdk/actor/_actor.py
@@ -94,6 +94,9 @@ class Actor(BackgroundService, abc.ABC):
                 _logger.info("Actor %s: Cancelled.", self)
                 raise
             except Exception:  # pylint: disable=broad-except
+                if not asyncio.get_running_loop().is_running():
+                    _logger.info("Actor %s: Loop is closed, stopping...", self)
+                    raise
                 _logger.exception("Actor %s: Raised an unhandled exception.", self)
                 limit_str = "∞" if self._restart_limit is None else self._restart_limit
                 limit_str = f"({n_restarts}/{limit_str})"
  1. Catch RuntimeError in actor.py, check description and raise if no loop. Haven't tried this, but should work.

@llucax
Copy link
Contributor Author

llucax commented May 19, 2025

If the dependency versions affect the results, I guess it is not purely a Python bug, or is a bug that only manifests in some dependency. Have you tried to pin-point which dependency upgrade triggers the bug? It might be useful to understand better what's going on and/or to pick one of the solutions.

My bets go to pytest or pytest-async (maybe solipsism if it is involved too), as it looks like our loop is closed, but there might be a second one still running and executing this code, otherwise I can't see how some async code could be executed after the loop is closed.

@llucax
Copy link
Contributor Author

llucax commented May 19, 2025

In particular if this is a regression on some test library, it might be more likely that it gets fixed soon.

@shsms
Copy link
Contributor

shsms commented May 19, 2025

Have you tried to pin-point which dependency upgrade triggers the bug?

No

My bets go to pytest or pytest-async (maybe solipsism if it is involved too),

Those are all pinned to a single version, under dev-pytest, can't imagine those don't explain the discrepancy between pytest_min/_max.

as it looks like our loop is closed, but there might be a second one still running and executing this code,

There is a function Actor._run_loop running in some event loop. That function is calling await sleep, and that is going to get created in the same loop as _run_loop, and that's failing. I don't think that's because of two different loops.

otherwise I can't see how some async code could be executed after the loop is closed.

I can 🤯 they have a 🐞

@shsms
Copy link
Contributor

shsms commented May 20, 2025

Actually you might be right, I'm thinking it must be a 🪲 in solipsism's event loop, because it is failing only with the TestEVChargerPoolControl tests as far as I can see and that uses async_solipsism, and the retries happen immediately after each other, without the 2-sec wait period that a regular actor would take.

But we use the same version of async-solipsism for min and max tests 🤔

I'll try some different versions.

@llucax
Copy link
Contributor Author

llucax commented May 20, 2025

Maybe it is an interaction between solipsism and py3.13, maybe solipsism is assuming some python internals that changed...

@shsms
Copy link
Contributor

shsms commented May 20, 2025

Actually it was just some missing cleanup. Fixed here: #1222

@llucax
Copy link
Contributor Author

llucax commented May 20, 2025

Awesome, but do you know how that explain async code being running outside the loop?

@shsms
Copy link
Contributor

shsms commented May 20, 2025

Awesome, but do you know how that explain async code being running outside the loop?

No, and I don't think it was solipsism. I disabled the solipsism event loop and it was still getting stuck. Maybe something to do with the MockMicrogrid, don't know. But I still feel it is Python.

@llucax
Copy link
Contributor Author

llucax commented May 20, 2025

You got the Python's vibe... 😆

OK, let's hope it doesn't come back to bite us...

Move on, nothing to see here

Signed-off-by: Leandro Lucarella <[email protected]>
@llucax
Copy link
Contributor Author

llucax commented May 20, 2025

Rebased, we can try again with the queue, but it needs a new approval @shsms

@llucax llucax enabled auto-merge May 20, 2025 15:00
@llucax llucax requested a review from shsms May 20, 2025 15:01
@llucax llucax added this pull request to the merge queue May 20, 2025
Merged via the queue into frequenz-floss:v1.x.x with commit 15bcff5 May 20, 2025
5 checks passed
@llucax llucax deleted the py3.13 branch May 20, 2025 15:14
@github-project-automation github-project-automation bot moved this from Review approved to Done in Python SDK Roadmap May 20, 2025
@llucax
Copy link
Contributor Author

llucax commented May 21, 2025

Awesome, thanks @shsms !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

part:docs Affects the documentation part:tooling Affects the development tooling (CI, deployment, dependency management, etc.) type:enhancement New feature or enhancement visitble to users

Projects

Development

Successfully merging this pull request may close these issues.

2 participants