@ela-kotulska-frequenz
Contributor

@ela-kotulska-frequenz ela-kotulska-frequenz commented Feb 24, 2025

Cancelling a BackgroundService should not raise an exception. However, we see an exception when we stop an actor, after calling actor.wait().

@ela-kotulska-frequenz ela-kotulska-frequenz added the type:tech-debt Improves the project without visible changes for users label Feb 24, 2025
@ela-kotulska-frequenz ela-kotulska-frequenz self-assigned this Feb 24, 2025
@ela-kotulska-frequenz ela-kotulska-frequenz requested a review from a team as a code owner February 24, 2025 09:37
@ela-kotulska-frequenz ela-kotulska-frequenz requested review from a team, daniel-zullo-frequenz, llucax and shsms and removed request for a team February 24, 2025 09:37
@github-actions github-actions bot added the part:actor Affects an actor ot the actors utilities (decorator, etc.) label Feb 24, 2025
@ela-kotulska-frequenz ela-kotulska-frequenz marked this pull request as draft February 24, 2025 10:16
@github-actions github-actions bot added part:docs Affects the documentation part:tests Affects the unit, integration and performance (benchmarks) tests labels Feb 24, 2025
@ela-kotulska-frequenz ela-kotulska-frequenz marked this pull request as ready for review February 24, 2025 10:26
@ela-kotulska-frequenz ela-kotulska-frequenz changed the title Stop raising exception on BackgroundService cancell Stop raising exception on BackgroundService cancel Feb 24, 2025
@shsms
Contributor

shsms commented Feb 24, 2025

Looks like the release notes were not cleared earlier after the previous release. Could you base on top of this: #1167

@ela-kotulska-frequenz
Contributor Author

Done, thanks!

Contributor

@llucax llucax left a comment

I think wait() is working as it should. wait() is used to implement __await__ and __await__ doesn't swallow CancelledErrors in a Task, so I think we should be consistent with that.

In BackgroundService there is stop() though, which is basically:

self.cancel()
try:
    await self.wait()
except CancelledError:
    pass

So there is already a convenience method to stop a background service without caring about cancellation. With wait() we can't assume the user doesn't want to be notified about CancelledError, as they might be awaiting on the service without actually ever calling cancel(), so getting a CancelledError might be an actual unexpected situation/error.
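To make the difference between wait() and stop() concrete, here is a minimal sketch. TinyService is a hypothetical stand-in written only for this illustration; it is not the SDK's BackgroundService and only mirrors the cancel()/wait()/stop() shape described above.

```python
import asyncio


class TinyService:
    """Hypothetical stand-in for a background service (not the SDK class)."""

    def __init__(self) -> None:
        self._task: asyncio.Task[None] | None = None

    def start(self) -> None:
        # A long sleep plays the role of the service's background work.
        self._task = asyncio.create_task(asyncio.sleep(3600))

    def cancel(self) -> None:
        assert self._task is not None
        self._task.cancel()

    async def wait(self) -> None:
        # Propagates whatever the task raised, including CancelledError.
        assert self._task is not None
        await self._task

    async def stop(self) -> None:
        # Convenience: cancel, then wait, ignoring the expected cancellation.
        self.cancel()
        try:
            await self.wait()
        except asyncio.CancelledError:
            pass


async def demo() -> list[str]:
    events: list[str] = []

    # cancel() followed by wait(): the caller is told about the cancellation.
    service = TinyService()
    service.start()
    service.cancel()
    try:
        await service.wait()
    except asyncio.CancelledError:
        events.append("wait() raised CancelledError")

    # stop(): the cancellation is expected, so it is swallowed.
    service = TinyService()
    service.start()
    await service.stop()
    events.append("stop() returned normally")
    return events


events = asyncio.run(demo())
print(events)
```

With this shape, a caller who never calls cancel() and still gets a CancelledError from wait() can treat it as unexpected, which is the consistency argument made above.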

@ela-kotulska-frequenz ela-kotulska-frequenz changed the title Stop raising exception on BackgroundService cancel Stop raising CancelledError when actor is cancelled Feb 25, 2025
@ela-kotulska-frequenz
Contributor Author

ela-kotulska-frequenz commented Feb 25, 2025

Right! Thanks for checking.
The problem was in the method that waits for the actor. I just pushed a fix :)



def _was_cancelled(task: asyncio.Task[Any]) -> bool:
    if task.cancelled():
Contributor

Oh, interesting, so a task that stops because a CancelledError was raised in its code is not considered cancelled? Then I guess cancelled() will only be true if we call .cancel() on the internal task created only to await the actor's finalization. If that is the case, task.cancelled() should never be true, as these tasks are internal to this function and we never call .cancel() on them. Maybe it is worth adding a note about this.


[...after doing some experiments...]

No, that doesn't seem to be the case:

import asyncio

async def cancelled_task() -> None:
    await asyncio.sleep(10)

async def wrapper_task() -> None:
    task = asyncio.create_task(cancelled_task())
    await asyncio.sleep(1)
    task.cancel()
    await task

async def main() -> None:
    pending = {asyncio.create_task(wrapper_task())}
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            if task.cancelled():
                print("Wrapper task was cancelled")
            elif exception := task.exception():
                match exception:
                    case asyncio.CancelledError():
                        print("Sub-task was cancelled")
                    case _:
                        print(f"Wrapper task raised an exception: {exception}")

asyncio.run(main())

This prints:

Wrapper task was cancelled

So the issue is the BaseExceptionGroup (or exception groups in general). When a CancelledError is wrapped in a group, the wrapper task is not marked as cancelled; instead the task finishes with the whole group as its exception.

import asyncio

async def cancelled_task() -> None:
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError as exc:
        raise BaseExceptionGroup("Sub-task was cancelled", [exc])

async def wrapper_task() -> None:
    task = asyncio.create_task(cancelled_task())
    await asyncio.sleep(1)
    task.cancel()
    await task

async def main() -> None:
    pending = {asyncio.create_task(wrapper_task())}
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            if task.cancelled():
                print("Wrapper task was cancelled")
            elif exception := task.exception():
                match exception:
                    case asyncio.CancelledError():
                        print("Sub-task was cancelled")
                    case _:
                        print(f"Wrapper task raised an exception: {exception!r}")

asyncio.run(main())

This prints:

Wrapper task raised an exception: BaseExceptionGroup('Sub-task was cancelled', [CancelledError()])

So to handle this properly, we need to use except* clauses, so we can split off the exceptions we want to catch as subgroups.

import asyncio
from collections.abc import Coroutine

async def cancelled_task() -> None:
    await asyncio.sleep(10)

async def cancelled_tasks() -> None:
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError as exc:
        raise BaseExceptionGroup(
            "Sub-tasks have errors",
            [
                exc,
                Exception("Some other exception"),
                asyncio.CancelledError("another CancelledError"),
            ],
        )

async def wrapper_task(coro: Coroutine[None, None, None]) -> None:
    task = asyncio.create_task(coro)
    await asyncio.sleep(1)
    task.cancel()
    await task

async def main() -> None:
    pending = {
        asyncio.create_task(wrapper_task(cancelled_task()), name="cancelled_task"),
        asyncio.create_task(wrapper_task(cancelled_tasks()), name="cancelled_tasks"),
    }
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            if task.cancelled():
                print(f"{task.get_name()} was cancelled")
            else:
                try:
                    await task
                except* asyncio.CancelledError as exc:
                    print(f"{task.get_name()} has some cancellations: {exc!r}")
                except* Exception as exc:
                    print(
                        f"Wrapper task for {task.get_name()} raised an exception: {exc!r}"
                    )

asyncio.run(main())

This prints:

cancelled_task was cancelled
cancelled_tasks has some cancellations: BaseExceptionGroup('Sub-tasks have errors', [CancelledError(), CancelledError('another CancelledError')])
Wrapper task for cancelled_tasks raised an exception: ExceptionGroup('Sub-tasks have errors', [Exception('Some other exception')])

@ela-kotulska-frequenz
Contributor Author

Wow... this was tricky. I didn't know about except*...

Comment on lines 53 to 56
    _logger.error(
        "Actor %s: Raised an exception while running.",
        task.get_name(),
-       exc_info=exception,
+       exc_info=err,
Contributor

You could replace this with _logger.exception() now, as it is inside a try/except.

Contributor Author

done

except* asyncio.CancelledError:
_logger.info("Actor %s: Cancelled while running.", task.get_name())
elif exception := task.exception():
except* BaseException as err: # pylint: disable=broad-exception-caught
Contributor

This would be a change of behaviour, but I think we should only catch Exception here; we probably don't want to catch things like KeyboardInterrupt, GeneratorExit or SystemExit. CancelledError is probably the only BaseException we ever want to catch.

Contributor Author

@ela-kotulska-frequenz ela-kotulska-frequenz Feb 26, 2025

I think so, too. But we had dedicated tests for BaseException.
I removed them in the last commit.

@ela-kotulska-frequenz ela-kotulska-frequenz force-pushed the cancel_background_service_2 branch 3 times, most recently from 5d8ec3d to d02067a Compare February 26, 2025 10:45
BackgroundService raises BaseExceptionGroup.
We can check if a task was cancelled by checking if
there is a CancelledError in the list of exceptions.

Signed-off-by: Elzbieta Kotulska <[email protected]>
CancelledError is the only BaseException that needs to be caught.

Signed-off-by: Elzbieta Kotulska <[email protected]>
Contributor

@llucax llucax left a comment

Oh, damn, so BaseException was not actually handled, just logged and bubbled up. That is fine too, but also not that important, as a base exception, if it is not only used for flow control (like exiting from the program), will probably be printed with a backtrace, so we should know it came from an actor.

So I think both approaches are OK (logging or not logging BaseExceptions), I'm approving and leaving it up to you.

Thanks! This was a very tricky bug.

@ela-kotulska-frequenz ela-kotulska-frequenz added this pull request to the merge queue Feb 27, 2025
@llucax llucax removed this pull request from the merge queue due to a manual request Feb 27, 2025
@llucax
Contributor

llucax commented Feb 27, 2025

(I disabled auto-merge in case you want to keep logging base exceptions, but after disabling it, I think the current approach is more correct; logging them will probably just be noise anyway.)

@ela-kotulska-frequenz ela-kotulska-frequenz added this pull request to the merge queue Feb 27, 2025
Merged via the queue into frequenz-floss:v1.x.x with commit bcfbd5b Feb 27, 2025
5 checks passed
@ela-kotulska-frequenz ela-kotulska-frequenz deleted the cancel_background_service_2 branch February 27, 2025 11:31
@ela-kotulska-frequenz
Contributor Author

ela-kotulska-frequenz commented Feb 27, 2025

No, no. If I understand correctly:
Previously we handled BaseException, which means that any code after await run(*actors) was executed.
Now we re-raise BaseException, which means that any code after await run(*actors) will not be executed.

And I think the new approach is correct, because we stop as soon as something raises a BaseException.

@llucax llucax added this to the v1.0.0-rc1700 milestone Feb 28, 2025