Skip to content

Conversation

@llucax
Copy link
Contributor

@llucax llucax commented Aug 8, 2024

asyncio.TaskGroup is a very convenient construct when using parallelization for doing calculations for example, where the results for all the tasks need to be merged together to produce a final result. In this case if one of the tasks fails, it makes sense to cancel the others and abort as soon as possible, as any further calculations would be thrown away.

This PR introduces a new PersistentTaskGroup class, intended to help managing a group of tasks that should persist even if other tasks in the group fail, usually by either only discarding the failed task or by restarting it somehow.

The ServiceBase class is updated to use a PersistentTaskGroup underneath, and it is simplified for single-task services by making the service driven by a single main task, which can use the new group to monitor sub-tasks and act accordingly.

This is part of #27 and #9.

llucax added 5 commits July 31, 2024 12:55
Even when just returning `None` should be compatible with `bool | None`,
it is better to declare the right type explicitly, as we had some issues
in the past with `mypy` if this is not the case.

Signed-off-by: Leandro Lucarella <[email protected]>
Signed-off-by: Leandro Lucarella <[email protected]>
We are using a syntax that it looks like one can (re)construct the
object by using the `repr()` but this is not true in this case, so btter
to follow the Python convention and use `<>` in this case.

Signed-off-by: Leandro Lucarella <[email protected]>
We tend to use `:` for short representations, so it is better to stick
to that and it is shorter and make nesting less verbose.

Signed-off-by: Leandro Lucarella <[email protected]>
Since more code is coming, and this module is already a bit large, we
split it into a util and a service sub-module.

Signed-off-by: Leandro Lucarella <[email protected]>
@llucax llucax requested a review from a team as a code owner August 8, 2024 14:48
@llucax llucax self-assigned this Aug 8, 2024
@llucax llucax added this to the v1.0.0 milestone Aug 8, 2024
@github-actions github-actions bot added part:tests Affects the unit, integration and performance (benchmarks) tests part:asyncio Affects the asyncio module labels Aug 8, 2024
@llucax
Copy link
Contributor Author

llucax commented Aug 8, 2024

This brings a Service much closer to the SDK Actor (basically only the auto-restart part is missing).

Next, I will create a PersistentServiceGroup so a service can have sub-services (i.e. an actor can monitor sub-actors), which hopefully can also serve as a separate auto-restart mechanism, so auto-restart is not hardcoded in the actor's run.

@llucax
Copy link
Contributor Author

llucax commented Aug 8, 2024

I hope that with this PersistentTaskGroup we can make errors managing sub-tasks (and completely missing errors in sub-tasks) much easier. @daniel-zullo-frequenz this is what I think should fit perfectly to implement frequenz-floss/frequenz-sdk-python#1023 (comment).

@llucax llucax added the cmd:skip-release-notes It is not necessary to update release notes for this PR label Aug 8, 2024
@llucax
Copy link
Contributor Author

llucax commented Aug 8, 2024

Skipping release notes as I will write them for the first release from scratch.

@llucax llucax force-pushed the service-main-task branch from 8d83a23 to 6772da2 Compare August 8, 2024 14:57
@daniel-zullo-frequenz
Copy link

I hope that with this PersistentTaskGroup we can make errors managing sub-tasks (and completely missing errors in sub-tasks) much easier. @daniel-zullo-frequenz this is what I think should fit perfectly to implement frequenz-floss/frequenz-sdk-python#1023 (comment).

Right, I think it would help to reduce complexity in the code and make it less error-prone by keeping track of the created tasks throughout their lifespan and taking care of errors handling.
IIUC, we will still need to perform a linear search using task names to identify the running tasks in PersistentTaskGroup. Particularly in frequenz-floss/frequenz-sdk-python#1023, I initially used task names to track tasks but this turned out to be a bad idea for the long run since we will need to check if requests overlap before creating a tasks.
Therefore, I'm wondering if PersistentTaskGroup could somehow use a dictionary to enable faster lookups and track tasks by using a user-defined type instead of relying on a set of tasks and the task names.

@llucax
Copy link
Contributor Author

llucax commented Aug 9, 2024

IIUC, we will still need to perform a linear search using task names to identify the running tasks in PersistentTaskGroup. Particularly in frequenz-floss/frequenz-sdk-python#1023, I initially used task names to track tasks but this turned out to be a bad idea for the long run since we will need to check if requests overlap before creating a tasks.
Therefore, I'm wondering if PersistentTaskGroup could somehow use a dictionary to enable faster lookups and track tasks by using a user-defined type instead of relying on a set of tasks and the task names.

Yeah, I actually went through the same reasoning, and thought too about being able to attach some metadata to the task, but it feels like this is not the place to put that metadata, you can always create the dict yourself for that. In that case for me, for example, I finally didn't even need to attach the metadata by just thinking about the problem differently.

We can see and if in the future we see more uses cases for attaching metadata, we could add it, but for now I would leave it out.

@daniel-zullo-frequenz
Copy link

but it feels like this is not the place to put that metadata, you can always create the dict yourself for that. In that case for me, for example, I finally didn't even need to attach the metadata by just thinking about the problem differently.
We can see and if in the future we see more uses cases for attaching metadata, we could add it, but for now I would leave it out.

Agree. It is clear to me now after checking your example. I admit that's not how I initially thought about it 👼
So please discard my wondering about adding the metadata

Marenz
Marenz previously approved these changes Aug 9, 2024
Copy link
Contributor

@Marenz Marenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a comment

llucax added 3 commits August 9, 2024 13:24
`asyncio.TaskGroup` is a very convenient construct when using
parallelization for doing calculations for example, where the results
for all the tasks need to be merged together to produce a final result.
In this case if one of the tasks fails, it makes sense to cancel the
others and abort as soon as possible, as any further calculations would
be thrown away.

This class is intended to help managing a group of tasks that should
persist even if other tasks in the group fail, usually by either only
discarding the failed task or by restarting it somehow.

Signed-off-by: Leandro Lucarella <[email protected]>
This is revealing too much internal information about how a service can
be implemented, it makes more sense to expose it only to classes
implementing a service, but not to service users.

Signed-off-by: Leandro Lucarella <[email protected]>
The service base class now delegates all sub-task management to a
`PersistentTaskGroup`, and has a special task called `main()` that
drives the service.

Writing single-task services should be much easier now, as only a
`main()` method needs to be implemented.

For more complex, multi-task, services, the internal task group can be
used to monitor sub-tasks, for example by using
`task_group.as_completed()`.

Signed-off-by: Leandro Lucarella <[email protected]>
*,
name: str | None = None,
context: contextvars.Context | None = None,
log_exception: bool = True,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not so sure about this, if it is easy to check for completed tasks to handle errors, maybe we don't need the logging feature (or we'll get most code littered with log_exceptions=False). But if anyone forgets, then we are screwed again with task dying without notice.

I would love to find a way to log only if the task was not ACKed somehow, but I couldn't so far.

I guess leaving it for now is the safest bet, if people need to opt-out explicitly at least there are much bigger chances that they will handle the exceptions. And for a casual user that doesn't know how to do it right, at least we have a log message.

@llucax
Copy link
Contributor Author

llucax commented Aug 9, 2024

Will merge to keep moving, but if anyone has any more comments or feedback, please comment and I can change in future PRs.

@llucax llucax added this pull request to the merge queue Aug 9, 2024
Merged via the queue into frequenz-floss:v1.x.x with commit 7f60548 Aug 9, 2024
@llucax llucax deleted the service-main-task branch August 9, 2024 12:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cmd:skip-release-notes It is not necessary to update release notes for this PR part:asyncio Affects the asyncio module part:tests Affects the unit, integration and performance (benchmarks) tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants