
Wait on all background tasks to finish (or abort) #612


Merged
merged 4 commits into main from the 2025-08-shutdown-wait-on-all-tasks branch
Aug 14, 2025

Conversation


@tnull tnull commented Aug 14, 2025

Fixes #611.
Possibly fixes #586.

Previously, we'd only wait for the background processor tasks to successfully finish. It turned out that this could lead to races when the other background tasks took too long to shut down. Here, we wait for a bit on all background tasks to shut down before moving on.

To allow clean exit of the transaction broadcasting task, we also include two refactoring commits (which are a nice cleanup in their own right).
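
A minimal sketch of the shutdown-wait approach described above, assuming the background task handles live in a tokio::task::JoinSet and using a hypothetical SHUTDOWN_TIMEOUT constant; this is not the PR's actual code:

```rust
use std::time::Duration;
use tokio::task::JoinSet;

// Hypothetical bound; the PR's actual timeout may differ.
const SHUTDOWN_TIMEOUT: Duration = Duration::from_secs(5);

async fn wait_on_background_tasks(mut tasks: JoinSet<()>) {
	// Give all background tasks a bounded amount of time to finish cleanly.
	let res = tokio::time::timeout(SHUTDOWN_TIMEOUT, async {
		while tasks.join_next().await.is_some() {}
	})
	.await;

	if res.is_err() {
		// Timed out: abort whatever is still running and drain the results
		// (aborted tasks resolve to JoinErrors, which we deliberately ignore).
		tasks.abort_all();
		while tasks.join_next().await.is_some() {}
	}
}
```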


ldk-reviews-bot commented Aug 14, 2025

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@tnull tnull marked this pull request as draft August 14, 2025 12:55

tnull commented Aug 14, 2025

Grrr, as usual nothing is easy with tokio. Drafting again until I've figured it out.

tnull added 3 commits August 14, 2025 15:09
.. as tokio tends to panic if a runtime is dropped in an async context and
we're not super careful. Here, we add some test coverage for this edge
case in the Rust tests.
Previously, individual chain sources would hold references to the
`Broadcaster` to acquire the broadcast queue. Here, we move this to
`ChainSource`, which allows us to handle the queue in a single place,
while the individual chain sources only deal with the actual packages.
Rather than looping in the `spawn` method directly, we move the loop to
a refactored `continuously_process_broadcast_queue` method on
`ChainSource`, which also allows us to react to the stop signal while
we're polling `recv`.
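
A rough sketch of what these two refactoring commits amount to, assuming the broadcast queue is an mpsc receiver of transaction packages and the stop signal is a tokio watch channel; names and signatures here are illustrative rather than LDK Node's actual API:

```rust
use tokio::sync::{mpsc, watch};

// Illustrative package type; the real queue carries transaction packages.
type TxPackage = Vec<Vec<u8>>;

// With the queue handled in one place on `ChainSource`, the loop can live here
// and hand packages to the concrete chain source for the actual broadcast.
async fn continuously_process_broadcast_queue(
	mut queue: mpsc::Receiver<TxPackage>, mut stop_receiver: watch::Receiver<()>,
) {
	loop {
		tokio::select! {
			// React to the stop signal even while we're blocked on `recv`.
			_ = stop_receiver.changed() => return,
			next = queue.recv() => {
				match next {
					Some(package) => broadcast_package(package).await,
					// All senders dropped; nothing left to broadcast.
					None => return,
				}
			},
		}
	}
}

async fn broadcast_package(_package: TxPackage) {
	// Placeholder for the chain-source-specific broadcast logic.
}
```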
@tnull tnull force-pushed the 2025-08-shutdown-wait-on-all-tasks branch from ec5229c to d4e7727 Compare August 14, 2025 13:44

tnull commented Aug 14, 2025

Now rebased this branch on top of #543, which solves the 'dropping a runtime in an async context' issue. Seems we finally have to move forward with that one first.

@tnull tnull force-pushed the 2025-08-shutdown-wait-on-all-tasks branch from d4e7727 to bc6ea9b Compare August 14, 2025 13:50
@tnull tnull moved this to Goal: Merge in Weekly Goals Aug 14, 2025
@tnull tnull self-assigned this Aug 14, 2025
@tnull tnull added the "weekly goal" label (Someone wants to land this this week) Aug 14, 2025
@tnull tnull mentioned this pull request Aug 14, 2025
@tnull tnull force-pushed the 2025-08-shutdown-wait-on-all-tasks branch from bc6ea9b to 7ec7f29 Compare August 14, 2025 14:54

tnull commented Aug 14, 2025

Rebased on main again with a slight fix to avoid the previous tokio-panic-on-drop. Should be good to land independently now.
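
For context on the tokio-panic-on-drop mentioned here: tokio panics if the last reference to a Runtime is dropped from within an async context. A hypothetical illustration of the general workaround, not the exact fix in this PR:

```rust
use std::sync::Arc;
use tokio::runtime::{Handle, Runtime};

fn drop_runtime_safely(runtime: Arc<Runtime>) {
	if Handle::try_current().is_ok() {
		// We're inside an async context: if this is the last `Arc` reference,
		// dropping the runtime right here panics with "Cannot drop a runtime in
		// a context where blocking is not allowed". `block_in_place` moves us to
		// a section where blocking (and thus the drop) is allowed. Note that
		// `block_in_place` itself requires a multi-threaded runtime.
		tokio::task::block_in_place(move || drop(runtime));
	} else {
		// Outside any runtime, a plain drop is fine.
		drop(runtime);
	}
}
```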

@tnull tnull marked this pull request as ready for review August 14, 2025 14:55

@TheBlueMatt TheBlueMatt left a comment


Sorry, reviewed at #613 (comment)

@tnull tnull force-pushed the 2025-08-shutdown-wait-on-all-tasks branch 2 times, most recently from 2bf7920 to 29e02a0 Compare August 14, 2025 18:11
Previously, we'd only wait for the background processor tasks to
successfully finish. It turned out that this could lead to races when
the other background tasks took too long to shut down. Here, we wait
for a bit on all background tasks to shut down before moving on.
@tnull tnull force-pushed the 2025-08-shutdown-wait-on-all-tasks branch from 29e02a0 to 17a45dd Compare August 14, 2025 18:25
@tnull tnull requested a review from TheBlueMatt August 14, 2025 18:31
```rust
let runtime_2 = Arc::clone(&runtime);
tasks.abort_all();
tokio::task::block_in_place(move || {
	runtime_2.block_on(async { while let Some(_) = tasks.join_next().await {} })
```


nit: there's a join_all we can call, I think.

tnull (Collaborator Author) replied:


No, I avoided using join_all in this PR as it may panic if any of the tasks return an error. From the tokio docs:

If any tasks on the JoinSet fail with an JoinError, then this call to join_all will panic and all remaining tasks on the JoinSet are cancelled.

Very slim chances to ever hit this, but I'd still prefer not to panic ever.


Ah, missed that, indeed, we should avoid it.
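
For reference, a small sketch of the distinction discussed above, assuming the background tasks live in a JoinSet<()>: draining via join_next lets us observe and ignore JoinErrors (panicked or aborted tasks), whereas join_all would panic on them.

```rust
use tokio::task::JoinSet;

async fn drain_ignoring_failures(mut tasks: JoinSet<()>) {
	while let Some(res) = tasks.join_next().await {
		if let Err(e) = res {
			// A task panicked or was aborted; log and keep draining rather than
			// panicking ourselves, as `join_all` would.
			eprintln!("Background task failed during shutdown: {e}");
		}
	}
}
```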

@tnull tnull merged commit 5d2092b into lightningdevkit:main Aug 14, 2025
15 checks passed
@github-project-automation github-project-automation bot moved this from Goal: Merge to Done in Weekly Goals Aug 14, 2025
Labels
weekly goal (Someone wants to land this this week)
Projects
Status: Done
Development
Successfully merging this pull request may close these issues:
Reconnection doesn't stop after disconnect-all-peers
Shutdown gets stuck when there is a pending payment
3 participants