
Backports for 0.6.2 #613


Merged
merged 6 commits into lightningdevkit:release/0.6 from 2025-08-backports-for-0.6.2 on Aug 14, 2025

Conversation

@tnull (Collaborator) commented Aug 14, 2025

These are backports of #592 and #612 for the upcoming 0.6.2 release.

tnull added 3 commits August 14, 2025 16:22
Previously, we had to configure enormous syncing timeouts as the BDK
wallet syncing would hold a central mutex that could lead to large parts
of event handling and syncing locking up. Here, we drop the configured
timeouts considerably across the board, since such huge values are
hopefully not required anymore.
Previously, we used a channel to indicate that the background
processor task had been stopped. Here, we instead await the task's
`JoinHandle`, which is more robust in that it avoids a race condition.
.. we provide finer-grained logging after each step of `stop`.
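
As an aside, a minimal sketch of the `JoinHandle`-based shutdown described in the second commit above (hypothetical names, not the actual ldk-node code):

```rust
use tokio::task::JoinHandle;

// Placeholder for the actual background-processor future.
async fn background_processor_run() {}

async fn run_and_stop() {
	// Keep the JoinHandle around instead of a "we stopped" channel ...
	let bp_handle: JoinHandle<()> = tokio::spawn(background_processor_run());

	// ... and on `stop`, await the handle directly. Unlike a channel
	// notification, this only resolves once the task has actually finished,
	// so there is no window in which the task is still running after we
	// observed the "stopped" signal.
	let _ = bp_handle.await;
}
```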
@ldk-reviews-bot commented Aug 14, 2025

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@tnull requested a review from TheBlueMatt on August 14, 2025, 14:52
@tnull (Collaborator Author) commented Aug 14, 2025

Now also included #590 to fix the CLN CI on this branch.

loop {
	let timeout_fut = tokio::time::timeout(
		Duration::from_secs(BACKGROUND_TASK_SHUTDOWN_TIMEOUT_SECS),
		tasks.join_next_with_id(),


Rather than letting each of these just take their time, IMO we should have separate `JoinSet`s: one for things we can cancel right away and one for the ones we have to join. That way we can just call `abort_all` for the tasks that we really don't want to wait around on for no reason.

Going through the callers, we have:

  • continuously_sync_wallets - not clear that this is cancel-safe? I assume not, because it's all BDK stuff and they don't guarantee that?
  • RGS sync - clearly can just be cancelled. There's no need to wait for a timeout
  • inbound TCP acceptor - clearly can just be cancelled, though it should reliably exit fast
  • reconnect loop - clearly can just be cancelled, and it might be important because the connect logic loops until the peer finishes its handshake
  • node announce - clearly we can just cancel.
  • liquidity handler - probably not cancel-safe
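
A rough sketch of the proposed split, assuming two `tokio::task::JoinSet`s (task assignments and names per the list above, purely illustrative):

```rust
use tokio::task::JoinSet;

async fn stop_background_tasks(
	// e.g. RGS sync, node announce, reconnect loop, inbound TCP acceptor
	mut cancellable_tasks: JoinSet<()>,
	// e.g. wallet syncing, liquidity handler (likely not cancel-safe)
	mut must_join_tasks: JoinSet<()>,
) {
	// Tasks that are safe to cancel get aborted right away, so we never
	// wait around on them for no reason ...
	cancellable_tasks.abort_all();

	// ... while the rest are joined so they can finish whatever
	// non-cancel-safe work they are currently in.
	while must_join_tasks.join_next().await.is_some() {}
}
```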

@tnull (Collaborator Author)

We do have a stop signal for all of these, which should be very similar to aborting for most of the cited cases, as ~all of them should be selecting ~instantly on the signal receiver.

Note that most of them also perform IO, even if it's just to update NodeMetrics after they're done (e.g. RGS, node ann.). So, for one, some of these tasks might simply not be cancellable (as you can't cancel IOPS/spawn_blocking), but I think it's cleaner/easier to reason about if we simply let them reach the next await point.

If you find it much preferable I can add a separate set for the TCP tasks, but I'm not entirely convinced it's worth it?
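
For reference, the stop-signal pattern described here might look roughly like the following (a sketch assuming a `tokio::sync::watch` channel as the signal; not the actual ldk-node code):

```rust
use std::time::Duration;
use tokio::sync::watch;

async fn periodic_background_task(mut stop_receiver: watch::Receiver<()>) {
	loop {
		tokio::select! {
			// React to the stop signal at the next await point ...
			_ = stop_receiver.changed() => return,
			// ... otherwise keep doing the periodic work.
			_ = tokio::time::sleep(Duration::from_secs(60)) => {
				// e.g. run an RGS sync or broadcast a node announcement, then
				// persist NodeMetrics (blocking IO that can't be cancelled).
			},
		}
	}
}
```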


> We do have a stop signal for all of these, which should be very similar to aborting for most of the cited cases, as ~all of them should be selecting ~instantly on the signal receiver.

This is at least not true for the background connection task. It has a sleep in the loop, but it isn't selected together with the exit signal. If it happens to be connecting, that guarantees we'll wait a few RTTs to exit, but if the peer is hung it could be a lot longer. Rather than trying to pipe the exit signal through everywhere, it seems much simpler to just cancel the tasks.
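
To illustrate the race: if the connect attempt or the per-iteration sleep is not `select!`ed against the stop signal, `stop` has to wait it out, whereas aborting cancels the task at whatever await point it is currently parked on. A hypothetical sketch of the problematic shape:

```rust
use std::time::Duration;
use tokio::sync::watch;

// Placeholder for the actual connect logic, which may block on a hung peer.
async fn try_connect_peer() {}

async fn reconnect_loop(stop_receiver: watch::Receiver<()>) {
	loop {
		// The stop signal is only checked once per iteration ...
		if stop_receiver.has_changed().unwrap_or(true) {
			return;
		}
		// ... so if we're stuck connecting to a hung peer, or sleeping below,
		// shutdown won't proceed until this await completes. Aborting the
		// task instead cancels it right here.
		try_connect_peer().await;
		tokio::time::sleep(Duration::from_secs(10)).await;
	}
}
```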

@tnull (Collaborator Author)

Okay, now split the TCP tasks out.

src/lib.rs Outdated
@@ -661,6 +684,53 @@ impl Node {
		self.chain_source.stop();
		log_debug!(self.logger, "Stopped chain sources.");

		// Cancel cancellable background tasks
		if let Some(mut tasks) = self.cancellable_background_tasks.lock().unwrap().take() {
			tasks.abort_all();


We still want to join them even after we abort, I think.

@tnull (Collaborator Author)

Why would we?


Because they may have been in sync code at the time we called abort_all, and we want to wait until they reach an await point so that we know they're actually not running. It's not all that critical, but in theory the connection logic is still race-y, I think.

@tnull (Collaborator Author)

Okay, now doing so.
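
A minimal sketch of "abort, then still join" with a `JoinSet` (illustrative, not the exact code added here):

```rust
use tokio::task::JoinSet;

async fn abort_and_drain(mut tasks: JoinSet<()>) {
	// Request cancellation of everything in the set ...
	tasks.abort_all();
	// ... then drain it, so we know each task has either finished on its own
	// or was actually cancelled at an await point and is no longer running.
	while let Some(res) = tasks.join_next().await {
		match res {
			Ok(()) => {},
			Err(e) if e.is_cancelled() => {},
			Err(e) => eprintln!("Background task panicked: {e}"),
		}
	}
}
```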

@tnull force-pushed the 2025-08-backports-for-0.6.2 branch from dbe11e8 to a0be1a5 on August 14, 2025, 18:09
src/lib.rs Outdated
		log_debug!(self.logger, "Stopped chain sources.");

		// Cancel cancellable background tasks
		if let Some(mut tasks) = self.cancellable_background_tasks.lock().unwrap().take() {


Also, doesn't this need to happen after disconnect_all_peers, which is a few lines up?

@tnull (Collaborator Author)

You mean before? For the TCP tasks, yes. Moved up.


Oops, yes, thanks.

tnull and others added 3 commits August 14, 2025 20:23
Previously, we'd only wait for the background processor tasks to
successfully finish. It turned out that this could lead to races when
the other background tasks took too long to shut down. Here, we attempt
to wait for a bit on all background tasks shutting down before moving
on.
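
A sketch of the timeout-bounded waiting described here, in the spirit of the `join_next_with_id` snippet quoted above (the timeout value and result handling are illustrative):

```rust
use std::time::Duration;
use tokio::task::JoinSet;

const BACKGROUND_TASK_SHUTDOWN_TIMEOUT_SECS: u64 = 5;

async fn wait_for_background_tasks(mut tasks: JoinSet<()>) {
	loop {
		let timeout_fut = tokio::time::timeout(
			Duration::from_secs(BACKGROUND_TASK_SHUTDOWN_TIMEOUT_SECS),
			tasks.join_next_with_id(),
		);
		match timeout_fut.await {
			// All tasks have finished.
			Ok(None) => break,
			// One task finished (or was cancelled/panicked); wait on the rest.
			Ok(Some(_res)) => continue,
			// Timed out: move on rather than blocking shutdown indefinitely.
			Err(_) => break,
		}
	}
}
```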
.. as tokio tends to panic when dropping a runtime in an async context
and we're not super careful. Here, we add some test coverage for this
edge case in Rust tests.
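
For context on the edge case mentioned in the last commit: dropping a `tokio::runtime::Runtime` from within an async context panics, because the runtime's `Drop` blocks on shutting down. One possible mitigation, sketched here as an assumption rather than as what ldk-node actually does, is `Runtime::shutdown_background`:

```rust
use tokio::runtime::{Handle, Runtime};

fn dispose_runtime(runtime: Runtime) {
	if Handle::try_current().is_ok() {
		// We're inside an async context: a plain `drop` would panic with
		// "Cannot drop a runtime in a context where blocking is not allowed",
		// so shut the runtime down without blocking instead.
		runtime.shutdown_background();
	} else {
		// Outside of any runtime it's fine to just drop it.
		drop(runtime);
	}
}
```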
@tnull force-pushed the 2025-08-backports-for-0.6.2 branch from a0be1a5 to 1a2d900 on August 14, 2025, 18:23
@tnull merged commit 52e8f6f into lightningdevkit:release/0.6 on Aug 14, 2025
9 of 15 checks passed