Conversation

TheBlueMatt
Collaborator

In 7d856fe5b05d69b301fdb8a48352f3f3a16efca9 we moved to doing all of our persistence operations in the (async) background processor in parallel. This is great for slow persistence operations (e.g. with remote persistence). However, the `ChannelManager` persistence is very latency-sensitive (delays may lead to accidental force-closures today), so we don't really want it to wait on network graph pruning or scorer time-stepping (which can be somewhat slow).

Instead, we keep the new bulk-polling of the persistence operations but immediately do a single poll of the `ChannelManager` persistence future after it is created. This gives it a chance to get going - initiating whatever network sends or connections need to happen - before we get busy doing CPU-intensive work. While that single poll may not complete the persist, it lets the future make as much progress as possible until a network resource is exhausted (e.g. the send buffer is full), at which point we need to wait on the network anyway, which is almost certainly slower than the ~1ms it takes to time-step the scorer.
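
As a rough illustration, here is a minimal sketch of the "poll once, then join" shape this takes, assuming an async environment with the `futures` crate available. The function and future names are hypothetical stand-ins, not the actual `lightning-background-processor` code:

```rust
// A minimal sketch of the "poll once, then join" idea - not the actual
// lightning-background-processor code. The future names below are hypothetical
// stand-ins for the real persistence, pruning, and scoring work.
use std::future::{poll_fn, Future};
use std::pin::pin;
use std::task::Poll;

use futures::join;

async fn background_persist_step(
    persist_manager: impl Future<Output = ()>,
    prune_network_graph: impl Future<Output = ()>,
    time_step_scorer: impl Future<Output = ()>,
) {
    // Pin the latency-sensitive ChannelManager persistence future so it can be
    // polled by hand before being handed to the joiner.
    let mut persist_manager = pin!(persist_manager);

    // Poll it exactly once up front. This lets it kick off whatever network
    // sends or connections it needs - and make progress until, say, a send
    // buffer fills - before we burn CPU on graph pruning or scorer
    // time-stepping.
    let mut finished_early = false;
    poll_fn(|cx| {
        finished_early = persist_manager.as_mut().poll(cx).is_ready();
        // We only want a single poll here, so resolve immediately rather than
        // waiting for the persistence to complete.
        Poll::Ready(())
    })
    .await;

    // Now drive everything to completion in parallel. If the persistence
    // already finished on the first poll, don't poll it again.
    if finished_early {
        join!(prune_network_graph, time_step_scorer);
    } else {
        join!(persist_manager, prune_network_graph, time_step_scorer);
    }
}
```

Under this sketch, the up-front poll costs next to nothing when persistence is local and fast, but gives a slow remote write a head start on its network round trips before the CPU-bound pruning and scoring run.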

@ldk-reviews-bot

ldk-reviews-bot commented Jul 31, 2025

👋 Thanks for assigning @joostjager as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

TheBlueMatt requested a review from joostjager July 31, 2025 01:41

codecov bot commented Jul 31, 2025

Codecov Report

❌ Patch coverage is 88.88889% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 88.89%. Comparing base (664511b) to head (91bce83).
⚠️ Report is 2 commits behind head on main.

Files with missing lines                    Patch %   Lines
lightning-background-processor/src/lib.rs   88.88%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3977      +/-   ##
==========================================
- Coverage   88.91%   88.89%   -0.03%     
==========================================
  Files         174      174              
  Lines      124232   124240       +8     
  Branches   124232   124240       +8     
==========================================
- Hits       110455   110437      -18     
- Misses      11301    11320      +19     
- Partials     2476     2483       +7     
Flag      Coverage Δ
fuzzing   22.61% <ø> (ø)
tests     88.71% <88.88%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.


// Because persisting the ChannelManager is important to avoid accidental
// force-closures, go ahead and poll the future once before we do slightly more
// CPU-intensive tasks in the form of NetworkGraph pruning or scorer time-stepping
// below. This will get it moving but won't block us for too long if the underlying
Contributor

joostjager Jul 31, 2025


If this is needed, shouldn't we stay on the safe side and await the `ChannelManager` persistence separately (i.e. not include it in the joiner)?

TheBlueMatt (Collaborator, Author)


I don't think so - time-stepping the scorer takes something on the order of 1ms, so it's going to be dwarfed by any network latency anyway. On the flip side, remote persistence may take something like 500ms if we have somewhat high internet latency and there are a few RTTs. Thus, parallelizing the persistence operations across all the cases is really important for overall application latency, but at the same time we want to get things moving quickly, so it's worth polling once to get the network traffic going.
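
For intuition, here is a toy comparison of sequential awaits versus joining the persistence futures in parallel. The timings and the `fake_persist` helper are made up for illustration (not LDK code), and it assumes the `tokio` and `futures` crates are available:

```rust
// Toy comparison (hypothetical timings, not LDK code) of sequential awaits vs.
// joining the persistence futures in parallel.
use std::time::{Duration, Instant};

use futures::join;
use tokio::time::sleep;

// Stand-in for a single remote persistence call with one ~500ms round trip.
async fn fake_persist(label: &str, rtt: Duration) {
    sleep(rtt).await;
    println!("{label} persisted");
}

#[tokio::main]
async fn main() {
    let rtt = Duration::from_millis(500);

    // Sequential: the three writes take roughly 1500ms in total.
    let start = Instant::now();
    fake_persist("manager", rtt).await;
    fake_persist("graph", rtt).await;
    fake_persist("scorer", rtt).await;
    println!("sequential: {:?}", start.elapsed());

    // Joined: the writes overlap on the network, so the total is roughly the
    // slowest single write, ~500ms.
    let start = Instant::now();
    join!(
        fake_persist("manager", rtt),
        fake_persist("graph", rtt),
        fake_persist("scorer", rtt)
    );
    println!("parallel: {:?}", start.elapsed());
}
```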

TheBlueMatt (Collaborator, Author)


To some extent this is really because we're trying to optimize for two drastically different cases - async persist with a local DB that is 0.5ms away and async persist with a remote VSS store that is 500ms away. But I do think this is basically a free way to optimize for both - there's no real cost to polling once, aside from a bit of extra code, and it lets us get the ChannelManager persisted quickly without stalling the whole thing on the network.

joostjager (Contributor)


Ok, buying this.

@ldk-reviews-bot

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

@TheBlueMatt
Collaborator Author

Just gonna land this, it's simple and straightforward.

TheBlueMatt merged commit 7e68f54 into lightningdevkit:main Jul 31, 2025
24 of 25 checks passed
