Skip to content

Conversation

TheBlueMatt
Copy link
Collaborator

Reading `ChannelMonitor`s on startup is one of the slowest parts of
LDK initialization. Now that we have an async `KVStore`, there's no
need for that, we can simply paralellize their loading, which we do
here.

Sadly, because Rust futures are pretty unergonomic, we have to add
some `unsafe {}` here, but arguing its fine is relatively
straightforward.

Based on #4146 just cause I don't want to resolve conflicts.

@ldk-reviews-bot
Copy link

ldk-reviews-bot commented Oct 7, 2025

I've assigned @jkczyz as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

Copy link

codecov bot commented Oct 7, 2025

Codecov Report

❌ Patch coverage is 48.80000% with 64 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.71%. Comparing base (eeb5b14) to head (0a2547a).

Files with missing lines Patch % Lines
lightning/src/util/persist.rs 35.38% 37 Missing and 5 partials ⚠️
lightning/src/util/native_async.rs 56.25% 14 Missing ⚠️
lightning-block-sync/src/gossip.rs 0.00% 5 Missing ⚠️
lightning/src/util/async_poll.rs 86.95% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4147      +/-   ##
==========================================
- Coverage   88.73%   88.71%   -0.03%     
==========================================
  Files         180      180              
  Lines      135622   135739     +117     
  Branches   135622   135739     +117     
==========================================
+ Hits       120346   120416      +70     
- Misses      12505    12554      +49     
+ Partials     2771     2769       -2     
Flag Coverage Δ
fuzzing 21.69% <0.00%> (-0.03%) ⬇️
tests 88.55% <48.80%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ldk-reviews-bot ldk-reviews-bot requested a review from jkczyz October 7, 2025 23:28
@TheBlueMatt TheBlueMatt force-pushed the 2025-10-parallel-reads branch from 4a773c9 to 31fbd16 Compare October 8, 2025 10:54
@TheBlueMatt TheBlueMatt marked this pull request as draft October 8, 2025 11:40
@TheBlueMatt TheBlueMatt removed the request for review from jkczyz October 8, 2025 11:40
@TheBlueMatt
Copy link
Collaborator Author

This doesn't actually work. monitor loading is (at least for a semi-local disk) CPU-bound, so we really need to spawn each monitor load task rather than having one task.

@TheBlueMatt TheBlueMatt force-pushed the 2025-10-parallel-reads branch 3 times, most recently from 64412a5 to 2c807b9 Compare October 9, 2025 17:58
Reading `ChannelMonitor`s on startup is one of the slowest parts of
LDK initialization. Now that we have an async `KVStore`, there's no
need for that, we can simply paralellize their loading, which we do
here.

Sadly, because Rust futures are pretty unergonomic, we have to add
some `unsafe {}` here, but arguing its fine is relatively
straightforward.
`tokio::spawn` can be use both to spawn a forever-running
background task or to spawn a task which gets `poll`ed
independently and eventually returns a result which the callsite
wants.

In LDK, we have only ever needed the first, and thus didn't bother
defining a return type for `FutureSpawner::spawn`. However, in the
next commit we'll start using `FutureSpawner` in a context where we
actually do want the spawned future's result. Thus, here, we add a
result output to `FutureSpawner::spawn`, mirroring the
`tokio::spawn` API.
`MonitorUpdatingPersister::read_all_channel_monitors_with_updates`
was made to do the IO operations in parallel in a previous commit,
however in practice this doesn't provide material parallelism for
large routing nodes. Because deserializing `ChannelMonitor`s is the
bulk of the work (when IO operations are sufficiently fast), we end
up blocked in single-threaded work nearly the entire time.

Here, we add an alternative option - a new
`read_all_channel_monitors_with_updates_parallel` method which uses
the `FutureSpawner` to cause the deserialization operations to
proceed in parallel.
When reading `ChannelMonitor`s from a `MonitorUpdatingPersister` on
startup, we have to make sure to load any `ChannelMonitorUpdate`s
and re-apply them as well. For users of async persistence who don't
have any `ChannelMonitorUpdate`s (e.g. because they set
`maximum_pending_updates` to 0 or, in the future, we avoid
persisting updates for small `ChannelMonitor`s), this means two
round-trips to the storage backend, one to load the
`ChannelMonitor` and one to try to read the next
`ChannelMonitorUpdate` only to have it fail.

Instead, here, we use `KVStore::list` to fetch the list of stored
`ChannelMonitorUpdate`s, which for async `KVStore` users allows us
to parallelize the list of update fetching and the
`ChannelMonitor` loading itself. Then we know exactly when to stop
reading `ChannelMonitorUpdate`s, including reading none if there
are none to read. This also avoids relying on `KVStore::read`
correctly returning `NotFound` in order to correctly discover when
to stop reading `ChannelMonitorUpdate`s.
When reading `ChannelMonitor`s from a `MonitorUpdatingPersister` on
startup, we have to make sure to load any `ChannelMonitorUpdate`s
and re-apply them as well. Now that we know which
`ChannelMonitorUpdate`s to load from `list`ing the entries from the
`KVStore` we can parallelize the reads themselves, which we do
here.

Now, loading all `ChannelMonitor`s from an async `KVStore` requires
only three full RTTs - one to list the set of `ChannelMonitor`s,
one to both fetch the `ChanelMonitor` and list the set of
`ChannelMonitorUpdate`s, and one to fetch all the
`ChannelMonitorUpdate`s (with the last one skipped when there are
no `ChannelMonitorUpdate`s to read).
@TheBlueMatt TheBlueMatt marked this pull request as ready for review October 9, 2025 19:54
@TheBlueMatt TheBlueMatt force-pushed the 2025-10-parallel-reads branch from 2c807b9 to 0a2547a Compare October 9, 2025 19:54
@TheBlueMatt
Copy link
Collaborator Author

This requires GATs which requires an MSRV bump (to 1.64), but we're planning on doing that soon anyway.

@ldk-reviews-bot ldk-reviews-bot requested a review from jkczyz October 9, 2025 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants