Integrate Splicing with Quiescence #4007

Open · wants to merge 7 commits into main
Conversation

TheBlueMatt (Collaborator)

Mostly small tweaks to our quiescence logic, but ultimately integrates splicing with quiescence.

IMO it's important that we allow quiescence-init while disconnected, but the one awkward part of the API is that we can't cancel a splice once it's started (except by FC'ing the channel). I started writing a cancel API but then realized it's not possible because the splice is only actually cancelled once the ChannelManager is persisted, which may be a while. The other option we could consider is dropping the pending splice on restart and giving the user an API to see whether a splice happened or not (I guess it's via the ChannelReady event?) and telling them to start again.

WDYT @wpaulino and @jkczyz?

In the case where we prepare to initiate quiescence, but cannot yet
send our `stfu` because we're waiting on some channel operations to
settle, and our peer ultimately sends their `stfu` before we can,
we would detect this case and, if we were able, send an `stfu`
which would allow us to send "something fundamental" first.

While this is a nifty optimization, it's a bit of overkill - the chance
that both we and our peer decide to attempt something fundamental
at the same time is pretty low, and, worse, this required additional
state tracking.

We simply remove this optimization here, simplifying the quiescence
state machine a good bit.
When we initiate quiescence, it should always be because we're
trying to accomplish something (in the short term only splicing).
In order to actually do that thing, we need to store the
instructions for that thing somewhere the splicing logic knows to
look at once we reach quiescence.

Here we add a simple enum which will eventually store such actions.
There are a number of things in LDK where we've been lazy and not
allowed the user to initiate an action while a peer is
disconnected. While it may be accurate in the sense that the action
cannot be started while the peer is disconnected, it is terrible
dev UX - these actions can fail without the developer being at
fault, and the only way for them to address it is to just try again.

Here we fix this dev UX shortcoming for splicing, keeping any
queued post-quiescent actions around when a peer disconnects and
retrying the action (and quiescence generally) when the peer
reconnects.
Now that we have a `QuiescentAction` to track what we intend to do
once we reach quiescence, we need to use it to initiate splices.

Here we do so, adding a new `SpliceInstructions` to track the
arguments that are currently passed to `splice_channel`. While
these may not be exactly the right arguments to track in the end,
a lot of the splice logic is still in flight, so we can worry about
it later.
While we already have a test that we disconnect a peer if we're
waiting on an `stfu` message, we also disconnect if we've reached
quiescence but are waiting on the peer to do "something fundamental"
and they take too long to do so. We test that behavior here.
@TheBlueMatt TheBlueMatt requested a review from wpaulino August 13, 2025 01:34
@ldk-reviews-bot

ldk-reviews-bot commented Aug 13, 2025

👋 Thanks for assigning @jkczyz as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.


codecov bot commented Aug 13, 2025

Codecov Report

❌ Patch coverage is 86.45833% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.90%. Comparing base (6a4169d) to head (76210a8).

Files with missing lines Patch % Lines
lightning/src/ln/channel.rs 79.03% 12 Missing and 1 partial ⚠️
lightning/src/ln/channelmanager.rs 63.15% 6 Missing and 1 partial ⚠️
lightning/src/ln/quiescence_tests.rs 94.59% 5 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4007      +/-   ##
==========================================
- Coverage   88.90%   88.90%   -0.01%     
==========================================
  Files         174      174              
  Lines      125114   125239     +125     
  Branches   125114   125239     +125     
==========================================
+ Hits       111234   111338     +104     
- Misses      11364    11390      +26     
+ Partials     2516     2511       -5     
Flag Coverage Δ
fuzzing 22.11% <6.17%> (-0.02%) ⬇️
tests 88.72% <86.45%> (-0.01%) ⬇️

@wpaulino (Contributor)

IMO its important that we allow quiescence-init while disconnected, but the only awkward part of the API is we can't cancel a splice once its started (except by FC'ing the channel). I started writing a cancel API but then realized it's not possible because it's only actually cancelled once the ChannelManager is persisted which may be a while. The other option we could consider is dropping the pending splice on restart and giving the user an API to see whether a splice happened or not (I guess it's via the ChannelReady event?) and telling them to start again.

I was envisioning we'd have a SplicePending event that gets emitted after the tx_signatures exchange, and a SpliceFailed event for any failed/aborted attempts prior to said exchange. I think we could support a cancel API (either disconnect or send tx_abort) until the user calls back with funding_transaction_signed and rely on the SpliceFailed event as the response, but canceling after funding_transaction_signed wouldn't be possible so we'd have to force close.

Supporting quiescence throughout disconnects makes sense, but we may want to have a fixed number of retries or a timer so that we don't go on forever?

@TheBlueMatt (Collaborator, Author)

I was envisioning we'd have a SplicePending event that gets emitted after the tx_signatures exchange, and a SpliceFailed event for any failed/aborted attempts prior to said exchange. I think we could support a cancel API (either disconnect or send tx_abort) until the user calls back with funding_transaction_signed and rely on the SpliceFailed event as the response, but canceling after funding_transaction_signed wouldn't be possible so we'd have to force close.

This still has the persistence issue, though. The user could track that they no longer want a splice in a channel and refuse to sign, which always works (and maybe is simply what we should do and document that users should rely on it) but we can't provide a built-in "cancel" function that actually is guaranteed to cancel. I think maybe this is fine, though, we don't really need to provide a way to cancel upgrading a channel type (just let it happen when the peer finally comes online?), so the other use of quiescence is fine, I guess?

Supporting quiescence throughout disconnects makes sense, but we may want to have a fixed number of retries or a timer so that we don't go on forever?

You mean because the other end may be rejecting the splice for some reason (why would they do that?)?

@wpaulino (Contributor)

The user could track that they no longer want a splice in a channel and refuse to sign, which always works (and maybe is simply what we should do and document that users should rely on it) but we can't provide a built-in "cancel" function that actually is guaranteed to cancel.

Even then, I don't think it's possible to cancel because the counterparty may have already provided its tx_signatures and is now expecting ours. The channel would remain quiescent until they're exchanged. If we want to support this, I think we'd have to block sending our commitment_signed until the user calls back with funding_transaction_signed? It's safe/possible to abort the negotiation while commitment_signed hasn't been exchanged.

You mean because the other end may be rejecting the splice for some reason (why would they do that?)?

Could be that the peer is not around long enough to finish the negotiation before disconnecting again. Or maybe they require confirmed inputs (we don't support this at the moment), and we keep providing an unconfirmed one.

@TheBlueMatt (Collaborator, Author)

TheBlueMatt commented Aug 14, 2025

Even then, I don't think it's possible to cancel because the counterparty may have already provided its tx_signatures and is now expecting ours. The channel would remain quiescent until they're exchanged. If we want to support this, I think we'd have to block sending our commitment_signed until the user calls back with funding_transaction_signed? It's safe/possible to abort the negotiation while commitment_signed hasn't been exchanged.

Hmmmmmm. Maybe we do that then? Not supporting splice init during disconnection seems like a pretty terrible API, but not supporting cancel also wouldn't be an option. This is more complicated (storing the user's signatures for later use, I guess, though we already have to support not sending CS until the monitor completes), but it seems not crazy bad for a much better API. It also doesn't have to happen to enable splicing and get started testing, just to ship.

Could be that the peer is not around long enough to finish the negotiation before disconnecting again.

I imagine in this case we want to keep retrying until it succeeds or the user cancels.

Or maybe they require confirmed inputs (we don't support this at the moment), and we keep providing an unconfirmed one.

Should the peer not send a TxAbort in this case and we'll return the channel to normal operation? Or are they supposed to send a warning and disconnect?

@TheBlueMatt (Collaborator, Author)

In any case, note that the quiescence action is taken, so we'll actually only ever try any action once. If it fails we let it fail and won't retry. I suppose in the "peer disconnected due to network disruption" case we'd really like to retry, but I'm not sure it's worth trying to implement that and tracking a failure counter.

@TheBlueMatt TheBlueMatt self-assigned this Aug 14, 2025
@wpaulino (Contributor) left a comment:

Basically LGTM

/// This keeps track of that action. Note that if we become quiescent and we're not the
/// initiator we may be able to merge this action into what the counterparty wanted to do (e.g.
/// in the case of splicing).
post_quiescence_action: Option<QuiescentAction>,

Nit: "post" makes it seem like something we'd do after quiescence and not while


Since it only applies to funded channels, it could maybe go there instead of ChannelContext (unless we need it in some ChannelContext method)

@@ -2443,13 +2443,42 @@ impl PendingSplice {
}
}

pub(crate) struct SpliceInstructions {

Meh, this is all changing very soon with #3979, can we just pull this commit out and rebase it on top of that?

@ldk-reviews-bot

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer?

@wpaulino wpaulino requested a review from jkczyz August 14, 2025 23:24