
Commit 5a5c708

Prepare to provide new ReleasePaymentComplete monitor updates
`MonitorEvent`s aren't delivered to the `ChannelManager` in a durable fashion - if the `ChannelManager` fetches the pending `MonitorEvent`s, the `ChannelMonitor` is then persisted (e.g. due to a block update), and the node crashes before the `ChannelManager` is persisted again, the `MonitorEvent` and its effects on the `ChannelManager` will be lost. This isn't likely in a sync persist environment, but in an async one it could be an issue.

Note that this is only an issue for closed channels - `MonitorEvent`s only inform the `ChannelManager` that a channel is closed (which the `ChannelManager` will learn on startup or when it next tries to advance the channel state), that `ChannelMonitorUpdate` writes completed (which the `ChannelManager` will detect on startup), or that HTLCs resolved on-chain post closure. Of the three, only the last is problematic to lose prior to a reload.

In previous commits we ensured that HTLC resolutions which came to the `ChannelManager` via a `MonitorEvent` were replayed on startup if the `MonitorEvent` was lost. However, in cases where the `ChannelManager` was so stale that it didn't have the payment state for an HTLC at all, we only re-add it when `ChannelMonitor::get_pending_or_resolved_outbound_htlcs` includes it.

Constantly re-adding a payment's state and then failing it would generate lots of noise for users on startup (not to mention the risk of confusing stale payment events with the latest state of a payment when the `PaymentId` has been reused to retry a payment). Thus, `get_pending_or_resolved_outbound_htlcs` does not include state for HTLCs which were resolved on chain with a preimage, or which were resolved on chain with a timeout after `ANTI_REORG_DELAY` confirmations. These criteria match the criteria for generating a `MonitorEvent`, and work great under the assumption that `MonitorEvent`s are reliably delivered. However, if they are not, and our `ChannelManager` is lost or substantially old (or, in a future where we do not persist the `ChannelManager` at all), we will not end up seeing payment resolution events for an HTLC.

Instead, we really want to tell our `ChannelMonitor`s when the resolution of an HTLC is complete. Note that we don't particularly care about non-payment HTLCs, as there is no re-hydration of state to do there - `ChannelManager` load ignores forwarded HTLCs coming back from `get_pending_or_resolved_outbound_htlcs` as there's nothing to do - we always attempt to replay the success/failure and figure out whether it mattered based on whether there was still an HTLC to claim/fail.

Here we prepare to generate the new `ChannelMonitorUpdateStep::ReleasePaymentComplete` updates, adding a new `PaymentCompleteUpdate` struct to track the update before we generate the `ChannelMonitorUpdate`, and passing it through to the right places in `ChannelManager`. The only cases where we want to generate the new update are after a `PaymentSent` or `PaymentFailed` event, when the event was the result of a `MonitorEvent` or the equivalent read during startup.
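The core pattern the diff below adopts, in both `ChannelManager` and `OutboundPayments`, is to hand the completion action to the callee as an `&mut Option<_>`: the callee `take()`s it only if it actually queued an event, and the caller runs whatever is left behind because the claim or failure was duplicative. A minimal, self-contained sketch of that ownership hand-off (plain Rust, not LDK code; all names below are illustrative):

#[derive(Debug, PartialEq)]
struct CompletionAction(&'static str);

// Queues an event carrying the completion action, unless the work was
// already handled, in which case the action is deliberately left in place.
fn maybe_queue_event(
	already_handled: bool, action: &mut Option<CompletionAction>,
	pending_events: &mut Vec<(&'static str, Option<CompletionAction>)>,
) {
	if !already_handled {
		// The event takes ownership of the action; it will run once the
		// event has been processed by the user.
		pending_events.push(("PaymentSent", action.take()));
	}
	// Otherwise the caller still holds the action and can run it directly.
}

fn main() {
	let mut pending_events = Vec::new();

	// Normal case: the callee generates an event and consumes the action.
	let mut action = Some(CompletionAction("release-monitor-update"));
	maybe_queue_event(false, &mut action, &mut pending_events);
	assert!(action.is_none());

	// Duplicative case: no event is generated, the action is still present,
	// and the caller is responsible for completing it immediately.
	let mut action = Some(CompletionAction("release-monitor-update"));
	maybe_queue_event(true, &mut action, &mut pending_events);
	assert!(action.is_some());
}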
1 parent 94bce4a commit 5a5c708

2 files changed (+90 -29 lines)

lightning/src/ln/channelmanager.rs

Lines changed: 82 additions & 24 deletions
@@ -1233,6 +1233,21 @@ impl_writeable_tlv_based_enum_upgradable!(MonitorUpdateCompletionAction,
 	},
 );
 
+#[derive(Clone, Debug, PartialEq, Eq)]
+pub(crate) struct PaymentCompleteUpdate {
+	counterparty_node_id: PublicKey,
+	channel_funding_outpoint: OutPoint,
+	channel_id: ChannelId,
+	htlc_id: SentHTLCId,
+}
+
+impl_writeable_tlv_based!(PaymentCompleteUpdate, {
+	(1, channel_funding_outpoint, required),
+	(3, counterparty_node_id, required),
+	(5, channel_id, required),
+	(7, htlc_id, required),
+});
+
 #[derive(Clone, Debug, PartialEq, Eq)]
 pub(crate) enum EventCompletionAction {
 	ReleaseRAAChannelMonitorUpdate {
@@ -3448,7 +3463,7 @@ macro_rules! handle_monitor_update_completion {
 		$self.finalize_claims(updates.finalized_claimed_htlcs);
 		for failure in updates.failed_htlcs.drain(..) {
 			let receiver = HTLCHandlingFailureType::Forward { node_id: Some(counterparty_node_id), channel_id };
-			$self.fail_htlc_backwards_internal(&failure.0, &failure.1, &failure.2, receiver);
+			$self.fail_htlc_backwards_internal(&failure.0, &failure.1, &failure.2, receiver, None);
 		}
 	} }
 }
@@ -4121,7 +4136,8 @@ where
 			let failure_reason = LocalHTLCFailureReason::ChannelClosed;
 			let reason = HTLCFailReason::from_failure_code(failure_reason);
 			let receiver = HTLCHandlingFailureType::Forward { node_id: Some(*counterparty_node_id), channel_id: *chan_id };
-			self.fail_htlc_backwards_internal(&htlc_source.0, &htlc_source.1, &reason, receiver);
+			let (source, hash) = htlc_source;
+			self.fail_htlc_backwards_internal(&source, &hash, &reason, receiver, None);
 		}
 
 		let _ = handle_error!(self, shutdown_result, *counterparty_node_id);
@@ -4255,7 +4271,7 @@ where
 			let failure_reason = LocalHTLCFailureReason::ChannelClosed;
 			let reason = HTLCFailReason::from_failure_code(failure_reason);
 			let receiver = HTLCHandlingFailureType::Forward { node_id: Some(counterparty_node_id), channel_id };
-			self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver);
+			self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver, None);
 		}
 		if let Some((_, funding_txo, _channel_id, monitor_update)) = shutdown_res.monitor_update {
 			debug_assert!(false, "This should have been handled in `locked_close_channel`");
@@ -6159,7 +6175,8 @@ where
 
 			let reason = HTLCFailReason::from_failure_code(LocalHTLCFailureReason::UnknownNextPeer);
 			let destination = HTLCHandlingFailureType::InvalidForward { requested_forward_scid: short_channel_id };
-			self.fail_htlc_backwards_internal(&htlc_source, &payment.forward_info.payment_hash, &reason, destination);
+			let hash = payment.forward_info.payment_hash;
+			self.fail_htlc_backwards_internal(&htlc_source, &hash, &reason, destination, None);
 		} else { unreachable!() } // Only `PendingHTLCRouting::Forward`s are intercepted
 
 		Ok(())
@@ -6448,6 +6465,7 @@ where
 					&payment_hash,
 					&failure_reason,
 					destination,
+					None,
 				);
 			}
 			self.forward_htlcs(&mut phantom_receives);
@@ -7657,7 +7675,7 @@ where
 			let failure_reason = LocalHTLCFailureReason::MPPTimeout;
 			let reason = HTLCFailReason::from_failure_code(failure_reason);
 			let receiver = HTLCHandlingFailureType::Receive { payment_hash: htlc_source.1 };
-			self.fail_htlc_backwards_internal(&source, &htlc_source.1, &reason, receiver);
+			self.fail_htlc_backwards_internal(&source, &htlc_source.1, &reason, receiver, None);
 		}
 
 		for (err, counterparty_node_id) in handle_errors {
@@ -7724,7 +7742,7 @@ where
 				let reason = self.get_htlc_fail_reason_from_failure_code(failure_code, &htlc);
 				let source = HTLCSource::PreviousHopData(htlc.prev_hop);
 				let receiver = HTLCHandlingFailureType::Receive { payment_hash: *payment_hash };
-				self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver);
+				self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver, None);
 			}
 		}
 	}
@@ -7817,7 +7835,7 @@ where
 				node_id: Some(counterparty_node_id.clone()),
 				channel_id,
 			};
-			self.fail_htlc_backwards_internal(&htlc_src, &payment_hash, &reason, receiver);
+			self.fail_htlc_backwards_internal(&htlc_src, &payment_hash, &reason, receiver, None);
 		}
 	}
 
@@ -7826,6 +7844,7 @@ where
 	fn fail_htlc_backwards_internal(
 		&self, source: &HTLCSource, payment_hash: &PaymentHash, onion_error: &HTLCFailReason,
 		failure_type: HTLCHandlingFailureType,
+		mut from_monitor_update_completion: Option<PaymentCompleteUpdate>,
 	) {
 		// Ensure that no peer state channel storage lock is held when calling this function.
 		// This ensures that future code doesn't introduce a lock-order requirement for
@@ -7857,7 +7876,17 @@ where
 					&self.secp_ctx,
 					&self.pending_events,
 					&self.logger,
+					&mut from_monitor_update_completion,
 				);
+				if let Some(update) = from_monitor_update_completion {
+					// If `fail_htlc` didn't `take` the post-event action, we should go ahead and
+					// complete it here as the failure was duplicative - we've already handled it.
+					// This should mostly only happen on startup, but it is possible to hit it in
+					// rare cases where a MonitorUpdate is replayed after restart because a
+					// ChannelMonitor wasn't persisted after it was applied (even though the
+					// ChannelManager was).
+					// TODO
+				}
 			},
 			HTLCSource::PreviousHopData(HTLCPreviousHopData {
 				ref short_channel_id,
@@ -7995,6 +8024,7 @@ where
 					&payment_hash,
 					&reason,
 					receiver,
+					None,
 				);
 			}
 			return;
@@ -8142,7 +8172,7 @@ where
 				err_data,
 			);
 			let receiver = HTLCHandlingFailureType::Receive { payment_hash };
-			self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver);
+			self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver, None);
 		}
 		self.claimable_payments.lock().unwrap().pending_claiming_payments.remove(&payment_hash);
 	}
@@ -8482,22 +8512,35 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
 				debug_assert!(self.background_events_processed_since_startup.load(Ordering::Acquire),
 					"We don't support claim_htlc claims during startup - monitors may not be available yet");
 				debug_assert_eq!(next_channel_counterparty_node_id, path.hops[0].pubkey);
-				let ev_completion_action = EventCompletionAction::ReleaseRAAChannelMonitorUpdate {
-					channel_funding_outpoint: next_channel_outpoint,
-					channel_id: next_channel_id,
-					counterparty_node_id: path.hops[0].pubkey,
-				};
+				let mut ev_completion_action =
+					Some(EventCompletionAction::ReleaseRAAChannelMonitorUpdate {
+						channel_funding_outpoint: next_channel_outpoint,
+						channel_id: next_channel_id,
+						counterparty_node_id: path.hops[0].pubkey,
+					});
 				self.pending_outbound_payments.claim_htlc(
 					payment_id,
 					payment_preimage,
 					bolt12_invoice,
 					session_priv,
 					path,
 					from_onchain,
-					ev_completion_action,
+					&mut ev_completion_action,
 					&self.pending_events,
 					&self.logger,
 				);
+				// If an event was generated, `claim_htlc` set `ev_completion_action` to None, if
+				// not, we should go ahead and run it now (as the claim was duplicative), at least
+				// if a PaymentClaimed event with the same action isn't already pending.
+				let have_action = if ev_completion_action.is_some() {
+					let pending_events = self.pending_events.lock().unwrap();
+					pending_events.iter().any(|(_, act)| *act == ev_completion_action)
+				} else {
+					false
+				};
+				if !have_action {
+					self.handle_post_event_actions(ev_completion_action);
+				}
 			},
 			HTLCSource::PreviousHopData(hop_data) => {
 				let prev_channel_id = hop_data.channel_id;
@@ -10030,7 +10073,8 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
 				channel_id: msg.channel_id,
 			};
 			let reason = HTLCFailReason::from_failure_code(LocalHTLCFailureReason::ChannelClosed);
-			self.fail_htlc_backwards_internal(&htlc_source.0, &htlc_source.1, &reason, receiver);
+			let (source, hash) = htlc_source;
+			self.fail_htlc_backwards_internal(&source, &hash, &reason, receiver, None);
 		}
 
 		Ok(())
@@ -10508,6 +10552,7 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
 					&payment_hash,
 					&failure_reason,
 					destination,
+					None,
 				);
 			}
 
@@ -11101,6 +11146,7 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
 						&htlc_update.payment_hash,
 						&reason,
 						receiver,
+						None,
 					);
 				}
 			},
@@ -12623,8 +12669,8 @@ where
 		}
 	}
 
-	fn handle_post_event_actions(&self, actions: Vec<EventCompletionAction>) {
-		for action in actions {
+	fn handle_post_event_actions<I: IntoIterator<Item = EventCompletionAction>>(&self, actions: I) {
+		for action in actions.into_iter() {
 			match action {
 				EventCompletionAction::ReleaseRAAChannelMonitorUpdate {
 					channel_funding_outpoint,
@@ -13479,7 +13525,7 @@ where
 		}
 
 		for (source, payment_hash, reason, destination) in timed_out_htlcs.drain(..) {
-			self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, destination);
+			self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, destination, None);
 		}
 	}
 
@@ -15684,7 +15730,7 @@ where
 			}
 			for (source, hash, cp_id, chan_id) in shutdown_result.dropped_outbound_htlcs {
 				let reason = LocalHTLCFailureReason::ChannelClosed;
-				failed_htlcs.push((source, hash, cp_id, chan_id, reason));
+				failed_htlcs.push((source, hash, cp_id, chan_id, reason, None));
 			}
 			channel_closures.push_back((
 				events::Event::ChannelClosed {
@@ -15728,6 +15774,7 @@ where
 					channel.context.get_counterparty_node_id(),
 					channel.context.channel_id(),
 					LocalHTLCFailureReason::ChannelClosed,
+					None,
 				));
 			}
 		}
@@ -16428,20 +16475,21 @@ where
 					// generating a `PaymentPathSuccessful` event but regenerating
 					// it and the `PaymentSent` on every restart until the
 					// `ChannelMonitor` is removed.
-					let compl_action =
+					let mut compl_action = Some(
 						EventCompletionAction::ReleaseRAAChannelMonitorUpdate {
 							channel_funding_outpoint: monitor.get_funding_txo(),
 							channel_id: monitor.channel_id(),
 							counterparty_node_id: path.hops[0].pubkey,
-						};
+						},
+					);
 					pending_outbounds.claim_htlc(
 						payment_id,
 						preimage,
 						bolt12_invoice,
 						session_priv,
 						path,
 						false,
-						compl_action,
+						&mut compl_action,
 						&pending_events,
 						&&logger,
 					);
@@ -16456,12 +16504,20 @@ where
 						"Failing HTLC with payment hash {} as it was resolved on-chain.",
 						htlc.payment_hash
 					);
+					let completion_action = Some(PaymentCompleteUpdate {
+						counterparty_node_id: monitor.get_counterparty_node_id(),
+						channel_funding_outpoint: monitor.get_funding_txo(),
+						channel_id: monitor.channel_id(),
+						htlc_id: SentHTLCId::from_source(&htlc_source),
+					});
+
					failed_htlcs.push((
 						htlc_source,
 						htlc.payment_hash,
 						monitor.get_counterparty_node_id(),
 						monitor.channel_id(),
 						LocalHTLCFailureReason::OnChainTimeout,
+						completion_action,
 					));
 				}
 			}
@@ -17147,11 +17203,13 @@ where
 	}
 
 	for htlc_source in failed_htlcs {
-		let (source, payment_hash, counterparty_id, channel_id, failure_reason) = htlc_source;
+		let (source, hash, counterparty_id, channel_id, failure_reason, ev_action) =
+			htlc_source;
 		let receiver =
 			HTLCHandlingFailureType::Forward { node_id: Some(counterparty_id), channel_id };
 		let reason = HTLCFailReason::from_failure_code(failure_reason);
-		channel_manager.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver);
+		channel_manager
+			.fail_htlc_backwards_internal(&source, &hash, &reason, receiver, ev_action);
 	}
 
 	for (
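One detail worth calling out from the `handle_post_event_actions` hunk above: making the method generic over `I: IntoIterator<Item = EventCompletionAction>` lets the existing `Vec`-based call sites keep working unchanged while the new duplicative-claim path can pass its `Option` directly, since `Option<T>` iterates over zero or one items. A standalone sketch of that property (plain Rust, no LDK types):

fn handle_all<I: IntoIterator<Item = u32>>(actions: I) -> u32 {
	// Works for anything that can be turned into an iterator of u32.
	actions.into_iter().sum()
}

fn main() {
	assert_eq!(handle_all(vec![1, 2, 3]), 6); // existing Vec-based call sites
	assert_eq!(handle_all(Some(7)), 7); // the new Option-based call site
	assert_eq!(handle_all(None::<u32>), 0); // a missing action is simply a no-op
}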

lightning/src/ln/outbound_payment.rs

Lines changed: 8 additions & 5 deletions
@@ -17,7 +17,9 @@ use lightning_invoice::Bolt11Invoice;
 use crate::blinded_path::{IntroductionNode, NodeIdLookUp};
 use crate::events::{self, PaidBolt12Invoice, PaymentFailureReason};
 use crate::ln::channel_state::ChannelDetails;
-use crate::ln::channelmanager::{EventCompletionAction, HTLCSource, PaymentId};
+use crate::ln::channelmanager::{
+	EventCompletionAction, HTLCSource, PaymentCompleteUpdate, PaymentId,
+};
 use crate::ln::onion_utils;
 use crate::ln::onion_utils::{DecodedOnionFailure, HTLCFailReason};
 use crate::offers::invoice::Bolt12Invoice;
@@ -2160,7 +2162,7 @@ impl OutboundPayments {
 	#[rustfmt::skip]
 	pub(super) fn claim_htlc<L: Deref>(
 		&self, payment_id: PaymentId, payment_preimage: PaymentPreimage, bolt12_invoice: Option<PaidBolt12Invoice>,
-		session_priv: SecretKey, path: Path, from_onchain: bool, ev_completion_action: EventCompletionAction,
+		session_priv: SecretKey, path: Path, from_onchain: bool, ev_completion_action: &mut Option<EventCompletionAction>,
 		pending_events: &Mutex<VecDeque<(events::Event, Option<EventCompletionAction>)>>,
 		logger: &L,
 	) where L::Target: Logger {
@@ -2181,7 +2183,7 @@ impl OutboundPayments {
 					amount_msat,
 					fee_paid_msat,
 					bolt12_invoice: bolt12_invoice,
-				}, Some(ev_completion_action.clone())));
+				}, ev_completion_action.take()));
 				payment.get_mut().mark_fulfilled();
 			}
 
@@ -2199,7 +2201,7 @@ impl OutboundPayments {
 						payment_hash,
 						path,
 						hold_times: Vec::new(),
-					}, Some(ev_completion_action)));
+					}, ev_completion_action.take()));
 				}
 			}
 		} else {
@@ -2328,7 +2330,7 @@ impl OutboundPayments {
 		path: &Path, session_priv: &SecretKey, payment_id: &PaymentId,
 		probing_cookie_secret: [u8; 32], secp_ctx: &Secp256k1<secp256k1::All>,
 		pending_events: &Mutex<VecDeque<(events::Event, Option<EventCompletionAction>)>>,
-		logger: &L,
+		logger: &L, completion_action: &mut Option<PaymentCompleteUpdate>,
 	) where
 		L::Target: Logger,
 	{
@@ -2479,6 +2481,7 @@ impl OutboundPayments {
 			}
 		};
 		let mut pending_events = pending_events.lock().unwrap();
+		// TODO: Handle completion_action
 		pending_events.push_back((path_failure, None));
 		if let Some(ev) = full_failure_ev {
 			pending_events.push_back((ev, None));
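In `claim_htlc` above, both `pending_events.push_back(...)` sites now call `ev_completion_action.take()` instead of cloning the action, so at most one queued event carries it, and the caller can tell whether any event was generated by checking whether the `Option` has been emptied. A standalone illustration of that `Option::take` behaviour (plain Rust; the event names are placeholders only):

use std::collections::VecDeque;

fn main() {
	let mut pending_events: VecDeque<(&'static str, Option<&'static str>)> = VecDeque::new();
	let mut action = Some("ReleaseRAAChannelMonitorUpdate");

	// The first event queued takes the action with it...
	pending_events.push_back(("PaymentSent", action.take()));
	// ...and any later push in the same call sees None and carries nothing.
	pending_events.push_back(("PaymentPathSuccessful", action.take()));

	assert_eq!(pending_events[0].1, Some("ReleaseRAAChannelMonitorUpdate"));
	assert_eq!(pending_events[1].1, None);
	// The emptied Option signals to the caller that an event was generated.
	assert!(action.is_none());
}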

0 commit comments
