Commit 6d6ed29

Prepare to provide new ReleasePaymentComplete monitor updates
`MonitorEvent`s aren't delivered to the `ChannelManager` in a durable fashion - if the `ChannelManager` fetches the pending `MonitorEvent`s, the `ChannelMonitor` is then persisted (e.g. due to a block update), and the node crashes before the `ChannelManager` is persisted again, the `MonitorEvent` and its effects on the `ChannelManager` will be lost. This isn't likely in a sync persist environment, but in an async one it could be an issue.

Note that this is only an issue for closed channels - `MonitorEvent`s only inform the `ChannelManager` that a channel is closed (which the `ChannelManager` will learn on startup or when it next tries to advance the channel state), that `ChannelMonitorUpdate` writes completed (which the `ChannelManager` will detect on startup), or that HTLCs resolved on-chain post closure. Of the three, only the last is problematic to lose prior to a reload.

In previous commits we ensured that HTLC resolutions which came to the `ChannelManager` via a `MonitorEvent` were replayed on startup if the `MonitorEvent` was lost. However, in cases where the `ChannelManager` was so stale that it didn't have the payment state for an HTLC at all, we only re-add it in cases where `ChannelMonitor::get_pending_or_resolved_outbound_htlcs` includes it.

Constantly re-adding a payment state and then failing it would generate lots of noise for users on startup (not to mention the risk of confusing stale payment events for the latest state of a payment when the `PaymentId` has been reused to retry a payment). Thus, `get_pending_or_resolved_outbound_htlcs` does not include state for HTLCs which were resolved on chain with a preimage, nor for HTLCs which were resolved on chain with a timeout after `ANTI_REORG_DELAY` confirmations. These criteria match the criteria for generating a `MonitorEvent`, and work great under the assumption that `MonitorEvent`s are reliably delivered. However, if they are not, and our `ChannelManager` is lost or substantially stale (or, in a future where we do not persist the `ChannelManager` at all, missing entirely), we will never see payment resolution events for such an HTLC.

Instead, we really want to tell our `ChannelMonitor`s when the resolution of an HTLC is complete. Note that we don't particularly care about non-payment HTLCs, as there is no re-hydration of state to do there - `ChannelManager` load ignores forwarded HTLCs coming back from `get_pending_or_resolved_outbound_htlcs` as there's nothing to do: we always attempt to replay the success/failure and figure out whether it mattered based on whether there was still an HTLC to claim/fail.

Here we prepare to generate the new `ChannelMonitorUpdateStep::ReleasePaymentComplete` updates, adding a new `PaymentCompleteUpdate` struct to track the new update before we generate the `ChannelMonitorUpdate`, and passing it through to the right places in `ChannelManager`. The only cases where we want to generate the new update are after a `PaymentSent` or `PaymentFailed` event, when the event was the result of a `MonitorEvent` or the equivalent read during startup.

Backport of 8b637cc

Conflicts resolved in:
 * lightning/src/ln/channelmanager.rs
 * lightning/src/ln/outbound_payment.rs
1 parent 071cb04 commit 6d6ed29
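For intuition, here is a minimal, self-contained sketch of the shape this takes. It uses simplified stand-in types rather than LDK's real `PublicKey`/`OutPoint`/`ChannelId`/`SentHTLCId`, and the `ReleasePaymentComplete` variant shown is a hypothetical placeholder (the real variant is only introduced in a follow-up commit); the actual struct and its TLV serialization are in the diff below.

```rust
// Stand-in types so the sketch compiles on its own; the real code uses LDK's
// own PublicKey, OutPoint, ChannelId and SentHTLCId types.
#[derive(Clone, Debug, PartialEq, Eq)]
struct PublicKey([u8; 33]);
#[derive(Clone, Debug, PartialEq, Eq)]
struct OutPoint { txid: [u8; 32], index: u16 }
#[derive(Clone, Debug, PartialEq, Eq)]
struct ChannelId([u8; 32]);
#[derive(Clone, Debug, PartialEq, Eq)]
struct SentHTLCId(u64);

/// Mirrors the struct added in this commit: everything needed to later build a
/// `ReleasePaymentComplete` monitor update for one HTLC whose payment
/// resolution event has been handed to the user.
#[derive(Clone, Debug, PartialEq, Eq)]
struct PaymentCompleteUpdate {
	counterparty_node_id: PublicKey,
	channel_funding_outpoint: OutPoint,
	channel_id: ChannelId,
	htlc_id: SentHTLCId,
}

/// Hypothetical shape of the monitor update step this commit prepares for; the
/// real variant is defined in a later commit and may differ.
enum ChannelMonitorUpdateStep {
	ReleasePaymentComplete { htlc: SentHTLCId },
}

impl PaymentCompleteUpdate {
	/// Once the `PaymentSent`/`PaymentFailed` event has actually been generated,
	/// the pending update can be turned into a monitor update for the (closed)
	/// channel identified by the outpoint/channel id it carries.
	fn into_monitor_update_step(self) -> ChannelMonitorUpdateStep {
		ChannelMonitorUpdateStep::ReleasePaymentComplete { htlc: self.htlc_id }
	}
}

fn main() {
	let pending = PaymentCompleteUpdate {
		counterparty_node_id: PublicKey([2; 33]),
		channel_funding_outpoint: OutPoint { txid: [0; 32], index: 0 },
		channel_id: ChannelId([0; 32]),
		htlc_id: SentHTLCId(0),
	};
	let _step = pending.into_monitor_update_step();
}
```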

File tree

2 files changed: +153 -44 lines changed


lightning/src/ln/channelmanager.rs

Lines changed: 144 additions & 39 deletions
```diff
@@ -1183,6 +1183,21 @@ impl_writeable_tlv_based_enum_upgradable!(MonitorUpdateCompletionAction,
 	},
 );
 
+#[derive(Clone, Debug, PartialEq, Eq)]
+pub(crate) struct PaymentCompleteUpdate {
+	counterparty_node_id: PublicKey,
+	channel_funding_outpoint: OutPoint,
+	channel_id: ChannelId,
+	htlc_id: SentHTLCId,
+}
+
+impl_writeable_tlv_based!(PaymentCompleteUpdate, {
+	(1, channel_funding_outpoint, required),
+	(3, counterparty_node_id, required),
+	(5, channel_id, required),
+	(7, htlc_id, required),
+});
+
 #[derive(Clone, Debug, PartialEq, Eq)]
 pub(crate) enum EventCompletionAction {
 	ReleaseRAAChannelMonitorUpdate {
@@ -3298,7 +3313,7 @@ macro_rules! handle_monitor_update_completion {
 		$self.finalize_claims(updates.finalized_claimed_htlcs);
 		for failure in updates.failed_htlcs.drain(..) {
 			let receiver = HTLCDestination::NextHopChannel { node_id: Some(counterparty_node_id), channel_id };
-			$self.fail_htlc_backwards_internal(&failure.0, &failure.1, &failure.2, receiver);
+			$self.fail_htlc_backwards_internal(&failure.0, &failure.1, &failure.2, receiver, None);
 		}
 	} }
 }
@@ -3923,7 +3938,7 @@ where
 		for htlc_source in failed_htlcs.drain(..) {
 			let reason = HTLCFailReason::from_failure_code(0x4000 | 8);
 			let receiver = HTLCDestination::NextHopChannel { node_id: Some(*counterparty_node_id), channel_id: *channel_id };
-			self.fail_htlc_backwards_internal(&htlc_source.0, &htlc_source.1, &reason, receiver);
+			self.fail_htlc_backwards_internal(&htlc_source.0, &htlc_source.1, &reason, receiver, None);
 		}
 
 		if let Some(shutdown_result) = shutdown_result {
@@ -4046,7 +4061,7 @@ where
 			let (source, payment_hash, counterparty_node_id, channel_id) = htlc_source;
 			let reason = HTLCFailReason::from_failure_code(0x4000 | 8);
 			let receiver = HTLCDestination::NextHopChannel { node_id: Some(counterparty_node_id), channel_id };
-			self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver);
+			self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver, None);
 		}
 		if let Some((_, funding_txo, _channel_id, monitor_update)) = shutdown_res.monitor_update {
 			debug_assert!(false, "This should have been handled in `locked_close_channel`");
@@ -5642,7 +5657,7 @@ where
 
 				let failure_reason = HTLCFailReason::from_failure_code(0x4000 | 10);
 				let destination = HTLCDestination::UnknownNextHop { requested_forward_scid: short_channel_id };
-				self.fail_htlc_backwards_internal(&htlc_source, &payment.forward_info.payment_hash, &failure_reason, destination);
+				self.fail_htlc_backwards_internal(&htlc_source, &payment.forward_info.payment_hash, &failure_reason, destination, None);
 			} else { unreachable!() } // Only `PendingHTLCRouting::Forward`s are intercepted
 
 		Ok(())
@@ -6324,7 +6339,13 @@ where
 			&self.pending_events, &self.logger, |args| self.send_payment_along_path(args));
 
 		for (htlc_source, payment_hash, failure_reason, destination) in failed_forwards.drain(..) {
-			self.fail_htlc_backwards_internal(&htlc_source, &payment_hash, &failure_reason, destination);
+			self.fail_htlc_backwards_internal(
+				&htlc_source,
+				&payment_hash,
+				&failure_reason,
+				destination,
+				None,
+			);
 		}
 		self.forward_htlcs(&mut phantom_receives);
 
@@ -6686,7 +6707,7 @@ where
 					let source = HTLCSource::PreviousHopData(htlc_source.0.clone());
 					let reason = HTLCFailReason::from_failure_code(23);
 					let receiver = HTLCDestination::FailedPayment { payment_hash: htlc_source.1 };
-					self.fail_htlc_backwards_internal(&source, &htlc_source.1, &reason, receiver);
+					self.fail_htlc_backwards_internal(&source, &htlc_source.1, &reason, receiver, None);
 				}
 
 		for (err, counterparty_node_id) in handle_errors.drain(..) {
@@ -6751,7 +6772,7 @@ where
 				let reason = self.get_htlc_fail_reason_from_failure_code(failure_code, &htlc);
 				let source = HTLCSource::PreviousHopData(htlc.prev_hop);
 				let receiver = HTLCDestination::FailedPayment { payment_hash: *payment_hash };
-				self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver);
+				self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver, None);
 			}
 		}
 	}
@@ -6830,18 +6851,26 @@ where
 		for (htlc_src, payment_hash) in htlcs_to_fail.drain(..) {
 			let reason = HTLCFailReason::reason(failure_code, onion_failure_data.clone());
 			let receiver = HTLCDestination::NextHopChannel { node_id: Some(counterparty_node_id.clone()), channel_id };
-			self.fail_htlc_backwards_internal(&htlc_src, &payment_hash, &reason, receiver);
+			self.fail_htlc_backwards_internal(&htlc_src, &payment_hash, &reason, receiver, None);
 		}
 	}
 
-	fn fail_htlc_backwards_internal(&self, source: &HTLCSource, payment_hash: &PaymentHash, onion_error: &HTLCFailReason, destination: HTLCDestination) {
-		let push_forward_event = self.fail_htlc_backwards_internal_without_forward_event(source, payment_hash, onion_error, destination);
+	fn fail_htlc_backwards_internal(
+		&self, source: &HTLCSource, payment_hash: &PaymentHash, onion_error: &HTLCFailReason,
+		destination: HTLCDestination,
+		from_monitor_update_completion: Option<PaymentCompleteUpdate>,
+	) {
+		let push_forward_event = self.fail_htlc_backwards_internal_without_forward_event(source, payment_hash, onion_error, destination, from_monitor_update_completion);
 		if push_forward_event { self.push_pending_forwards_ev(); }
 	}
 
 	/// Fails an HTLC backwards to the sender of it to us.
 	/// Note that we do not assume that channels corresponding to failed HTLCs are still available.
-	fn fail_htlc_backwards_internal_without_forward_event(&self, source: &HTLCSource, payment_hash: &PaymentHash, onion_error: &HTLCFailReason, destination: HTLCDestination) -> bool {
+	fn fail_htlc_backwards_internal_without_forward_event(
+		&self, source: &HTLCSource, payment_hash: &PaymentHash, onion_error: &HTLCFailReason,
+		destination: HTLCDestination,
+		mut from_monitor_update_completion: Option<PaymentCompleteUpdate>,
+	) -> bool {
 		// Ensure that no peer state channel storage lock is held when calling this function.
 		// This ensures that future code doesn't introduce a lock-order requirement for
 		// `forward_htlcs` to be locked after the `per_peer_state` peer locks, which calling
@@ -6862,9 +6891,28 @@ where
 		let mut push_forward_event;
 		match source {
 			HTLCSource::OutboundRoute { ref path, ref session_priv, ref payment_id, .. } => {
-				push_forward_event = self.pending_outbound_payments.fail_htlc(source, payment_hash, onion_error, path,
-					session_priv, payment_id, self.probing_cookie_secret, &self.secp_ctx,
-					&self.pending_events, &self.logger);
+				push_forward_event = self.pending_outbound_payments.fail_htlc(
+					source,
+					payment_hash,
+					onion_error,
+					path,
+					session_priv,
+					payment_id,
+					self.probing_cookie_secret,
+					&self.secp_ctx,
+					&self.pending_events,
+					&self.logger,
+					&mut from_monitor_update_completion,
+				);
+				if let Some(update) = from_monitor_update_completion {
+					// If `fail_htlc` didn't `take` the post-event action, we should go ahead and
+					// complete it here as the failure was duplicative - we've already handled it.
+					// This should mostly only happen on startup, but it is possible to hit it in
+					// rare cases where a MonitorUpdate is replayed after restart because a
+					// ChannelMonitor wasn't persisted after it was applied (even though the
+					// ChannelManager was).
+					// TODO
+				}
 			},
 			HTLCSource::PreviousHopData(HTLCPreviousHopData {
 				ref short_channel_id, ref htlc_id, ref incoming_packet_shared_secret,
@@ -6979,7 +7027,7 @@ where
 				let reason = self.get_htlc_fail_reason_from_failure_code(FailureCode::InvalidOnionPayload(None), &htlc);
 				let source = HTLCSource::PreviousHopData(htlc.prev_hop);
 				let receiver = HTLCDestination::FailedPayment { payment_hash };
-				self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver);
+				self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver, None);
 			}
 			return;
 		}
@@ -7085,7 +7133,7 @@ where
 			let source = HTLCSource::PreviousHopData(htlc.prev_hop);
 			let reason = HTLCFailReason::reason(0x4000 | 15, htlc_msat_height_data);
 			let receiver = HTLCDestination::FailedPayment { payment_hash };
-			self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver);
+			self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver, None);
 		}
 		self.claimable_payments.lock().unwrap().pending_claiming_payments.remove(&payment_hash);
 	}
@@ -7356,14 +7404,34 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
 				if let Some(pubkey) = next_channel_counterparty_node_id {
 					debug_assert_eq!(pubkey, path.hops[0].pubkey);
 				}
-				let ev_completion_action = EventCompletionAction::ReleaseRAAChannelMonitorUpdate {
-					channel_funding_outpoint: Some(next_channel_outpoint),
-					channel_id: next_channel_id,
-					counterparty_node_id: path.hops[0].pubkey,
+				let mut ev_completion_action =
+					Some(EventCompletionAction::ReleaseRAAChannelMonitorUpdate {
						channel_funding_outpoint: Some(next_channel_outpoint),
+						channel_id: next_channel_id,
+						counterparty_node_id: path.hops[0].pubkey,
+					});
+				self.pending_outbound_payments.claim_htlc(
+					payment_id,
+					payment_preimage,
+					session_priv,
+					path,
+					from_onchain,
+					&mut ev_completion_action,
+					&self.pending_events,
+					&self.logger,
+				);
+				// If an event was generated, `claim_htlc` set `ev_completion_action` to None, if
+				// not, we should go ahead and run it now (as the claim was duplicative), at least
+				// if a PaymentClaimed event with the same action isn't already pending.
+				let have_action = if ev_completion_action.is_some() {
+					let pending_events = self.pending_events.lock().unwrap();
+					pending_events.iter().any(|(_, act)| *act == ev_completion_action)
+				} else {
+					false
 				};
-				self.pending_outbound_payments.claim_htlc(payment_id, payment_preimage,
-					session_priv, path, from_onchain, ev_completion_action, &self.pending_events,
-					&self.logger);
+				if !have_action {
+					self.handle_post_event_actions(ev_completion_action);
+				}
 			},
 			HTLCSource::PreviousHopData(hop_data) => {
 				let prev_channel_id = hop_data.channel_id;
@@ -8709,7 +8777,7 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
 		for htlc_source in dropped_htlcs.drain(..) {
 			let receiver = HTLCDestination::NextHopChannel { node_id: Some(counterparty_node_id.clone()), channel_id: msg.channel_id };
 			let reason = HTLCFailReason::from_failure_code(0x4000 | 8);
-			self.fail_htlc_backwards_internal(&htlc_source.0, &htlc_source.1, &reason, receiver);
+			self.fail_htlc_backwards_internal(&htlc_source.0, &htlc_source.1, &reason, receiver, None);
 		}
 		if let Some(shutdown_res) = finish_shutdown {
 			self.finish_close_channel(shutdown_res);
@@ -9120,7 +9188,13 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
 		}
 
 		for (htlc_source, payment_hash, failure_reason, destination) in failed_intercept_forwards.drain(..) {
-			push_forward_event |= self.fail_htlc_backwards_internal_without_forward_event(&htlc_source, &payment_hash, &failure_reason, destination);
+			push_forward_event |= self.fail_htlc_backwards_internal_without_forward_event(
+				&htlc_source,
+				&payment_hash,
+				&failure_reason,
+				destination,
+				None,
+			);
 		}
 
 		if !new_intercept_events.is_empty() {
@@ -9461,7 +9535,13 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
 						log_trace!(logger, "Failing HTLC with hash {} from our monitor", &htlc_update.payment_hash);
 						let receiver = HTLCDestination::NextHopChannel { node_id: counterparty_node_id, channel_id };
 						let reason = HTLCFailReason::from_failure_code(0x4000 | 8);
-						self.fail_htlc_backwards_internal(&htlc_update.source, &htlc_update.payment_hash, &reason, receiver);
+						self.fail_htlc_backwards_internal(
+							&htlc_update.source,
+							&htlc_update.payment_hash,
+							&reason,
+							receiver,
+							None,
+						);
 					}
 				},
 				MonitorEvent::HolderForceClosed(_) | MonitorEvent::HolderForceClosedWithInfo { .. } => {
@@ -10806,8 +10886,8 @@ where
 		}
 	}
 
-	fn handle_post_event_actions(&self, actions: Vec<EventCompletionAction>) {
-		for action in actions {
+	fn handle_post_event_actions<I: IntoIterator<Item = EventCompletionAction>>(&self, actions: I) {
+		for action in actions.into_iter() {
 			match action {
 				EventCompletionAction::ReleaseRAAChannelMonitorUpdate {
 					channel_funding_outpoint: _,
@@ -11313,7 +11393,7 @@ where
 		}
 
 		for (source, payment_hash, reason, destination) in timed_out_htlcs.drain(..) {
-			self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, destination);
+			self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, destination, None);
 		}
 	}
 
@@ -13378,7 +13458,7 @@ where
 						log_error!(logger, " The ChannelMonitor for channel {} is at counterparty commitment transaction number {} but the ChannelManager is at counterparty commitment transaction number {}.",
 							&channel.context.channel_id(), monitor.get_cur_counterparty_commitment_number(), channel.get_cur_counterparty_commitment_transaction_number());
 					}
-					let mut shutdown_result = channel.context.force_shutdown(true, ClosureReason::OutdatedChannelManager);
+					let shutdown_result = channel.context.force_shutdown(true, ClosureReason::OutdatedChannelManager);
 					if shutdown_result.unbroadcasted_batch_funding_txid.is_some() {
 						return Err(DecodeError::InvalidValue);
 					}
@@ -13399,7 +13479,9 @@
 							counterparty_node_id, funding_txo, channel_id, update
 						});
 					}
-					failed_htlcs.append(&mut shutdown_result.dropped_outbound_htlcs);
+					for (source, hash, cp_id, chan_id) in shutdown_result.dropped_outbound_htlcs {
+						failed_htlcs.push((source, hash, cp_id, chan_id, None));
+					}
 					channel_closures.push_back((events::Event::ChannelClosed {
 						channel_id: channel.context.channel_id(),
 						user_channel_id: channel.context.get_user_id(),
@@ -13426,7 +13508,13 @@
 							log_info!(logger,
 								"Failing HTLC with hash {} as it is missing in the ChannelMonitor for channel {} but was present in the (stale) ChannelManager",
 								&channel.context.channel_id(), &payment_hash);
-							failed_htlcs.push((channel_htlc_source.clone(), *payment_hash, channel.context.get_counterparty_node_id(), channel.context.channel_id()));
+							failed_htlcs.push((
+								channel_htlc_source.clone(),
+								*payment_hash,
+								channel.context.get_counterparty_node_id(),
+								channel.context.channel_id(),
+								None,
+							));
 						}
 					}
 				} else {
@@ -13955,14 +14043,23 @@ where
							// generating a `PaymentPathSuccessful` event but regenerating
							// it and the `PaymentSent` on every restart until the
							// `ChannelMonitor` is removed.
-							let compl_action =
+							let mut compl_action = Some(
								EventCompletionAction::ReleaseRAAChannelMonitorUpdate {
									channel_funding_outpoint: Some(monitor.get_funding_txo().0),
									channel_id: monitor.channel_id(),
									counterparty_node_id: path.hops[0].pubkey,
-								};
-							pending_outbounds.claim_htlc(payment_id, preimage, session_priv,
-								path, false, compl_action, &pending_events, &&logger);
+								},
+							);
+							pending_outbounds.claim_htlc(
+								payment_id,
+								preimage,
+								session_priv,
+								path,
+								false,
+								&mut compl_action,
+								&pending_events,
+								&&logger,
+							);
							pending_events_read = pending_events.into_inner().unwrap();
						}
					},
@@ -13975,11 +14072,18 @@ where
						"Failing HTLC with payment hash {} as it was resolved on-chain.",
						payment_hash
					);
+					let completion_action = Some(PaymentCompleteUpdate {
+						counterparty_node_id: node_id,
+						channel_funding_outpoint: monitor.get_funding_txo().0,
+						channel_id: monitor.channel_id(),
+						htlc_id: SentHTLCId::from_source(&htlc_source),
+					});
					failed_htlcs.push((
						htlc_source,
						payment_hash,
						node_id,
						monitor.channel_id(),
+						completion_action,
					));
				} else {
					log_warn!(
@@ -14574,12 +14678,13 @@ where
 		}
 	}
 
 	for htlc_source in failed_htlcs {
-	for htlc_source in failed_htlcs.drain(..) {
-		let (source, payment_hash, counterparty_id, channel_id) = htlc_source;
+		let (source, hash, counterparty_id, channel_id, ev_action) = htlc_source;
 		let receiver =
 			HTLCDestination::NextHopChannel { node_id: Some(counterparty_id), channel_id };
 		let reason = HTLCFailReason::from_failure_code(0x4000 | 8);
-		channel_manager.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver);
+		channel_manager
+			.fail_htlc_backwards_internal(&source, &hash, &reason, receiver, ev_action);
 	}
 
 	for (source, preimage, downstream_value, downstream_closed, downstream_node_id, downstream_funding, downstream_channel_id) in pending_claims_to_replay {
```
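The completion-action handling that the diff above threads through `claim_htlc` and `fail_htlc` follows a take-or-run pattern: the callee `take()`s the action only if it actually queues an event; if the action is left behind (the claim or failure was duplicative), the caller runs it itself, unless an already-pending event carries the same action. Below is a minimal standalone sketch of that pattern with simplified types and hypothetical names (`PendingEvents`, `maybe_queue_event`, `run_action`), not LDK's actual API.

```rust
#[derive(Clone, Debug, PartialEq, Eq)]
enum EventCompletionAction {
	ReleaseRAAChannelMonitorUpdate { channel_id: u64 },
}

struct PendingEvents {
	// (event payload, optional action to run once the event is handled)
	events: Vec<(String, Option<EventCompletionAction>)>,
}

impl PendingEvents {
	// Plays the role of `claim_htlc`/`fail_htlc`: only if a new event is actually
	// queued does it take() the action, tying its execution to event delivery.
	fn maybe_queue_event(
		&mut self, is_duplicate: bool, action: &mut Option<EventCompletionAction>,
	) {
		if !is_duplicate {
			self.events.push(("PaymentSent".to_owned(), action.take()));
		}
	}
}

fn run_action(action: EventCompletionAction) {
	// In LDK this would release a held monitor update; here we just log it.
	println!("running {:?}", action);
}

fn main() {
	let mut pending = PendingEvents { events: Vec::new() };
	let mut action = Some(EventCompletionAction::ReleaseRAAChannelMonitorUpdate {
		channel_id: 42,
	});
	// Duplicate claim: no event is generated, so the action is left in place...
	pending.maybe_queue_event(true, &mut action);
	if let Some(act) = action {
		// ...and, as long as no already-pending event carries the same action,
		// the caller runs it immediately instead of leaving it dangling.
		let already_pending =
			pending.events.iter().any(|(_, a)| a.as_ref() == Some(&act));
		if !already_pending {
			run_action(act);
		}
	}
}
```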
