Commit 71a364c

Generate new ReleasePaymentComplete monitor updates
`MonitorEvent`s aren't delivered to the `ChannelManager` in a durable fashion - if the `ChannelManager` fetches the pending `MonitorEvent`s, the `ChannelMonitor` then gets persisted (e.g. due to a block update), and the node crashes before the `ChannelManager` is persisted again, the `MonitorEvent` and its effects on the `ChannelManager` will be lost. This isn't likely in a sync persist environment, but in an async one it could be an issue.

Note that this is only an issue for closed channels - `MonitorEvent`s only inform the `ChannelManager` that a channel is closed (which the `ChannelManager` will learn on startup or when it next tries to advance the channel state), that `ChannelMonitorUpdate` writes completed (which the `ChannelManager` will detect on startup), or that HTLCs resolved on-chain post closure. Of the three, only the last is problematic to lose prior to a reload.

In previous commits we ensured that HTLC resolutions which came to the `ChannelManager` via a `MonitorEvent` are replayed on startup if the `MonitorEvent` was lost. However, in cases where the `ChannelManager` was so stale that it didn't have the payment state for an HTLC at all, we only re-add it when `ChannelMonitor::get_pending_or_resolved_outbound_htlcs` includes it. Constantly re-adding a payment state and then failing it would generate lots of noise for users on startup (not to mention the risk of confusing stale payment events for the latest state of a payment when the `PaymentId` has been reused to retry a payment). Thus, `get_pending_or_resolved_outbound_htlcs` does not include state for HTLCs which were resolved on chain with a preimage, or with a timeout after `ANTI_REORG_DELAY` confirmations. These criteria match the criteria for generating a `MonitorEvent`, and work great under the assumption that `MonitorEvent`s are reliably delivered. However, if they are not, and our `ChannelManager` is lost or substantially old (or, in a future where we do not persist the `ChannelManager` at all), we will not end up seeing payment resolution events for such an HTLC.

Instead, we really want to tell our `ChannelMonitor`s when the resolution of an HTLC is complete. Note that we don't particularly care about non-payment HTLCs, as there is no re-hydration of state to do there - `ChannelManager` load ignores forwarded HTLCs coming back from `get_pending_or_resolved_outbound_htlcs` as there's nothing to do; we always attempt to replay the success/failure and figure out whether it mattered based on whether there was still an HTLC to claim/fail.

Here we begin generating the new `ChannelMonitorUpdateStep::ReleasePaymentComplete` updates, updating functional tests for the new `ChannelMonitorUpdate`s where required.
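To make the shape of the change concrete, below is a minimal, self-contained Rust sketch of the flow this commit introduces. The stand-in types (byte arrays and a bare `u64` in place of LDK's `PublicKey`, `ChannelId`, and `SentHTLCId`) and the `handle_post_event_action` helper are illustrative only and not part of the commit; the real logic lives in the `channelmanager.rs` diff below and additionally dedups against completion actions already attached to queued events.

#[derive(Clone, PartialEq, Debug)]
struct PaymentCompleteUpdate {
	counterparty_node_id: [u8; 33], // stand-in for PublicKey
	channel_id: [u8; 32],           // stand-in for ChannelId
	htlc_id: u64,                   // stand-in for SentHTLCId
}

#[derive(Clone, PartialEq, Debug)]
enum ChannelMonitorUpdateStep {
	// ...existing steps elided...
	ReleasePaymentComplete { htlc: u64 },
}

#[derive(Clone, PartialEq, Debug)]
enum EventCompletionAction {
	ReleasePaymentCompleteChannelMonitorUpdate(PaymentCompleteUpdate),
}

// Once the payment-resolution event has been handed to the user, bump the
// closed channel's monitor update counter and emit the new update step. The
// monitor persists it and stops resurfacing the HTLC on restart.
fn handle_post_event_action(
	action: EventCompletionAction, next_update_id: &mut u64,
) -> ChannelMonitorUpdateStep {
	let EventCompletionAction::ReleasePaymentCompleteChannelMonitorUpdate(upd) = action;
	*next_update_id += 1;
	ChannelMonitorUpdateStep::ReleasePaymentComplete { htlc: upd.htlc_id }
}

fn main() {
	let mut update_id = 42;
	let action = EventCompletionAction::ReleasePaymentCompleteChannelMonitorUpdate(
		PaymentCompleteUpdate { counterparty_node_id: [2; 33], channel_id: [0; 32], htlc_id: 7 },
	);
	let step = handle_post_event_action(action, &mut update_id);
	assert_eq!(step, ChannelMonitorUpdateStep::ReleasePaymentComplete { htlc: 7 });
	assert_eq!(update_id, 43);
}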
1 parent 8b637cc commit 71a364c

9 files changed: +282 −55 lines changed

lightning/src/chain/channelmonitor.rs

Lines changed: 1 addition & 0 deletions
@@ -4119,6 +4119,7 @@ impl<Signer: EcdsaChannelSigner> ChannelMonitorImpl<Signer> {
 		if updates.update_id == LEGACY_CLOSED_CHANNEL_UPDATE_ID || self.lockdown_from_offchain {
 			assert_eq!(updates.updates.len(), 1);
 			match updates.updates[0] {
+				ChannelMonitorUpdateStep::ReleasePaymentComplete { .. } => {},
 				ChannelMonitorUpdateStep::ChannelForceClosed { .. } => {},
 				// We should have already seen a `ChannelForceClosed` update if we're trying to
 				// provide a preimage at this point.

lightning/src/ln/chanmon_update_fail_tests.rs

Lines changed: 1 addition & 0 deletions
@@ -3870,6 +3870,7 @@ fn do_test_durable_preimages_on_closed_channel(
 	};
 	nodes[0].node.force_close_broadcasting_latest_txn(&chan_id_ab, &node_b_id, err_msg).unwrap();
 	check_closed_event(&nodes[0], 1, reason, false, &[node_b_id], 100000);
+	check_added_monitors(&nodes[0], 1);
 	let as_closing_tx = nodes[0].tx_broadcaster.txn_broadcasted.lock().unwrap().split_off(0);
 	assert_eq!(as_closing_tx.len(), 1);

lightning/src/ln/channelmanager.rs

Lines changed: 145 additions & 18 deletions
@@ -1267,6 +1267,12 @@ pub(crate) enum EventCompletionAction {
 		channel_funding_outpoint: Option<OutPoint>,
 		channel_id: ChannelId,
 	},
+
+	/// When a payment's resolution is communicated to the downstream logic via
+	/// [`Event::PaymentSent`] or [`Event::PaymentFailed`] we may want to mark the payment as
+	/// fully-resolved in the [`ChannelMonitor`], which we do via this action.
+	/// Note that this action will be dropped on downgrade to LDK prior to 0.2!
+	ReleasePaymentCompleteChannelMonitorUpdate(PaymentCompleteUpdate),
 }
 impl_writeable_tlv_based_enum!(EventCompletionAction,
 	(0, ReleaseRAAChannelMonitorUpdate) => {
@@ -1279,6 +1285,7 @@ impl_writeable_tlv_based_enum!(EventCompletionAction,
 			ChannelId::v1_from_funding_outpoint(channel_funding_outpoint.unwrap())
 		})),
 	}
+	{1, ReleasePaymentCompleteChannelMonitorUpdate} => (),
 );

 /// The source argument which is passed to [`ChannelManager::claim_mpp_part`].
@@ -8009,11 +8016,20 @@
 				if let Some(update) = from_monitor_update_completion {
 					// If `fail_htlc` didn't `take` the post-event action, we should go ahead and
 					// complete it here as the failure was duplicative - we've already handled it.
-					// This should mostly only happen on startup, but it is possible to hit it in
-					// rare cases where a MonitorUpdate is replayed after restart because a
-					// ChannelMonitor wasn't persisted after it was applied (even though the
-					// ChannelManager was).
-					// TODO
+					// This can happen in rare cases where a MonitorUpdate is replayed after
+					// restart because a ChannelMonitor wasn't persisted after it was applied (even
+					// though the ChannelManager was).
+					// For such cases, we also check that there's no existing pending event to
+					// complete this action already, which we let finish instead.
+					let action =
+						EventCompletionAction::ReleasePaymentCompleteChannelMonitorUpdate(update);
+					let have_action = {
+						let pending_events = self.pending_events.lock().unwrap();
+						pending_events.iter().any(|(_, act)| act.as_ref() == Some(&action))
+					};
+					if !have_action {
+						self.handle_post_event_actions([action]);
+					}
 				}
 			},
 			HTLCSource::PreviousHopData(HTLCPreviousHopData {
@@ -8642,19 +8658,34 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
 		next_user_channel_id: Option<u128>, attribution_data: Option<AttributionData>,
 		send_timestamp: Option<Duration>,
 	) {
+		debug_assert_eq!(
+			startup_replay,
+			!self.background_events_processed_since_startup.load(Ordering::Acquire)
+		);
+		let htlc_id = SentHTLCId::from_source(&source);
 		match source {
 			HTLCSource::OutboundRoute {
 				session_priv, payment_id, path, bolt12_invoice, ..
 			} => {
-				debug_assert!(self.background_events_processed_since_startup.load(Ordering::Acquire),
+				debug_assert!(!startup_replay,
 					"We don't support claim_htlc claims during startup - monitors may not be available yet");
 				debug_assert_eq!(next_channel_counterparty_node_id, path.hops[0].pubkey);
+
+				let mut ev_completion_action = if from_onchain {
+					let release = PaymentCompleteUpdate {
+						counterparty_node_id: next_channel_counterparty_node_id,
+						channel_funding_outpoint: next_channel_outpoint,
+						channel_id: next_channel_id,
+						htlc_id,
+					};
+					Some(EventCompletionAction::ReleasePaymentCompleteChannelMonitorUpdate(release))
+				} else {
-				let mut ev_completion_action =
 					Some(EventCompletionAction::ReleaseRAAChannelMonitorUpdate {
 						channel_funding_outpoint: Some(next_channel_outpoint),
 						channel_id: next_channel_id,
 						counterparty_node_id: path.hops[0].pubkey,
-					});
+					})
+				};
 				self.pending_outbound_payments.claim_htlc(
 					payment_id,
 					payment_preimage,
@@ -11372,12 +11403,18 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
 						channel_id,
 					};
 					let reason = HTLCFailReason::from_failure_code(failure_reason);
+					let completion_update = Some(PaymentCompleteUpdate {
+						counterparty_node_id,
+						channel_funding_outpoint: funding_outpoint,
+						channel_id,
+						htlc_id: SentHTLCId::from_source(&htlc_update.source),
+					});
 					self.fail_htlc_backwards_internal(
 						&htlc_update.source,
 						&htlc_update.payment_hash,
 						&reason,
 						receiver,
-						None,
+						completion_update,
					);
 				}
 			},
@@ -12864,8 +12901,62 @@ where
 					channel_id,
 					counterparty_node_id,
 				} => {
+					let startup_complete =
+						self.background_events_processed_since_startup.load(Ordering::Acquire);
+					debug_assert!(startup_complete);
 					self.handle_monitor_update_release(counterparty_node_id, channel_id, None);
 				},
+				EventCompletionAction::ReleasePaymentCompleteChannelMonitorUpdate(
+					PaymentCompleteUpdate {
+						counterparty_node_id,
+						channel_funding_outpoint,
+						channel_id,
+						htlc_id,
+					},
+				) => {
+					let per_peer_state = self.per_peer_state.read().unwrap();
+					let mut peer_state = per_peer_state
+						.get(&counterparty_node_id)
+						.map(|state| state.lock().unwrap())
+						.expect("Channels originating a payment resolution must have peer state");
+					let update_id = peer_state
+						.closed_channel_monitor_update_ids
+						.get_mut(&channel_id)
+						.expect("Channels originating a payment resolution must have a monitor");
+					*update_id += 1;
+
+					let update = ChannelMonitorUpdate {
+						update_id: *update_id,
+						channel_id: Some(channel_id),
+						updates: vec![ChannelMonitorUpdateStep::ReleasePaymentComplete {
+							htlc: htlc_id,
+						}],
+					};
+
+					let during_startup =
+						!self.background_events_processed_since_startup.load(Ordering::Acquire);
+					if during_startup {
+						let event = BackgroundEvent::MonitorUpdateRegeneratedOnStartup {
+							counterparty_node_id,
+							funding_txo: channel_funding_outpoint,
+							channel_id,
+							update,
+						};
+						self.pending_background_events.lock().unwrap().push(event);
+					} else {
+						handle_new_monitor_update!(
+							self,
+							channel_funding_outpoint,
+							update,
+							peer_state,
+							peer_state,
+							per_peer_state,
+							counterparty_node_id,
+							channel_id,
+							POST_CHANNEL_CLOSE
+						);
+					}
+				},
 			}
 		}
 	}
@@ -16557,6 +16648,7 @@ where
 						monitor,
 						Some(htlc.payment_hash),
 					);
+					let htlc_id = SentHTLCId::from_source(&htlc_source);
 					match htlc_source {
 						HTLCSource::PreviousHopData(prev_hop_data) => {
 							let pending_forward_matches_htlc = |info: &PendingAddHTLCInfo| {
@@ -16614,6 +16706,15 @@
						} => {
							if let Some(preimage) = preimage_opt {
								let pending_events = Mutex::new(pending_events_read);
+								let update = PaymentCompleteUpdate {
+									counterparty_node_id: monitor.get_counterparty_node_id(),
+									channel_funding_outpoint: monitor.get_funding_txo(),
+									channel_id: monitor.channel_id(),
+									htlc_id,
+								};
+								let mut compl_action = Some(
+									EventCompletionAction::ReleasePaymentCompleteChannelMonitorUpdate(update)
+								);
								// Note that we set `from_onchain` to "false" here,
								// deliberately keeping the pending payment around forever.
								// Given it should only occur when we have a channel we're
@@ -16622,15 +16723,6 @@
								// generating a `PaymentPathSuccessful` event but regenerating
								// it and the `PaymentSent` on every restart until the
								// `ChannelMonitor` is removed.
-								let mut compl_action = Some(
-									EventCompletionAction::ReleaseRAAChannelMonitorUpdate {
-										channel_funding_outpoint: Some(
-											monitor.get_funding_txo(),
-										),
-										channel_id: monitor.channel_id(),
-										counterparty_node_id: path.hops[0].pubkey,
-									},
-								);
								pending_outbounds.claim_htlc(
									payment_id,
									preimage,
@@ -16642,6 +16734,41 @@
									&pending_events,
									&&logger,
								);
+								// If the completion action was not consumed, then there was no
+								// payment to claim, and we need to tell the `ChannelMonitor`
+								// we don't need to hear about the HTLC again, at least as long
+								// as the PaymentSent event isn't still sitting around in our
+								// event queue.
+								let have_action = if compl_action.is_some() {
+									let pending_events = pending_events.lock().unwrap();
+									pending_events.iter().any(|(_, act)| *act == compl_action)
+								} else {
+									false
+								};
+								if !have_action && compl_action.is_some() {
+									let mut peer_state = per_peer_state
+										.get(&counterparty_node_id)
+										.map(|state| state.lock().unwrap())
+										.expect("Channels originating a preimage must have peer state");
+									let update_id = peer_state
+										.closed_channel_monitor_update_ids
+										.get_mut(channel_id)
+										.expect("Channels originating a preimage must have a monitor");
+									*update_id += 1;
+
+									pending_background_events.push(BackgroundEvent::MonitorUpdateRegeneratedOnStartup {
+										counterparty_node_id: monitor.get_counterparty_node_id(),
+										funding_txo: monitor.get_funding_txo(),
+										channel_id: monitor.channel_id(),
+										update: ChannelMonitorUpdate {
+											update_id: *update_id,
+											channel_id: Some(monitor.channel_id()),
+											updates: vec![ChannelMonitorUpdateStep::ReleasePaymentComplete {
+												htlc: htlc_id,
+											}],
+										},
+									});
+								}
								pending_events_read = pending_events.into_inner().unwrap();
							}
						},
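Both the runtime path and the startup-replay path above guard against generating a duplicate `ReleasePaymentComplete` update when a queued event already carries the same completion action. A minimal, self-contained sketch of that dedup check follows; the types are stand-ins, while LDK's actual queue holds `(Event, Option<EventCompletionAction>)` pairs behind a `Mutex`.

#[derive(Clone, PartialEq, Debug)]
enum EventCompletionAction {
	ReleasePaymentComplete { htlc_id: u64 },
}

struct Event; // payload irrelevant for this check

// Returns true if some queued event already carries this completion action, in
// which case that event is left to complete it and no new update is generated.
fn action_already_pending(
	pending_events: &[(Event, Option<EventCompletionAction>)],
	action: &EventCompletionAction,
) -> bool {
	pending_events.iter().any(|(_, act)| act.as_ref() == Some(action))
}

fn main() {
	let queue = vec![(Event, Some(EventCompletionAction::ReleasePaymentComplete { htlc_id: 7 }))];
	assert!(action_already_pending(&queue, &EventCompletionAction::ReleasePaymentComplete { htlc_id: 7 }));
	assert!(!action_already_pending(&queue, &EventCompletionAction::ReleasePaymentComplete { htlc_id: 9 }));
}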

lightning/src/ln/functional_test_utils.rs

Lines changed: 15 additions & 0 deletions
@@ -2803,6 +2803,9 @@ pub fn expect_payment_sent<CM: AChannelManager, H: NodeHolder<CM = CM>>(
 	expected_fee_msat_opt: Option<Option<u64>>, expect_per_path_claims: bool,
 	expect_post_ev_mon_update: bool,
 ) -> (Option<PaidBolt12Invoice>, Vec<Event>) {
+	if expect_post_ev_mon_update {
+		check_added_monitors(node, 0);
+	}
 	let events = node.node().get_and_clear_pending_events();
 	let expected_payment_hash = PaymentHash(
 		bitcoin::hashes::sha256::Hash::hash(&expected_payment_preimage.0).to_byte_array(),
@@ -3052,6 +3055,7 @@ pub struct PaymentFailedConditions<'a> {
 	pub(crate) expected_blamed_chan_closed: Option<bool>,
 	pub(crate) expected_mpp_parts_remain: bool,
 	pub(crate) retry_expected: bool,
+	pub(crate) from_mon_update: bool,
 }

 impl<'a> PaymentFailedConditions<'a> {
@@ -3062,6 +3066,7 @@ impl<'a> PaymentFailedConditions<'a> {
 			expected_blamed_chan_closed: None,
 			expected_mpp_parts_remain: false,
 			retry_expected: false,
+			from_mon_update: false,
 		}
 	}
 	pub fn mpp_parts_remain(mut self) -> Self {
@@ -3086,6 +3091,10 @@
 		self.retry_expected = true;
 		self
 	}
+	pub fn from_mon_update(mut self) -> Self {
+		self.from_mon_update = true;
+		self
+	}
 }

 #[cfg(any(test, feature = "_externalize_tests"))]
@@ -3195,7 +3204,13 @@ pub fn expect_payment_failed_conditions<'a, 'b, 'c, 'd, 'e>(
 	node: &'a Node<'b, 'c, 'd>, expected_payment_hash: PaymentHash,
 	expected_payment_failed_permanently: bool, conditions: PaymentFailedConditions<'e>,
 ) {
+	if conditions.from_mon_update {
+		check_added_monitors(node, 0);
+	}
 	let events = node.node.get_and_clear_pending_events();
+	if conditions.from_mon_update {
+		check_added_monitors(node, 1);
+	}
 	expect_payment_failed_conditions_event(
 		events,
 		expected_payment_hash,
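For orientation, a hedged usage sketch of the new condition in a test body; the `nodes` fixture and `payment_hash` binding are assumed from LDK's standard functional-test harness and are not part of this diff.

// After the HTLC is failed via an on-chain timeout and ANTI_REORG_DELAY
// confirmations, consuming the PaymentFailed event should add exactly one
// new monitor update (the payment-complete release) and no others.
let conditions = PaymentFailedConditions::new().from_mon_update();
expect_payment_failed_conditions(&nodes[0], payment_hash, false, conditions);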
