
Commit 500fc91

Stop re-hydrating pending payments once they are fully resolved
`MonitorEvent`s aren't delivered to the `ChannelManager` in a durable fashion - if the `ChannelManager` fetches the pending `MonitorEvent`s, then the `ChannelMonitor` gets persisted (e.g. due to a block update), and then the node crashes prior to persisting the `ChannelManager` again, the `MonitorEvent` and its effects on the `ChannelManager` will be lost. This isn't likely in a sync persist environment, but in an async one it could be an issue.

Note that this is only an issue for closed channels - `MonitorEvent`s only inform the `ChannelManager` that a channel is closed (which the `ChannelManager` will learn on startup or when it next tries to advance the channel state), that `ChannelMonitorUpdate` writes completed (which the `ChannelManager` will detect on startup), or that HTLCs resolved on-chain post-closure. Of the three, only the last is problematic to lose prior to a reload.

In previous commits we ensured that HTLC resolutions which came to the `ChannelManager` via a `MonitorEvent` were replayed on startup if the `MonitorEvent` was lost. However, in cases where the `ChannelManager` was so stale that it didn't have the payment state for an HTLC at all, we only re-add it when `ChannelMonitor::get_pending_or_resolved_outbound_htlcs` includes it. Constantly re-adding a payment's state and then failing it would generate lots of noise for users on startup (not to mention the risk of confusing stale payment events for the latest state of a payment when the `PaymentId` has been reused to retry a payment), so `get_pending_or_resolved_outbound_htlcs` does not include state for HTLCs which were resolved on-chain with a preimage, nor HTLCs which were resolved on-chain with a timeout after `ANTI_REORG_DELAY` confirmations. These criteria match the criteria for generating a `MonitorEvent`, and work great under the assumption that `MonitorEvent`s are reliably delivered.

However, if they are not, and our `ChannelManager` is lost or substantially old (or, in a future where we do not persist the `ChannelManager` at all), we will not end up seeing payment resolution events for an HTLC. Instead, we really want to tell our `ChannelMonitor`s when the resolution of an HTLC is complete.

Note that we don't particularly care about non-payment HTLCs, as there is no re-hydration of state to do there - `ChannelManager` load ignores forwarded HTLCs coming back from `get_pending_or_resolved_outbound_htlcs`, as we always attempt to replay the success/failure and figure out whether it mattered based on whether there was still an HTLC to claim/fail.

Here we, finally, begin actually using the new `ChannelMonitorUpdateStep::ReleasePaymentComplete` updates, skipping re-hydration of pending payments once they have been fully resolved through to a user `Event`.
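As a rough illustration of the delivery model described above - this is a toy sketch, not LDK's actual API; `Monitor`, `mark_resolved`, `needs_replay`, and the `u64` HTLC ids are all hypothetical stand-ins - the idea is that handling a payment resolution writes a durable mark back into the monitor itself, so a reload from the monitor alone no longer replays the event:

```rust
use std::collections::HashSet;

// Toy model (hypothetical types): a `MonitorEvent` handed to the manager is
// only in memory, so durability must come from a mark written back into the
// monitor itself (akin to `ReleasePaymentComplete`).
#[derive(Clone)]
struct Monitor {
    pending_events: Vec<u64>,             // HTLC ids with on-chain resolutions
    htlcs_resolved_to_user: HashSet<u64>, // durable "handled through to a user Event" marks
}

impl Monitor {
    // Handling an event records completion durably in the monitor, rather than
    // relying on the manager having persisted after consuming the event.
    fn mark_resolved(&mut self, htlc_id: u64) {
        self.htlcs_resolved_to_user.insert(htlc_id);
    }

    // On reload, only HTLCs without a durable mark need their resolution replayed.
    fn needs_replay(&self, htlc_id: u64) -> bool {
        !self.htlcs_resolved_to_user.contains(&htlc_id)
    }
}

fn main() {
    let mut monitor =
        Monitor { pending_events: vec![1], htlcs_resolved_to_user: HashSet::new() };

    // The manager fetches the event and surfaces a user `Event`...
    let htlc_id = monitor.pending_events.pop().unwrap();
    // ...and the completion is recorded durably in the monitor.
    monitor.mark_resolved(htlc_id);

    // Crash + reload from the *monitor* alone: no spurious replay.
    let reloaded = monitor.clone();
    assert!(!reloaded.needs_replay(htlc_id));
}
```

Without the `mark_resolved` write, a reload after a crash would have no way to tell a lost-but-unhandled resolution apart from one the user already saw.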
1 parent 8383d2b commit 500fc91

File tree: 3 files changed, +51 / -127 lines

lightning/src/chain/channelmonitor.rs

Lines changed: 9 additions & 95 deletions
@@ -2929,10 +2929,6 @@ impl<Signer: EcdsaChannelSigner> ChannelMonitor<Signer> {
     /// Gets the set of outbound HTLCs which can be (or have been) resolved by this
     /// `ChannelMonitor`. This is used to determine if an HTLC was removed from the channel prior
     /// to the `ChannelManager` having been persisted.
-    ///
-    /// This is similar to [`Self::get_pending_or_resolved_outbound_htlcs`] except it includes
-    /// HTLCs which were resolved on-chain (i.e. where the final HTLC resolution was done by an
-    /// event from this `ChannelMonitor`).
     pub(crate) fn get_all_current_outbound_htlcs(
         &self,
     ) -> HashMap<HTLCSource, (HTLCOutputInCommitment, Option<PaymentPreimage>)> {

@@ -2945,8 +2941,11 @@ impl<Signer: EcdsaChannelSigner> ChannelMonitor<Signer> {
         for &(ref htlc, ref source_option) in latest_outpoints.iter() {
             if let &Some(ref source) = source_option {
                 let htlc_id = SentHTLCId::from_source(source);
-                let preimage_opt = us.counterparty_fulfilled_htlcs.get(&htlc_id).cloned();
-                res.insert((**source).clone(), (htlc.clone(), preimage_opt));
+                if !us.htlcs_resolved_to_user.contains(&htlc_id) {
+                    let preimage_opt =
+                        us.counterparty_fulfilled_htlcs.get(&htlc_id).cloned();
+                    res.insert((**source).clone(), (htlc.clone(), preimage_opt));
+                }
             }
         }
     }

@@ -2994,7 +2993,10 @@ impl<Signer: EcdsaChannelSigner> ChannelMonitor<Signer> {
         if let Some(state) = us.htlcs_resolved_on_chain.iter().filter(filter).next() {
             if let Some(source) = source {
                 if state.payment_preimage.is_none() {
-                    res.insert(source.clone(), htlc.clone());
+                    let htlc_id = SentHTLCId::from_source(source);
+                    if !us.htlcs_resolved_to_user.contains(&htlc_id) {
+                        res.insert(source.clone(), htlc.clone());
+                    }
                 }
             }
         }

@@ -3029,94 +3031,6 @@ impl<Signer: EcdsaChannelSigner> ChannelMonitor<Signer> {
         res
     }

-    /// Gets the set of outbound HTLCs which are pending resolution in this channel or which were
-    /// resolved with a preimage from our counterparty.
-    ///
-    /// This is used to reconstruct pending outbound payments on restart in the ChannelManager.
-    ///
-    /// Currently, the preimage is unused, however if it is present in the relevant internal state
-    /// an HTLC is always included even if it has been resolved.
-    #[rustfmt::skip]
-    pub(crate) fn get_pending_or_resolved_outbound_htlcs(&self) -> HashMap<HTLCSource, (HTLCOutputInCommitment, Option<PaymentPreimage>)> {
-        let us = self.inner.lock().unwrap();
-        // We're only concerned with the confirmation count of HTLC transactions, and don't
-        // actually care how many confirmations a commitment transaction may or may not have. Thus,
-        // we look for either a FundingSpendConfirmation event or a funding_spend_confirmed.
-        let confirmed_txid = us.funding_spend_confirmed.or_else(|| {
-            us.onchain_events_awaiting_threshold_conf.iter().find_map(|event| {
-                if let OnchainEvent::FundingSpendConfirmation { .. } = event.event {
-                    Some(event.txid)
-                } else { None }
-            })
-        });
-
-        if confirmed_txid.is_none() {
-            // If we have not seen a commitment transaction on-chain (ie the channel is not yet
-            // closed), just get the full set.
-            mem::drop(us);
-            return self.get_all_current_outbound_htlcs();
-        }
-
-        let mut res = new_hash_map();
-        macro_rules! walk_htlcs {
-            ($holder_commitment: expr, $htlc_iter: expr) => {
-                for (htlc, source) in $htlc_iter {
-                    if us.htlcs_resolved_on_chain.iter().any(|v| v.commitment_tx_output_idx == htlc.transaction_output_index) {
-                        // We should assert that funding_spend_confirmed is_some() here, but we
-                        // have some unit tests which violate HTLC transaction CSVs entirely and
-                        // would fail.
-                        // TODO: Once tests all connect transactions at consensus-valid times, we
-                        // should assert here like we do in `get_claimable_balances`.
-                    } else if htlc.offered == $holder_commitment {
-                        // If the payment was outbound, check if there's an HTLCUpdate
-                        // indicating we have spent this HTLC with a timeout, claiming it back
-                        // and awaiting confirmations on it.
-                        let htlc_update_confd = us.onchain_events_awaiting_threshold_conf.iter().any(|event| {
-                            if let OnchainEvent::HTLCUpdate { commitment_tx_output_idx: Some(commitment_tx_output_idx), .. } = event.event {
-                                // If the HTLC was timed out, we wait for ANTI_REORG_DELAY blocks
-                                // before considering it "no longer pending" - this matches when we
-                                // provide the ChannelManager an HTLC failure event.
-                                Some(commitment_tx_output_idx) == htlc.transaction_output_index &&
-                                    us.best_block.height >= event.height + ANTI_REORG_DELAY - 1
-                            } else if let OnchainEvent::HTLCSpendConfirmation { commitment_tx_output_idx, .. } = event.event {
-                                // If the HTLC was fulfilled with a preimage, we consider the HTLC
-                                // immediately non-pending, matching when we provide ChannelManager
-                                // the preimage.
-                                Some(commitment_tx_output_idx) == htlc.transaction_output_index
-                            } else { false }
-                        });
-                        if let Some(source) = source {
-                            let counterparty_resolved_preimage_opt =
-                                us.counterparty_fulfilled_htlcs.get(&SentHTLCId::from_source(source)).cloned();
-                            if !htlc_update_confd || counterparty_resolved_preimage_opt.is_some() {
-                                res.insert(source.clone(), (htlc.clone(), counterparty_resolved_preimage_opt));
-                            }
-                        } else {
-                            panic!("Outbound HTLCs should have a source");
-                        }
-                    }
-                }
-            }
-        }
-
-        let txid = confirmed_txid.unwrap();
-        if Some(txid) == us.funding.current_counterparty_commitment_txid || Some(txid) == us.funding.prev_counterparty_commitment_txid {
-            walk_htlcs!(false, us.funding.counterparty_claimable_outpoints.get(&txid).unwrap().iter().filter_map(|(a, b)| {
-                if let &Some(ref source) = b {
-                    Some((a, Some(&**source)))
-                } else { None }
-            }));
-        } else if txid == us.funding.current_holder_commitment_tx.trust().txid() {
-            walk_htlcs!(true, holder_commitment_htlcs!(us, CURRENT_WITH_SOURCES));
-        } else if let Some(prev_commitment_tx) = &us.funding.prev_holder_commitment_tx {
-            if txid == prev_commitment_tx.trust().txid() {
-                walk_htlcs!(true, holder_commitment_htlcs!(us, PREV_WITH_SOURCES).unwrap());
-            }
-        }
-
-        res
-    }
-
     pub(crate) fn get_stored_preimages(
         &self,
     ) -> HashMap<PaymentHash, (PaymentPreimage, Vec<PaymentClaimDetails>)> {
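The gate added to `get_all_current_outbound_htlcs` above can be sketched in isolation. This is a minimal, hypothetical approximation - `u64` ids stand in for `SentHTLCId`, a plain struct stands in for the HTLC data, and none of these are LDK's real signatures:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-in types for illustration only.
type SentHtlcId = u64;

#[derive(Clone, Debug, PartialEq)]
struct PendingHtlc {
    amount_msat: u64,
    preimage: Option<[u8; 32]>,
}

/// Mirrors the shape of the new filter: only re-hydrate payment state for
/// HTLCs that have NOT been resolved through to a user `Event`.
fn htlcs_to_rehydrate(
    all_outbound: &HashMap<SentHtlcId, PendingHtlc>,
    resolved_to_user: &HashSet<SentHtlcId>,
) -> HashMap<SentHtlcId, PendingHtlc> {
    all_outbound
        .iter()
        // Skip HTLCs carrying a durable "resolved to user" mark.
        .filter(|(id, _)| !resolved_to_user.contains(*id))
        .map(|(id, htlc)| (*id, htlc.clone()))
        .collect()
}

fn main() {
    let mut all = HashMap::new();
    all.insert(1, PendingHtlc { amount_msat: 1_000, preimage: None });
    all.insert(2, PendingHtlc { amount_msat: 2_000, preimage: Some([0x42; 32]) });

    // HTLC 2 already produced a payment event the user handled, recorded via a
    // `ReleasePaymentComplete`-style monitor update.
    let mut resolved = HashSet::new();
    resolved.insert(2);

    let rehydrate = htlcs_to_rehydrate(&all, &resolved);
    assert_eq!(rehydrate.len(), 1);
    assert!(rehydrate.contains_key(&1));
}
```

This is why the `htlcs_resolved_to_user` check sits inside the per-HTLC loop: each HTLC is filtered individually, so one fully-resolved payment no longer forces its state to be re-added (and then noisily re-failed or re-claimed) on every startup.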

lightning/src/ln/channelmanager.rs

Lines changed: 1 addition & 2 deletions
@@ -16451,8 +16451,7 @@ where
         }

         if is_channel_closed {
-            for (htlc_source, (htlc, _)) in monitor.get_pending_or_resolved_outbound_htlcs()
-            {
+            for (htlc_source, (htlc, _)) in monitor.get_all_current_outbound_htlcs() {
                 let logger = WithChannelMonitor::from(
                     &args.logger,
                     monitor,

lightning/src/ln/payment_tests.rs

Lines changed: 41 additions & 30 deletions
@@ -1182,9 +1182,6 @@ fn do_test_completed_payment_not_retryable_on_reload(use_dust: bool) {
     nodes[1].node.peer_disconnected(node_a_id);

     nodes[0].node.test_process_background_events();
-    check_added_monitors(&nodes[0], 1); // TODO: Removed in the next commit as this only required
-                                        // when we are still seeing all payments, even resolved
-                                        // ones.

     let mut reconnect_args = ReconnectArgs::new(&nodes[0], &nodes[1]);
     reconnect_args.send_channel_ready = (true, true);

@@ -1217,9 +1214,6 @@ fn do_test_completed_payment_not_retryable_on_reload(use_dust: bool) {
     nodes[1].node.peer_disconnected(node_a_id);

     nodes[0].node.test_process_background_events();
-    check_added_monitors(&nodes[0], 1); // TODO: Removed in the next commit as this only required
-                                        // when we are still seeing all payments, even resolved
-                                        // ones.

     reconnect_nodes(ReconnectArgs::new(&nodes[0], &nodes[1]));

@@ -1238,7 +1232,8 @@ fn test_completed_payment_not_retryable_on_reload() {
 }

 fn do_test_dup_htlc_onchain_doesnt_fail_on_reload(
-    persist_manager_post_event: bool, confirm_commitment_tx: bool, payment_timeout: bool,
+    persist_manager_post_event: bool, persist_monitor_after_events: bool,
+    confirm_commitment_tx: bool, payment_timeout: bool,
 ) {
     // When a Channel is closed, any outbound HTLCs which were relayed through it are simply
     // dropped. From there, the ChannelManager relies on the ChannelMonitor having a copy of the

@@ -1338,36 +1333,58 @@ fn do_test_dup_htlc_onchain_doesnt_fail_on_reload(
         node_a_ser = nodes[0].node.encode();
     }

-    let mon_ser = get_monitor!(nodes[0], chan_id).encode();
+    let mut mon_ser = Vec::new();
+    if !persist_monitor_after_events {
+        mon_ser = get_monitor!(nodes[0], chan_id).encode();
+    }
     if payment_timeout {
         let conditions = PaymentFailedConditions::new().from_mon_update();
         expect_payment_failed_conditions(&nodes[0], payment_hash, false, conditions);
     } else {
         expect_payment_sent(&nodes[0], payment_preimage, None, true, true);
     }
+    // Note that if we persist the monitor before processing the events, above, we'll always get
+    // them replayed on restart no matter what
+    if persist_monitor_after_events {
+        mon_ser = get_monitor!(nodes[0], chan_id).encode();
+    }

     // If we persist the ChannelManager after we get the PaymentSent event, we shouldn't get it
     // twice.
     if persist_manager_post_event {
         node_a_ser = nodes[0].node.encode();
+    } else if persist_monitor_after_events {
+        // Persisting the monitor after the events (resulting in a new monitor being persisted) but
+        // didn't persist the manager will result in an FC, which we don't test here.
+        panic!();
     }

     // Now reload nodes[0]...
     reload_node!(nodes[0], &node_a_ser, &[&mon_ser], persister, chain_monitor, node_a_reload);

     check_added_monitors(&nodes[0], 0);
-    if persist_manager_post_event {
+    if persist_manager_post_event && persist_monitor_after_events {
         assert!(nodes[0].node.get_and_clear_pending_events().is_empty());
-        check_added_monitors(&nodes[0], 2);
+        check_added_monitors(&nodes[0], 0);
     } else if payment_timeout {
-        let conditions = PaymentFailedConditions::new().from_mon_update();
+        let mut conditions = PaymentFailedConditions::new();
+        if !persist_monitor_after_events {
+            conditions = conditions.from_mon_update();
+        }
         expect_payment_failed_conditions(&nodes[0], payment_hash, false, conditions);
+        check_added_monitors(&nodes[0], 0);
     } else {
+        if persist_manager_post_event {
+            assert!(nodes[0].node.get_and_clear_pending_events().is_empty());
+        } else {
+            expect_payment_sent(&nodes[0], payment_preimage, None, true, false);
+        }
         // After reload, the ChannelManager identified the failed payment and queued up the
-        // PaymentSent and corresponding ChannelMonitorUpdate to mark the payment handled, but
-        // while processing the pending `MonitorEvent`s (which were not processed before the
-        // monitor was persisted) we will end up with a duplicate ChannelMonitorUpdate.
-        expect_payment_sent(&nodes[0], payment_preimage, None, true, false);
+        // PaymentSent (or not, if `persist_manager_post_event` resulted in us detecting we
+        // already did that) and corresponding ChannelMonitorUpdate to mark the payment
+        // handled, but while processing the pending `MonitorEvent`s (which were not processed
+        // before the monitor was persisted) we will end up with a duplicate
+        // ChannelMonitorUpdate.
         check_added_monitors(&nodes[0], 2);
     }

@@ -1381,12 +1398,15 @@ fn do_test_dup_htlc_onchain_doesnt_fail_on_reload(

 #[test]
 fn test_dup_htlc_onchain_doesnt_fail_on_reload() {
-    do_test_dup_htlc_onchain_doesnt_fail_on_reload(true, true, true);
-    do_test_dup_htlc_onchain_doesnt_fail_on_reload(true, true, false);
-    do_test_dup_htlc_onchain_doesnt_fail_on_reload(true, false, false);
-    do_test_dup_htlc_onchain_doesnt_fail_on_reload(false, true, true);
-    do_test_dup_htlc_onchain_doesnt_fail_on_reload(false, true, false);
-    do_test_dup_htlc_onchain_doesnt_fail_on_reload(false, false, false);
+    do_test_dup_htlc_onchain_doesnt_fail_on_reload(true, true, true, true);
+    do_test_dup_htlc_onchain_doesnt_fail_on_reload(true, true, true, false);
+    do_test_dup_htlc_onchain_doesnt_fail_on_reload(true, true, false, false);
+    do_test_dup_htlc_onchain_doesnt_fail_on_reload(true, false, true, true);
+    do_test_dup_htlc_onchain_doesnt_fail_on_reload(true, false, true, false);
+    do_test_dup_htlc_onchain_doesnt_fail_on_reload(true, false, false, false);
+    do_test_dup_htlc_onchain_doesnt_fail_on_reload(false, false, true, true);
+    do_test_dup_htlc_onchain_doesnt_fail_on_reload(false, false, true, false);
+    do_test_dup_htlc_onchain_doesnt_fail_on_reload(false, false, false, false);
 }

 #[test]

@@ -4140,15 +4160,9 @@ fn do_no_missing_sent_on_reload(persist_manager_with_payment: bool, at_midpoint:
     let config = test_default_channel_config();
     reload_node!(nodes[0], config, &node_ser, &[&mon_ser], persist_b, chain_monitor_b, node_a_2);

-    // Because the pending payment will currently stick around forever, we'll apply a
-    // ChannelMonitorUpdate on each startup to attempt to remove it.
-    // TODO: This will be dropped in the next commit after we actually remove the payment!
-    check_added_monitors(&nodes[0], 0);
     nodes[0].node.test_process_background_events();
-    check_added_monitors(&nodes[0], 1);
     let events = nodes[0].node.get_and_clear_pending_events();
     assert!(events.is_empty());
-    check_added_monitors(&nodes[0], 0);

     // Ensure that we don't generate any further events even after the channel-closing commitment
     // transaction is confirmed on-chain.

@@ -4167,9 +4181,6 @@ fn do_no_missing_sent_on_reload(persist_manager_with_payment: bool, at_midpoint:
     reload_node!(nodes[0], config, &node_ser, &[&mon_ser], persist_c, chain_monitor_c, node_a_3);
     let events = nodes[0].node.get_and_clear_pending_events();
     assert!(events.is_empty());
-
-    // TODO: This will be dropped in the next commit after we actually remove the payment!
-    check_added_monitors(&nodes[0], 1);
 }

 #[test]
