
Commit 8b637cc

Prepare to provide new ReleasePaymentComplete monitor updates
`MonitorEvent`s aren't delivered to the `ChannelManager` in a durable fashion - if the `ChannelManager` fetches the pending `MonitorEvent`s, then the `ChannelMonitor` gets persisted (e.g. due to a block update), and then the node crashes prior to persisting the `ChannelManager` again, the `MonitorEvent` and its effects on the `ChannelManager` will be lost. This isn't likely in a sync persist environment, but in an async one it could be an issue.

Note that this is only an issue for closed channels - `MonitorEvent`s only inform the `ChannelManager` that a channel is closed (which the `ChannelManager` will learn on startup or when it next tries to advance the channel state), that `ChannelMonitorUpdate` writes completed (which the `ChannelManager` will detect on startup), or that HTLCs resolved on-chain post closure. Of the three, only the last is problematic to lose prior to a reload.

In previous commits we ensured that HTLC resolutions which came to the `ChannelManager` via a `MonitorEvent` were replayed on startup if the `MonitorEvent` was lost. However, in cases where the `ChannelManager` was so stale that it didn't have the payment state for an HTLC at all, we only re-add it in cases where `ChannelMonitor::get_pending_or_resolved_outbound_htlcs` includes it. Constantly re-adding a payment state and then failing it would generate lots of noise for users on startup (not to mention the risk of confusing stale payment events with the latest state of a payment when the `PaymentId` has been reused to retry a payment). Thus, `get_pending_or_resolved_outbound_htlcs` does not include state for HTLCs which were resolved on chain with a preimage, or HTLCs which were resolved on chain with a timeout after `ANTI_REORG_DELAY` confirmations.

These criteria match the criteria for generating a `MonitorEvent`, and work great under the assumption that `MonitorEvent`s are reliably delivered. However, if they are not, and our `ChannelManager` is lost or substantially old (or, in a future where we do not persist the `ChannelManager` at all), we will not end up seeing payment resolution events for an HTLC.

Instead, we really want to tell our `ChannelMonitor`s when the resolution of an HTLC is complete. Note that we don't particularly care about non-payment HTLCs, as there is no re-hydration of state to do there - `ChannelManager` load ignores forwarded HTLCs coming back from `get_pending_or_resolved_outbound_htlcs` as there's nothing to do - we always attempt to replay the success/failure and figure out if it mattered based on whether there was still an HTLC to claim/fail.

Here we prepare to generate the new `ChannelMonitorUpdateStep::ReleasePaymentComplete` updates, adding a new `PaymentCompleteUpdate` struct to track the new update before we generate the `ChannelMonitorUpdate`, and passing it through to the right places in `ChannelManager`. The only cases where we want to generate the new update are after a `PaymentSent` or `PaymentFailed` event when the event was the result of a `MonitorEvent` or the equivalent read during startup.
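As a rough illustration of the intended lifecycle (a standalone sketch, not code from this commit: the stand-in types and the `release_payment_complete` helper are assumptions about the follow-up work, not LDK APIs), the idea is that once a `PaymentSent`/`PaymentFailed` event sourced from a `MonitorEvent` has been handed to the user, the `PaymentCompleteUpdate` describing which HTLC it covered can be turned into a monitor update so the resolution survives a lost `MonitorEvent`:

// Stand-in types; the real struct uses LDK's PublicKey/OutPoint/ChannelId/SentHTLCId.
type PublicKey = [u8; 33];
type OutPoint = ([u8; 32], u16);
type ChannelId = [u8; 32];
type SentHTLCId = u64;

// Mirrors the `PaymentCompleteUpdate` struct added in the diff below.
struct PaymentCompleteUpdate {
    counterparty_node_id: PublicKey,
    channel_funding_outpoint: OutPoint,
    channel_id: ChannelId,
    htlc_id: SentHTLCId,
}

// Hypothetical follow-up step: persist a
// `ChannelMonitorUpdateStep::ReleasePaymentComplete`-style update for the closed
// channel so the `ChannelMonitor` stops resurfacing this HTLC's resolution.
fn release_payment_complete(update: &PaymentCompleteUpdate) {
    let _ = update.htlc_id;
}

fn on_payment_event_handled(update: Option<PaymentCompleteUpdate>) {
    // Only events that came from a `MonitorEvent` (or its startup-replay
    // equivalent) carry a `PaymentCompleteUpdate`; everything else passes `None`.
    if let Some(update) = update {
        release_payment_complete(&update);
    }
}

fn main() {
    on_payment_event_handled(None);
    on_payment_event_handled(Some(PaymentCompleteUpdate {
        counterparty_node_id: [2; 33],
        channel_funding_outpoint: ([0; 32], 0),
        channel_id: [1; 32],
        htlc_id: 7,
    }));
}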
1 parent c49ce57 commit 8b637cc

2 files changed: +90 −29 lines changed

lightning/src/ln/channelmanager.rs

Lines changed: 82 additions & 24 deletions
@@ -1244,6 +1244,21 @@ impl_writeable_tlv_based_enum_upgradable!(MonitorUpdateCompletionAction,
     },
 );
 
+#[derive(Clone, Debug, PartialEq, Eq)]
+pub(crate) struct PaymentCompleteUpdate {
+    counterparty_node_id: PublicKey,
+    channel_funding_outpoint: OutPoint,
+    channel_id: ChannelId,
+    htlc_id: SentHTLCId,
+}
+
+impl_writeable_tlv_based!(PaymentCompleteUpdate, {
+    (1, channel_funding_outpoint, required),
+    (3, counterparty_node_id, required),
+    (5, channel_id, required),
+    (7, htlc_id, required),
+});
+
 #[derive(Clone, Debug, PartialEq, Eq)]
 pub(crate) enum EventCompletionAction {
     ReleaseRAAChannelMonitorUpdate {
@@ -3464,7 +3479,7 @@ macro_rules! handle_monitor_update_completion {
         $self.finalize_claims(updates.finalized_claimed_htlcs);
         for failure in updates.failed_htlcs.drain(..) {
             let receiver = HTLCHandlingFailureType::Forward { node_id: Some(counterparty_node_id), channel_id };
-            $self.fail_htlc_backwards_internal(&failure.0, &failure.1, &failure.2, receiver);
+            $self.fail_htlc_backwards_internal(&failure.0, &failure.1, &failure.2, receiver, None);
         }
     } }
 }
@@ -4137,7 +4152,8 @@ where
             let failure_reason = LocalHTLCFailureReason::ChannelClosed;
             let reason = HTLCFailReason::from_failure_code(failure_reason);
             let receiver = HTLCHandlingFailureType::Forward { node_id: Some(*counterparty_node_id), channel_id: *chan_id };
-            self.fail_htlc_backwards_internal(&htlc_source.0, &htlc_source.1, &reason, receiver);
+            let (source, hash) = htlc_source;
+            self.fail_htlc_backwards_internal(&source, &hash, &reason, receiver, None);
         }
 
         let _ = handle_error!(self, shutdown_result, *counterparty_node_id);
@@ -4271,7 +4287,7 @@ where
             let failure_reason = LocalHTLCFailureReason::ChannelClosed;
             let reason = HTLCFailReason::from_failure_code(failure_reason);
             let receiver = HTLCHandlingFailureType::Forward { node_id: Some(counterparty_node_id), channel_id };
-            self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver);
+            self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver, None);
         }
         if let Some((_, funding_txo, _channel_id, monitor_update)) = shutdown_res.monitor_update {
             debug_assert!(false, "This should have been handled in `locked_close_channel`");
@@ -6288,7 +6304,8 @@ where
 
             let reason = HTLCFailReason::from_failure_code(LocalHTLCFailureReason::UnknownNextPeer);
             let destination = HTLCHandlingFailureType::InvalidForward { requested_forward_scid: short_channel_id };
-            self.fail_htlc_backwards_internal(&htlc_source, &payment.forward_info.payment_hash, &reason, destination);
+            let hash = payment.forward_info.payment_hash;
+            self.fail_htlc_backwards_internal(&htlc_source, &hash, &reason, destination, None);
         } else { unreachable!() } // Only `PendingHTLCRouting::Forward`s are intercepted
 
         Ok(())
@@ -6577,6 +6594,7 @@ where
                     &payment_hash,
                     &failure_reason,
                     destination,
+                    None,
                 );
             }
             self.forward_htlcs(&mut phantom_receives);
@@ -7786,7 +7804,7 @@ where
             let failure_reason = LocalHTLCFailureReason::MPPTimeout;
             let reason = HTLCFailReason::from_failure_code(failure_reason);
             let receiver = HTLCHandlingFailureType::Receive { payment_hash: htlc_source.1 };
-            self.fail_htlc_backwards_internal(&source, &htlc_source.1, &reason, receiver);
+            self.fail_htlc_backwards_internal(&source, &htlc_source.1, &reason, receiver, None);
         }
 
         for (err, counterparty_node_id) in handle_errors {
@@ -7852,7 +7870,7 @@ where
                 let reason = self.get_htlc_fail_reason_from_failure_code(failure_code, &htlc);
                 let source = HTLCSource::PreviousHopData(htlc.prev_hop);
                 let receiver = HTLCHandlingFailureType::Receive { payment_hash: *payment_hash };
-                self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver);
+                self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver, None);
             }
         }
     }
@@ -7945,7 +7963,7 @@ where
                 node_id: Some(counterparty_node_id.clone()),
                 channel_id,
             };
-            self.fail_htlc_backwards_internal(&htlc_src, &payment_hash, &reason, receiver);
+            self.fail_htlc_backwards_internal(&htlc_src, &payment_hash, &reason, receiver, None);
         }
     }
 
@@ -7954,6 +7972,7 @@ where
     fn fail_htlc_backwards_internal(
         &self, source: &HTLCSource, payment_hash: &PaymentHash, onion_error: &HTLCFailReason,
         failure_type: HTLCHandlingFailureType,
+        mut from_monitor_update_completion: Option<PaymentCompleteUpdate>,
     ) {
         // Ensure that no peer state channel storage lock is held when calling this function.
         // This ensures that future code doesn't introduce a lock-order requirement for
@@ -7985,7 +8004,17 @@ where
                     &self.secp_ctx,
                     &self.pending_events,
                     &self.logger,
+                    &mut from_monitor_update_completion,
                 );
+                if let Some(update) = from_monitor_update_completion {
+                    // If `fail_htlc` didn't `take` the post-event action, we should go ahead and
+                    // complete it here as the failure was duplicative - we've already handled it.
+                    // This should mostly only happen on startup, but it is possible to hit it in
+                    // rare cases where a MonitorUpdate is replayed after restart because a
+                    // ChannelMonitor wasn't persisted after it was applied (even though the
+                    // ChannelManager was).
+                    // TODO
+                }
             },
             HTLCSource::PreviousHopData(HTLCPreviousHopData {
                 ref short_channel_id,
@@ -8123,6 +8152,7 @@ where
                     &payment_hash,
                     &reason,
                     receiver,
+                    None,
                 );
             }
             return;
@@ -8269,7 +8299,7 @@ where
                 err_data,
             );
             let receiver = HTLCHandlingFailureType::Receive { payment_hash };
-            self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver);
+            self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver, None);
         }
         self.claimable_payments.lock().unwrap().pending_claiming_payments.remove(&payment_hash);
     }
@@ -8619,22 +8649,35 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
                 debug_assert!(self.background_events_processed_since_startup.load(Ordering::Acquire),
                     "We don't support claim_htlc claims during startup - monitors may not be available yet");
                 debug_assert_eq!(next_channel_counterparty_node_id, path.hops[0].pubkey);
-                let ev_completion_action = EventCompletionAction::ReleaseRAAChannelMonitorUpdate {
-                    channel_funding_outpoint: Some(next_channel_outpoint),
-                    channel_id: next_channel_id,
-                    counterparty_node_id: path.hops[0].pubkey,
-                };
+                let mut ev_completion_action =
+                    Some(EventCompletionAction::ReleaseRAAChannelMonitorUpdate {
+                        channel_funding_outpoint: Some(next_channel_outpoint),
+                        channel_id: next_channel_id,
+                        counterparty_node_id: path.hops[0].pubkey,
+                    });
                 self.pending_outbound_payments.claim_htlc(
                     payment_id,
                     payment_preimage,
                     bolt12_invoice,
                     session_priv,
                     path,
                     from_onchain,
-                    ev_completion_action,
+                    &mut ev_completion_action,
                     &self.pending_events,
                     &self.logger,
                 );
+                // If an event was generated, `claim_htlc` set `ev_completion_action` to None, if
+                // not, we should go ahead and run it now (as the claim was duplicative), at least
+                // if a PaymentClaimed event with the same action isn't already pending.
+                let have_action = if ev_completion_action.is_some() {
+                    let pending_events = self.pending_events.lock().unwrap();
+                    pending_events.iter().any(|(_, act)| *act == ev_completion_action)
+                } else {
+                    false
+                };
+                if !have_action {
+                    self.handle_post_event_actions(ev_completion_action);
+                }
             },
             HTLCSource::PreviousHopData(hop_data) => {
                 let prev_channel_id = hop_data.channel_id;
@@ -10262,7 +10305,8 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
                 channel_id: msg.channel_id,
             };
             let reason = HTLCFailReason::from_failure_code(LocalHTLCFailureReason::ChannelClosed);
-            self.fail_htlc_backwards_internal(&htlc_source.0, &htlc_source.1, &reason, receiver);
+            let (source, hash) = htlc_source;
+            self.fail_htlc_backwards_internal(&source, &hash, &reason, receiver, None);
         }
 
         Ok(())
@@ -10739,6 +10783,7 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
                         &payment_hash,
                         &failure_reason,
                         destination,
+                        None,
                     );
                 }
 
@@ -11332,6 +11377,7 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
                         &htlc_update.payment_hash,
                         &reason,
                         receiver,
+                        None,
                     );
                 }
             },
@@ -12810,8 +12856,8 @@ where
         }
     }
 
-    fn handle_post_event_actions(&self, actions: Vec<EventCompletionAction>) {
-        for action in actions {
+    fn handle_post_event_actions<I: IntoIterator<Item = EventCompletionAction>>(&self, actions: I) {
+        for action in actions.into_iter() {
             match action {
                 EventCompletionAction::ReleaseRAAChannelMonitorUpdate {
                     channel_funding_outpoint: _,
@@ -13660,7 +13706,7 @@ where
         }
 
         for (source, payment_hash, reason, destination) in timed_out_htlcs.drain(..) {
-            self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, destination);
+            self.fail_htlc_backwards_internal(&source, &payment_hash, &reason, destination, None);
         }
     }
 
@@ -15832,7 +15878,7 @@ where
            }
            for (source, hash, cp_id, chan_id) in shutdown_result.dropped_outbound_htlcs {
                let reason = LocalHTLCFailureReason::ChannelClosed;
-               failed_htlcs.push((source, hash, cp_id, chan_id, reason));
+               failed_htlcs.push((source, hash, cp_id, chan_id, reason, None));
            }
            channel_closures.push_back((
                events::Event::ChannelClosed {
@@ -15876,6 +15922,7 @@ This indicates a bug inside LDK. Please report this error at https://github.com/
                    channel.context.get_counterparty_node_id(),
                    channel.context.channel_id(),
                    LocalHTLCFailureReason::ChannelClosed,
+                   None,
                ));
            }
        }
@@ -16575,22 +16622,23 @@ where
                            // generating a `PaymentPathSuccessful` event but regenerating
                            // it and the `PaymentSent` on every restart until the
                            // `ChannelMonitor` is removed.
-                           let compl_action =
+                           let mut compl_action = Some(
                                EventCompletionAction::ReleaseRAAChannelMonitorUpdate {
                                    channel_funding_outpoint: Some(
                                        monitor.get_funding_txo(),
                                    ),
                                    channel_id: monitor.channel_id(),
                                    counterparty_node_id: path.hops[0].pubkey,
-                               };
+                               },
+                           );
                            pending_outbounds.claim_htlc(
                                payment_id,
                                preimage,
                                bolt12_invoice,
                                session_priv,
                                path,
                                false,
-                               compl_action,
+                               &mut compl_action,
                                &pending_events,
                                &&logger,
                            );
@@ -16605,12 +16653,20 @@
                                "Failing HTLC with payment hash {} as it was resolved on-chain.",
                                payment_hash
                            );
+                           let completion_action = Some(PaymentCompleteUpdate {
+                               counterparty_node_id: monitor.get_counterparty_node_id(),
+                               channel_funding_outpoint: monitor.get_funding_txo(),
+                               channel_id: monitor.channel_id(),
+                               htlc_id: SentHTLCId::from_source(&htlc_source),
+                           });
+
                            failed_htlcs.push((
                                htlc_source,
                                payment_hash,
                                monitor.get_counterparty_node_id(),
                                monitor.channel_id(),
                                LocalHTLCFailureReason::OnChainTimeout,
+                               completion_action,
                            ));
                        }
                    }
@@ -17294,11 +17350,13 @@ where
    }
 
    for htlc_source in failed_htlcs {
-       let (source, payment_hash, counterparty_id, channel_id, failure_reason) = htlc_source;
+       let (source, hash, counterparty_id, channel_id, failure_reason, ev_action) =
+           htlc_source;
        let receiver =
            HTLCHandlingFailureType::Forward { node_id: Some(counterparty_id), channel_id };
        let reason = HTLCFailReason::from_failure_code(failure_reason);
-       channel_manager.fail_htlc_backwards_internal(&source, &payment_hash, &reason, receiver);
+       channel_manager
+           .fail_htlc_backwards_internal(&source, &hash, &reason, receiver, ev_action);
    }
 
    for (
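One detail worth noting from the hunks above: `handle_post_event_actions` now accepts any `IntoIterator<Item = EventCompletionAction>` rather than a `Vec`, which is what lets the duplicate-claim path hand over the leftover `Option<EventCompletionAction>` directly, since `Option<T>` implements `IntoIterator` and yields zero or one items. A standalone sketch of that pattern (the trimmed-down `EventCompletionAction` here is illustrative, not LDK's full enum):

enum EventCompletionAction {
    ReleaseRAAChannelMonitorUpdate { channel_id: u64 },
}

// Same shape as the new signature: any IntoIterator of actions is accepted.
fn handle_post_event_actions<I: IntoIterator<Item = EventCompletionAction>>(actions: I) {
    for action in actions.into_iter() {
        match action {
            EventCompletionAction::ReleaseRAAChannelMonitorUpdate { channel_id } => {
                println!("releasing RAA monitor update hold on channel {channel_id}");
            },
        }
    }
}

fn main() {
    // Existing call sites keep passing a Vec of actions...
    handle_post_event_actions(vec![EventCompletionAction::ReleaseRAAChannelMonitorUpdate {
        channel_id: 1,
    }]);

    // ...while the duplicate-claim path can pass the Option it got back from
    // `claim_htlc` without wrapping it in a Vec.
    let leftover: Option<EventCompletionAction> =
        Some(EventCompletionAction::ReleaseRAAChannelMonitorUpdate { channel_id: 2 });
    handle_post_event_actions(leftover);

    let nothing: Option<EventCompletionAction> = None;
    handle_post_event_actions(nothing);
}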

lightning/src/ln/outbound_payment.rs

Lines changed: 8 additions & 5 deletions
@@ -17,7 +17,9 @@ use lightning_invoice::Bolt11Invoice;
 use crate::blinded_path::{IntroductionNode, NodeIdLookUp};
 use crate::events::{self, PaidBolt12Invoice, PaymentFailureReason};
 use crate::ln::channel_state::ChannelDetails;
-use crate::ln::channelmanager::{EventCompletionAction, HTLCSource, PaymentId};
+use crate::ln::channelmanager::{
+    EventCompletionAction, HTLCSource, PaymentCompleteUpdate, PaymentId,
+};
 use crate::ln::onion_utils;
 use crate::ln::onion_utils::{DecodedOnionFailure, HTLCFailReason};
 use crate::offers::invoice::{Bolt12Invoice, DerivedSigningPubkey, InvoiceBuilder};
@@ -2153,7 +2155,7 @@ impl OutboundPayments {
    #[rustfmt::skip]
    pub(super) fn claim_htlc<L: Deref>(
        &self, payment_id: PaymentId, payment_preimage: PaymentPreimage, bolt12_invoice: Option<PaidBolt12Invoice>,
-       session_priv: SecretKey, path: Path, from_onchain: bool, ev_completion_action: EventCompletionAction,
+       session_priv: SecretKey, path: Path, from_onchain: bool, ev_completion_action: &mut Option<EventCompletionAction>,
        pending_events: &Mutex<VecDeque<(events::Event, Option<EventCompletionAction>)>>,
        logger: &L,
    ) where L::Target: Logger {
@@ -2174,7 +2176,7 @@ impl OutboundPayments {
                    amount_msat,
                    fee_paid_msat,
                    bolt12_invoice: bolt12_invoice,
-               }, Some(ev_completion_action.clone())));
+               }, ev_completion_action.take()));
                payment.get_mut().mark_fulfilled();
            }
 
@@ -2192,7 +2194,7 @@ impl OutboundPayments {
                        payment_hash,
                        path,
                        hold_times: Vec::new(),
-                   }, Some(ev_completion_action)));
+                   }, ev_completion_action.take()));
                }
            }
        } else {
@@ -2321,7 +2323,7 @@ impl OutboundPayments {
        path: &Path, session_priv: &SecretKey, payment_id: &PaymentId,
        probing_cookie_secret: [u8; 32], secp_ctx: &Secp256k1<secp256k1::All>,
        pending_events: &Mutex<VecDeque<(events::Event, Option<EventCompletionAction>)>>,
-       logger: &L,
+       logger: &L, completion_action: &mut Option<PaymentCompleteUpdate>,
    ) where
        L::Target: Logger,
    {
@@ -2472,6 +2474,7 @@ impl OutboundPayments {
            }
        };
        let mut pending_events = pending_events.lock().unwrap();
+       // TODO: Handle completion_action
        pending_events.push_back((path_failure, None));
        if let Some(ev) = full_failure_ev {
            pending_events.push_back((ev, None));
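The `claim_htlc` change above is essentially an `Option::take` handshake: the callee consumes the completion action only when it actually queues a `PaymentSent`/`PaymentPathSuccessful` event, so an action that is still `Some` afterwards tells the caller the claim was duplicative and the action has to be run directly (or dropped if an equivalent event is already pending). A simplified standalone sketch of that contract (not LDK code; the types are trimmed down):

struct EventCompletionAction(&'static str);

struct Event;

// Mirrors the new contract: the action is taken only if an event is queued.
fn claim_htlc(
    already_fulfilled: bool,
    ev_completion_action: &mut Option<EventCompletionAction>,
    pending_events: &mut Vec<(Event, Option<EventCompletionAction>)>,
) {
    if !already_fulfilled {
        // First time we learn of the preimage: queue the event and attach the
        // completion action to it, leaving `None` behind for the caller.
        pending_events.push((Event, ev_completion_action.take()));
    }
    // Otherwise the claim was duplicative: no event is queued and the action
    // stays with the caller.
}

fn main() {
    let mut pending_events = Vec::new();

    let mut action = Some(EventCompletionAction("release RAA monitor update hold"));
    claim_htlc(false, &mut action, &mut pending_events);
    assert!(action.is_none()); // consumed by the queued event

    let mut action = Some(EventCompletionAction("release RAA monitor update hold"));
    claim_htlc(true, &mut action, &mut pending_events);
    assert!(action.is_some()); // caller must run it, e.g. via handle_post_event_actions
}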
