-
Couldn't load subscription status.
- Fork 420
Payment Retries #1059
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Payment Retries #1059
Conversation
ec4d0c2 to
910814f
Compare
Codecov Report
@@ Coverage Diff @@
## main #1059 +/- ##
==========================================
+ Coverage 90.47% 90.66% +0.18%
==========================================
Files 68 67 -1
Lines 35273 35261 -12
==========================================
+ Hits 31914 31969 +55
+ Misses 3359 3292 -67
Continue to review full report at Codecov.
|
lightning-invoice/src/payment.rs
Outdated
| pub fn new<L: Deref, E: EventHandler>( | ||
| payer: P, router: R, network_graph: G, logger: L, event_handler: E | ||
| ) -> (Arc<Self>, PaymentRetryHandler<P, R, G, L, E>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It occurs to me that it would be better to separate the construction of InvoicePayer and PaymentRetryHandler, as it wouldn't require using an Arc while still allowing it.
910814f to
ae0cadd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, simple, clean API. I like it.
lightning-invoice/src/payment.rs
Outdated
| } | ||
|
|
||
| /// An [`EventHandler`] decorator for retrying failed payments. | ||
| pub struct PaymentRetryHandler<I, P: Deref, R, G, L: Deref, E> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I fully understand why this is separate from the Payer - can't we/shouldn't we just implement EventHandler on the Payer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separate from InvoicePayer, right? Payer is the trait parameter.
As of now, the ownership model is such that any decorating EventHandler (or BackgroundProcessor for that matter) would take ownership of this, which means InvoicePayer could then only be used as an event handler and no longer for initiating payments. For instance, we'll need to decorate this with NetworkUpdateHandler so that updates are applied before retries.
I suppose we could implement EventHandler for T: Deref<Target = InvoicePayer<...>> and users could use an Arc<InvoicePayer<...>> (haven't actually tried this). But it seems cleaner to separate paying from the retry behavior, which also lends to intuitive naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separate from InvoicePayer, right? Payer is the trait parameter.
Err, yes.
I suppose we could implement EventHandler for T: Deref<Target = InvoicePayer<...>> and users could use an Arc<InvoicePayer<...>> (haven't actually tried this)
Yes, along this line is what I was suggesting.
But it seems cleaner to separate paying from the retry behavior, which also lends to intuitive naming.
Hmm, I guess I don't see it, like, I have a thing that handles payment, I don't care about how it handles payment, or the fact that there's retries going on, I just want it to handle payment. Sure, I may need to wire it up, but having multiple things to wire up to get a payment to fire just seems like extra wires when its not adding flexibility for the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it seems cleaner to separate paying from the retry behavior, which also lends to intuitive naming.
Hmm, I guess I don't see it, like, I have a thing that handles payment, I don't care about how it handles payment, or the fact that there's retries going on, I just want it to handle payment. Sure, I may need to wire it up, but having multiple things to wire up to get a payment to fire just seems like extra wires when its not adding flexibility for the user.
I see your point, though my remaining qualm is that you must pass it an EventHandler that way, whereas you have the option of not doing so otherwise. Further, you do need to know that it is handling retries since that dictates how you compose different event handlers (e.g., decorating it with NetworkUpdateHandler and not the other way around). So maybe it's better to be explicit about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point, though my remaining qualm is that you must pass it an EventHandler that way, whereas you have the option of not doing so otherwise.
I mean my usual reply applies - do users ever have a reasonable use case where they'd want to ignore most events?
Further, you do need to know that it is handling retries since that dictates how you compose different event handlers (e.g., decorating it with NetworkUpdateHandler and not the other way around). So maybe it's better to be explicit about it?
I'm not entirely convinced that it's clear in the current api that you have to hook these things up in a given order, whether it says retry or not. I wonder if we should make it more clear? Maybe have this object pass the update to the network graph itself?
| } | ||
| } | ||
| /// Logs a byte slice in hex format. | ||
| #[macro_export] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we actually #[doc(hidden)] this and all the other macros in this file that really shoulnd't be used by downstream crates but only LDK crates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could see log_internal needing that. But which other ones? Presumably users may want to log within their trait implementations, possibly using log_bytes, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I guess I was mostly figuring the logging stuff is a one-way street - the lightning crates log stuff, and users consume it and put it somewhere. I suppose there's no real reason to require that, though, so, ok, fair enough. Please do add it to the internal one, though.
lightning-invoice/src/payment.rs
Outdated
| fn pay_cached_invoice(&self, payment_hash: &PaymentHash) -> Result<(), PaymentError> { | ||
| let invoice_cache = self.invoice_cache.lock().unwrap(); | ||
| match invoice_cache.get(payment_hash) { | ||
| Some(invoice) => self.pay_invoice_internal(invoice), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we somehow check that the route used on a retry is different from the original? I could see users using this API and just hooking it up to a remote route-getter which doesn't care about our failed payments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping that for simplicity this could be route-agnostic. Any changes to the network or scoring could affect the route calculation, so we should let those modules (e.g., higher-level event handlers) take care of making adjustments for the next time a route is fetched, if necessary, even if it means the same route is tried.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I guess it doesn't matter a ton, just feels like this may be a common thing for users to do, and we'd retry the same route four times and then fail 10 seconds later when we could have told the user that we failed right away. Maybe we should expose the retry count instead to make it explicit for users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I guess it doesn't matter a ton, just feels like this may be a common thing for users to do, and we'd retry the same route four times and then fail 10 seconds later when we could have told the user that we failed right away.
Well, three times... first attempt counts as one. :)
Curious about the remote route-getter use case. If it typically will return the same route, it seems retries would not be valuable, which is an argument for keeping InvoicePayer and PaymentRetryHandle separate.
Maybe we should expose the retry count instead to make it explicit for users?
On the other hand, that would be a good argument for merging InvoicePayer and PaymentRetryHandle given the number of attempts is stored on the latter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious about the remote route-getter use case. If it typically will return the same route, it seems retries would not be valuable, which is an argument for keeping InvoicePayer and PaymentRetryHandle separate.
Hmm, true. I guess you wouldn't generally use the invoice utilities in this case? I wonder if we merge the two and make the NetworkGraph an optional part if we can then just do retries if we have one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... may work, though now we start to affect the Router interface as it still needs a NetworkGraph for the first attempt. We could make it an Option in find_route. Alternatively, we could remove it entirely, so DefaultRouter would just hold its own NetworkGraph reference.
If we go with the latter option, then we could parameterize InvoicePayer with the number of attempts instead of using a constant, which would make the retry behavior explicit during its construction. This would of course require merging the two structs. How does that sound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we could remove it entirely, so DefaultRouter would just hold its own NetworkGraph reference.
If we go with the latter option, then we could parameterize InvoicePayer with the number of attempts instead of using a constant, which would make the retry behavior explicit during its construction. This would of course require merging the two structs. How does that sound?
I think I'm a fan of this, yea, it reduces extra wires to hook up, better captures the set of likely user use-cases and Does The Right Thing, or at least makes the right thing more obvious, in common cases.
3061394 to
97bd88c
Compare
52da654 to
aa6b426
Compare
|
Needs rebase now 🎉. |
4c64c33 to
9c09290
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question, but I like this API. Note that I don't think this can go in a release without #1053, so should either land after that, or at least in the same release.
lightning-invoice/src/payment.rs
Outdated
|
|
||
| // Either the payment was rejected, exceeded the maximum attempts, or failed retry. | ||
| self.remove_cached_invoice(payment_hash); | ||
| attempts_by_payment_hash.remove(payment_hash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it seems somewhat duplicative to get the entry above and then remove it here. You could just store the entry and remove the entry here, doing the and_modify.or_insert in the retry condition before the return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need the number of attempts using and_modify and or_insert for the comparison, which consumes the entry. But if no retry is needed we, couldn't then remove the entry directly since it has been consumed.
Probably could just do and_modify then match to determine whether or not there's a value, but that doesn't seem as clean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not quite as clean, but it does feel incredibly wasteful to hash into the hashmap twice for the same element 5 lines apart :/
--- a/lightning-invoice/src/payment.rs
+++ b/lightning-invoice/src/payment.rs
@@ -202,11 +202,13 @@ where
let mut attempts_by_payment_hash = self.payment_attempts.lock().unwrap();
- let attempts = attempts_by_payment_hash
- .entry(*payment_hash)
- .and_modify(|attempts| *attempts += 1)
- .or_insert(1);
+ let attempts_entry = attempts_by_payment_hash
+ .entry(*payment_hash);
+ let attempts = if let hash_map::Entry::Occupied(e) = &attempts_entry { e.get() + 1 } else { 1 };
if !rejected_by_dest {
let max_payment_attempts = self.retry_attempts + 1;
- if *attempts < max_payment_attempts {
+ if attempts < max_payment_attempts {
if self.pay_cached_invoice(payment_hash).is_ok() {
log_trace!(self.logger, "Payment {} failed; retrying (attempts: {})", log_bytes!(payment_hash.0), attempts);
+ attempts_entry
+ .and_modify(|attempts| *attempts += 1)
+ .or_insert(1);
return;
@@ -224,3 +226,2 @@ where
self.remove_cached_invoice(payment_hash);
- attempts_by_payment_hash.remove(payment_hash);
},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you still need to remove the entry later, but that does work. However, there's still an additional look-up when remove_cached_invoice is called. Given no tests failed when the attempts entry was not removed -- and that when called independently remove_cached_invoice won't remove the attempts entry -- it probably makes sense to combine these two hash maps to ensure consistency. Along with your improvement, that takes us to one look-up instead of three.
a308452 to
41d2b07
Compare
c6e3c06 to
4caedcf
Compare
|
Once #1084 is merged, I can rebase and fix the failing checks. |
|
Can be rebased now :) |
4caedcf to
35ad581
Compare
|
Note we should consider the invoice expiry when/before retrying. |
c7d2086 to
14f7d04
Compare
lightning-invoice/src/payment.rs
Outdated
| log_trace!(self.logger, "Payment {} failed; retrying (attempts: {})", log_bytes!(payment_hash.0), attempts); | ||
| return; | ||
| } else { | ||
| log_trace!(self.logger, "Error retrying payment {}; not retrying (attempts: {})", log_bytes!(payment_hash.0), attempts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the log could be a bit confusing here, because instead of a payment failed error, you immediately get an error retrying payment error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, let me know if you have a suggestion for rewording.
2aca5b1 to
e7c1481
Compare
Ok, this is now using the parameters from Otherwise, the main changes are:
|
lightning/src/util/ser.rs
Outdated
| } | ||
| } | ||
|
|
||
| impl Writeable for Duration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, can we just write out a u64 instead? It feels a bit funky serializing a duration and losing a bunch of precision as a general thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed as per other discussion. Could also keep Duration and serialized as u128 nanoseconds. That said, for our use case, BOLT 11 defines the expiry time in seconds, so no need for more precision.
lightning/src/routing/router.rs
Outdated
| /// Hints for routing to the payee, containing channels connecting the payee to public nodes. | ||
| pub route_hints: Vec<RouteHint>, | ||
|
|
||
| /// Expiration of a payment to the payee, relative to a user-defined epoch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is a "user-defined epoch"? How is this expected to work with InvoicePayer - doesn't it always check against 1970? Can we just make this seconds-since-1970 and a u64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thought was to let the user set this relative to whatever they wanted because ultimately they would be setting it and interpreting it. So even if we say it's seconds since the UNIX epoch, there's really nothing enforcing that.
Anyhow, updated to u64 as discussed offline.
lightning-invoice/src/payment.rs
Outdated
| // error occurred when attempting to retry. | ||
| entry.remove(); | ||
| } else { | ||
| log_trace!(self.logger, "Unknown payment {}; cannot retry)", log_bytes!(payment_hash.0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to auto-insert here (and maybe a test for retry on reload, but I'm happy to add that as a followup for you).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to handle this and added a unit test.
e7c1481 to
c241093
Compare
| /// The id returned by [`ChannelManager::send_payment`] and used with | ||
| /// [`ChannelManager::retry_payment`]. | ||
| /// | ||
| /// [`ChannelManager::send_payment`]: crate::ln::channelmanager::ChannelManager::send_payment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I just noticed that documentation for payment_id seems to be missing from ChannelMannager::send_payment. It would be nice to (a) document that the id should be stored for a retry, and (b) that it's needed because technically PaymentHashs can be non-unique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving unresolved pending discussions around changes to send_payment.
| write_tlv_fields!(writer, { | ||
| (0, payment_preimage, required), | ||
| (1, payment_hash, required), | ||
| (3, payment_id, option), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be a little nicer public API-wise if payment_id were a non-Option, and set to 0 if serialized before 0.0.104 (or wtv). Similar for the next commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we leave it an Option for now and then just do a hard-break on our serialization backwards compat in like 2-3 versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine by me! 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, maybe that is why we decided to key by payment_hash for now? I guess we could just not retry if PaymentId is None, though.
| } | ||
|
|
||
| /// Pays the given [`Invoice`], caching it for later use in case a retry is needed. | ||
| pub fn pay_invoice(&self, invoice: &Invoice) -> Result<PaymentId, PaymentError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think I'd feel better about this if we had a plan for this, it does seem like a lot of rewrites would have to happen to add spontaneous support (and feels odd to merge code that'll be definitely rewritten)? From Slack chats, it looks like the relevant user has implemented their own retries for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to properly interpret PaymentSendFailure errors but otherwise looking good, I think.
| let payment_hash = PaymentHash(invoice.payment_hash().clone().into_inner()); | ||
| let payment_secret = Some(invoice.payment_secret().clone()); | ||
| let payment_id = self.payer.send_payment(&route, payment_hash, &payment_secret) | ||
| .map_err(|e| PaymentError::Sending(e))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to figure out what's up - if we get a PaymentSendFailure::ParameterError or PaymentSendFailure::PathParameterError then we can return an error here, I think. AllFailedRetrySafe we should retry the requisite numbers right here, PartialFailure we should insert and retry the relevant paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline. This may require a bit more work since we lose the PaymentId if send_payment returns an error. Ideally, we'd get back the proper RouteParameters for retry.
| } else if has_expired(retry.as_ref().unwrap()) { | ||
| log_trace!(self.logger, "Invoice expired for payment {}; not retrying (attempts: {})", log_bytes!(payment_hash.0), attempts); | ||
| } else if self.retry_payment(*payment_id.as_ref().unwrap(), retry.as_ref().unwrap()).is_err() { | ||
| log_trace!(self.logger, "Error retrying payment {}; not retrying (attempts: {})", log_bytes!(payment_hash.0), attempts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not always imply failure to send at all - see PaymentSendFailure and the notes at PaymentSendFailure::PartialFailure about when retry is safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For these cases (both returned by send_payment and retry_payment), I take it we must retry here as there won't be a PaymentPathFailed event. Though one idea would be to generate them. It would make the retry attempt logic cleaner at very least.
c241093 to
2864f61
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An upcoming Router interface will be used for finding a Route both when initially sending a payment and also when retrying failed payment paths. Unify the three varieties of get_route so the interface can consist of a single method implemented by the new `find_route` method. Give get_route pub(crate) visibility so it can still be used in tests.
2864f61 to
2042241
Compare
When a payment fails, it's useful to retry the payment once the network graph and channel scores are updated. InvoicePayer is a utility for making payments which will retry any failed payment paths for a payment up to a configured number of total attempts. It is parameterized by a Payer and Router for ease of customization and testing. Implement EventHandler for InvoicePayer as a decorator that intercepts PaymentPathFailed events and retries that payment using the parameters from the event. It delegates to the decorated EventHandler after retries have been exhausted and for other events.
According to BOLT 11: - after the `timestamp` plus `expiry` has passed - SHOULD NOT attempt a payment Add a convenience method for checking if an Invoice has expired, and use it to short-circuit payment retries.
Implements Payer for ChannelManager and Rotuer for find_route, which can be used to parameterize InvoicePayer when needing payment retries.
2042241 to
010436d
Compare
|
Diff from previous ACKs is trivial test changes, taking: |
When a payment fails, it's useful to retry the payment once the network graph and channel scores are updated.
InvoicePayeris a utility for making payments which caches the paymentInvoiceso that it can be used when retrying the payment. It is parameterized by aPayerandRouterfor ease of customization and testing.PaymentRetryHandleris used in connection withInvoicePayer. It is anEventHandlerdecorator that interceptsPaymentFailedevents and retries that payment using the cachedInvoice.