Penalize failed channels #1144

jkczyz · 2021-10-27T16:11:25Z

Adds a new payment_path_failed method to routing::Score for penalizing failed channels, which is called by InvoicePayer before retrying failed payments in the course of handling PaymentPathFailed events.

Implements the new method in Scorer by applying a configurable penalty on top of the base penalty. The new penalty decays over time but may be further increased if the channel continues to fail.

TODO: Test new Scorer behavior.

codecov · 2021-10-27T16:32:28Z

Codecov Report

Merging #1144 (db05a14) into main (59659d3) will decrease coverage by 0.09%.
The diff coverage is 82.88%.

@@            Coverage Diff             @@
##             main    #1144      +/-   ##
==========================================
- Coverage   90.51%   90.42%   -0.10%     
==========================================
  Files          69       70       +1     
  Lines       35865    39419    +3554     
==========================================
+ Hits        32462    35643    +3181     
- Misses       3403     3776     +373

Impacted Files	Coverage Δ
lightning-invoice/src/utils.rs	`74.52% <50.00%> (-8.99%)`	⬇️
lightning/src/routing/mod.rs	`50.00% <50.00%> (ø)`
lightning/src/routing/scorer.rs	`52.38% <51.35%> (-14.29%)`	⬇️
lightning-invoice/src/payment.rs	`92.78% <93.58%> (-0.05%)`	⬇️
lightning/src/routing/router.rs	`92.91% <93.75%> (-2.64%)`	⬇️
lightning-background-processor/src/lib.rs	`95.94% <100.00%> (+1.71%)`	⬆️
lightning/src/ln/channelmanager.rs	`87.09% <100.00%> (+3.06%)`	⬆️
lightning/src/ln/functional_test_utils.rs	`97.01% <100.00%> (+1.90%)`	⬆️
lightning/src/ln/functional_tests.rs	`97.29% <100.00%> (-0.09%)`	⬇️
lightning/src/ln/shutdown_tests.rs	`95.89% <100.00%> (ø)`
... and 15 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 59659d3...db05a14.

jkczyz · 2021-10-27T17:19:34Z

lightning/src/routing/scorer.rs

+#[cfg(not(feature = "no-std"))]
+fn decay_from(penalty_msat: u64, last_failure: &SystemTime, decay_interval: Duration) -> u64 {
+	let decays = last_failure.elapsed().ok().map_or(0, |elapsed| {
+		elapsed.as_secs() / decay_interval.as_secs()


Note to self: handle divide by zero 😛

TheBlueMatt · 2021-10-27T18:07:39Z

lightning-invoice/src/payment.rs

 where
 	P::Target: Payer,
 	R: Router,
+	S: routing::Score,


As an alternative to an owned Score, should we consider a "two-layer" trait with a Score and LockedScore where router takes the second?

I kinda prefer this way as it just uses plain references. Could be convinced otherwise, though.

I guess I'm looking at this from two directions:

in the case of bindings, we cannot serialize from a reference to a trait - all traits have to be concretized into a dyn Trait and from there you can't (type-safe) go back to the original type to serialize it. We'd have to add support for that, which would be highly language specific and checked at runtime or make Score require Writeable, which is somewhat awkward, though not insane.

in the general case, it's a bit awkward for users to have to get their InvoicePayer to serialize their Score data - it ends up dictating a chunk of user layout, instead of taking a reference which gives the user more flexibility.

Obviously it's somewhat awkward in that we end up forcing users into a two-layer wrapper thingy, but luckily:

Our implementation works against it without the user having to write any additional characters,

we can implement the parent trait for Thing<Deref<Target=Mutex<ThingB: subtrait>>> (or implement it for Thing: Deref<Target=subtrait> directly with no-std), making it somewhat transparent at least,

bindings users can use the API, at least kinda, though GC waiting to release the lock kinda sucks too.

> I guess I'm looking at this from two directions:

> in the case of bindings, we cannot serialize from a reference to a trait - all traits have to be concretized into a dyn Trait and from there you can't (type-safe) go back to the original type to serialize it. We'd have to add support for that, which would be highly language specific and checked at runtime or make Score require Writeable, which is somewhat awkward, though not insane.

Just a thought, we could have S: routing::Score + Clone and have the accessor return a copy. Seems like it would save a lot of trouble using multiple traits for a small cost. Would this solve the bindings issue, at least as an interim solution?

> in the general case, it's a bit awkward for users to have to get their InvoicePayer to serialize their Score data - it ends up dictating a chunk of user layout, instead of taking a reference which gives the user more flexibility.

FWIW, they will already need to use Arc<InvoicePayer> as it needs to be passed to the BackgroundProcessor for event handling in addition to being used to make payments. So having, say, some persister utility wrapping Arc<InvoicePayer> wouldn't be horrible. And, if written in Rust, wouldn't it just be a simple method call to persist not involving any references at the bindings level?

> Obviously it's somewhat awkward in that we end up forcing users into a two-layer wrapper thingy, but luckily:

> Our implementation works against it without the user having to write any additional characters,

> we can implement the parent trait for Thing<Deref<Target=Mutex<ThingB: subtrait>>> (or implement it for Thing: Deref<Target=subtrait> directly with no-std), making it somewhat transparent at least,

> bindings users can use the API, at least kinda, though GC waiting to release the lock kinda sucks too.

I understand the problem and am trying to grok the trait-subtrait solution you're proposing. I guess the point of it is so we can pass find_route a LockedScore. But the relationship between LockedScore and Score (which is the supertrait of which?) and what methods each has isn't so clear. Could you elaborate?

Wonder if using an associated type in some manner would make this simpler?

> Just a thought, we could have S: routing::Score + Clone and have the accessor return a copy.

Yea, that's something I'd been thinking about for a different reason (because Java does that a ton anyway), and I guess it's okay? I'm not really a huge fan of cloning a hashmap that may get an entry for every channel in the graph (which may be the case for users doing probing), though; that could be a lot of data.
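To make the trade-off concrete, here is a minimal sketch of the `S: routing::Score + Clone` accessor idea. `Payer`, `ClonableScorer`, `scorer_snapshot`, and `with_scorer` are hypothetical names, not LDK API: the payer keeps its scorer behind a `Mutex`, and the accessor hands back an owned copy that can be persisted without holding the lock. Every snapshot is a full copy of the per-channel map, which is the cost raised in the reply.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical scorer state: a per-channel penalty map. Under probing this
/// could hold an entry for most channels in the graph.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ClonableScorer {
    pub penalties_msat: HashMap<u64, u64>,
}

/// Hypothetical payer that owns its scorer behind a lock.
pub struct Payer<S: Clone> {
    scorer: Mutex<S>,
}

impl<S: Clone> Payer<S> {
    pub fn new(scorer: S) -> Self {
        Payer { scorer: Mutex::new(scorer) }
    }

    /// Returns an owned copy of the scorer; the lock is held only while cloning.
    pub fn scorer_snapshot(&self) -> S {
        self.scorer.lock().unwrap().clone()
    }

    /// Runs `f` with mutable access to the scorer, e.g. to record a failure.
    pub fn with_scorer<R>(&self, f: impl FnOnce(&mut S) -> R) -> R {
        let mut guard = self.scorer.lock().unwrap();
        f(&mut *guard)
    }
}
```

A snapshot taken before a later update does not see that update, so persisting it is safe without coordinating with the payer, at the price of copying the whole map each time.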

> FWIW, they will already need to use Arc<InvoicePayer> as it needs to be passed to the BackgroundProcessor for event handling in addition to being used to make payments. So having, say, some persister utility wrapping Arc<InvoicePayer> wouldn't be horrible.

Ah, that's a good point re: user code complexity.

> And, if written in Rust, wouldn't it just be a simple method call to persist not involving any references at the bindings level?

So we'd make Score : Writeable and add a utility method to InvoicePayer to just write the scores out directly? I'm not 100% sure where you're going with this.

> I understand the problem and am trying to grok the trait-subtrait solution you're proposing. I guess the point of it is so we can pass find_route a LockedScore. But the relationship between LockedScore and Score (which is the supertrait of which?) and what methods each has isn't so clear. Could you elaborate?

Sorry, I was using "supertrait" liberally (and incorrectly). What I was thinking of was (but with better naming):

trait Score {
	type Scorer: LockedScore;
	fn get_scorer(&self) -> Self::Scorer;
}

trait LockedScore {
	fn get_score(&self, scid: u64, ..) -> u64;
}

#[cfg(not(feature = "no-std"))]
impl<LS: LockedScore, T: Deref<Target=Mutex<LS>>> Score for T { ... }

#[cfg(feature = "no-std")]
impl<LS: LockedScore, T: Deref<Target=LS>> Score for T { ... }

Hmmmmm, that could work. I guess it does feel a lot less explicit (in the sense that users lose the flexibility of selecting how locking works) and fairly different from our existing APIs which are built around Derefs.

If we go this route, to make the bindings sane, we'd probably want to create a WriteableScore trait that is just pub trait WriteableScore : Score + Writeable {} and use that, then create a constructor for ScorePersister that mirrors the InvoicePayer constructor and just takes a WriteableScore instead of a Score.
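A minimal sketch of the `WriteableScore` marker trait suggested above. It assumes an object-safe stand-in for LDK's `Writeable` (the real trait's signature differs); `Writeable`, `FixedScorer`, and `persist_score` here are hypothetical. The blanket impl means any type implementing both traits qualifies without extra code, and a persister can take `&dyn WriteableScore`, which is the trait-object shape the bindings discussion is concerned with.

```rust
use std::io::{self, Write};

/// Hypothetical, simplified stand-in for `routing::Score`.
pub trait Score {
    fn channel_penalty_msat(&self, short_channel_id: u64) -> u64;
}

/// Hypothetical, object-safe stand-in for a serialization trait.
pub trait Writeable {
    fn write(&self, writer: &mut dyn Write) -> io::Result<()>;
}

/// Marker trait combining both bounds; the blanket impl makes every
/// `Score + Writeable` type a `WriteableScore` automatically.
pub trait WriteableScore: Score + Writeable {}
impl<T: Score + Writeable> WriteableScore for T {}

/// Example scorer with a single fixed penalty.
pub struct FixedScorer {
    pub penalty_msat: u64,
}

impl Score for FixedScorer {
    fn channel_penalty_msat(&self, _short_channel_id: u64) -> u64 {
        self.penalty_msat
    }
}

impl Writeable for FixedScorer {
    fn write(&self, writer: &mut dyn Write) -> io::Result<()> {
        writer.write_all(&self.penalty_msat.to_be_bytes())
    }
}

/// Hypothetical persister: needs only the combined bound, via a trait object.
pub fn persist_score(score: &dyn WriteableScore) -> io::Result<Vec<u8>> {
    let mut buf: Vec<u8> = Vec::new();
    score.write(&mut buf)?;
    Ok(buf)
}
```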

Tried your solution with some renaming based on our offline discussion. Ran into some lifetime hell while doing so... 😕 Pushed a commit that almost works. Any chance you could see where I've gone wrong?

FWIW, the MutexGuard is making this tricky

Grrrrrrrrr, right, so the direct thing you want to do here requires generic associated types (GATs), which may land in, like, the next version of Rust or something. However, in trying to fix this I learned of a new Rust syntax which seems to be exactly the subset of GATs that we want here: higher-ranked trait bounds (HRTB). I pushed a branch that passes cargo check at https://git.bitcoin.ninja/index.cgi?p=rust-lightning;a=log;h=refs/heads/1144-magic
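A minimal sketch of the HRTB pattern described here, using hypothetical simplified traits rather than the branch's actual code. `LockableScore<'a>` puts the lifetime on the trait so the returned guard can borrow from `&'a self` without GATs, the `for<'a>` bound lets a caller lock for whatever borrow it happens to hold, and a blanket impl makes a `MutexGuard` usable as a `Score`. `PenalizeOne` and `cheapest_channel` are illustrative only.

```rust
use std::ops::DerefMut;
use std::sync::{Mutex, MutexGuard};

/// Hypothetical, simplified scoring trait.
pub trait Score {
    fn channel_penalty_msat(&self, short_channel_id: u64) -> u64;
}

/// Anything that mutably derefs to a `Score` (e.g. a `MutexGuard`) is a `Score`.
impl<S: Score, T: DerefMut<Target = S>> Score for T {
    fn channel_penalty_msat(&self, short_channel_id: u64) -> u64 {
        (**self).channel_penalty_msat(short_channel_id)
    }
}

/// The lifetime on the trait lets `Locked` borrow from `&'a self`.
pub trait LockableScore<'a> {
    type Locked: 'a + Score;
    fn lock(&'a self) -> Self::Locked;
}

impl<'a, S: 'a + Score> LockableScore<'a> for Mutex<S> {
    type Locked = MutexGuard<'a, S>;
    fn lock(&'a self) -> MutexGuard<'a, S> {
        // This path call resolves to the inherent `Mutex::lock`.
        Mutex::lock(self).unwrap()
    }
}

/// `for<'a>` requires `L` to be lockable for every borrow lifetime, so the
/// guard may borrow from the short-lived `scorer` reference.
pub fn cheapest_channel<L>(scorer: &L, candidates: &[u64]) -> Option<u64>
where
    L: for<'a> LockableScore<'a>,
{
    // Lock once and use the guard for the whole search.
    let locked = scorer.lock();
    candidates.iter().copied().min_by_key(|&scid| locked.channel_penalty_msat(scid))
}

/// Toy scorer that penalizes a single channel.
pub struct PenalizeOne {
    pub short_channel_id: u64,
}

impl Score for PenalizeOne {
    fn channel_penalty_msat(&self, short_channel_id: u64) -> u64 {
        if short_channel_id == self.short_channel_id { 1_000 } else { 0 }
    }
}
```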

Thanks! Got it working now. Only annoying thing about the blanket impl is that I get a conflicting implementation error if I try to implement it with T: Deref<Target=RefCell<S>>, which I was hoping to do for tests instead of using a Mutex. Which I guess means any Deref must reference a Mutex? Might be some way to work around it by introducing another type parameter? Don't need to worry about this now, though.

TheBlueMatt · 2021-10-27T18:11:10Z

lightning/src/routing/scorer.rs

-	base_penalty_msat: u64,
+	params: ScoringParameters,
+	#[cfg(not(feature = "no-std"))]
+	channel_failures: HashMap<u64, (u64, SystemTime)>,


This should be an Instant instead, no? We want a monotonic clock.

Instant is opaque, so I'd imagine it would be difficult to persist, whereas with SystemTime we can get a Duration since the unix epoch. IIUC, any decrease in time as used with elapsed would be before the first decay, so it shouldn't have any effect.

Maybe we can persist the SystemTime::now() - Instant.elapsed()?

hmmm... but we can't deserialize it back into Instant since the only way to create one is Instant::now or from another Instant.

Oh, right, yuck. Ummmmmmmm Instant::now() - (SystemTime::now() - deserialized_systemtime)? It's gross, but more technically correct...

Ok, I think I convinced myself this is possible by serializing in terms of a Duration since the UNIX epoch, and then upon deserialization do a similar adjustment.

let time = std::time::Instant::now();
let duration_since_epoch = std::time::SystemTime::now().duration_since(std::time::SystemTime::UNIX_EPOCH).unwrap();
let serialized_time = duration_since_epoch - time.elapsed();
println!("time: {:?}", time);
println!("duration_since_epoch: {:?}", duration_since_epoch);
println!("serialized_time: {:?}", serialized_time);

std::thread::sleep(std::time::Duration::from_secs(1));

let duration_since_epoch = std::time::SystemTime::now().duration_since(std::time::SystemTime::UNIX_EPOCH).unwrap();
let duration_since_instant = duration_since_epoch - serialized_time;
let deserialized_time = std::time::Instant::now() - duration_since_instant;
println!("duration_since_epoch: {:?}", duration_since_epoch);
println!("duration_since_instant: {:?}", duration_since_instant);
println!("deserialized_time: {:?}", deserialized_time);
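The experiment above can be packaged as a pair of helpers. This is a sketch under the thread's assumption that persisting a wall-clock `Duration` since the UNIX epoch is acceptable; `instant_to_unix` and `unix_to_instant` are hypothetical names. The round trip is only approximate, because the monotonic and wall clocks are sampled at slightly different moments, and any wall-clock adjustment between saving and loading shifts the restored `Instant` by the same amount. `Instant::checked_sub` is used instead of `-` because plain subtraction may panic when the result cannot be represented.

```rust
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

/// Converts an `Instant` to a wall-clock `Duration` since the UNIX epoch by
/// subtracting the instant's age from the current wall-clock time.
pub fn instant_to_unix(instant: Instant) -> Duration {
    let now_unix = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the UNIX epoch");
    now_unix.saturating_sub(instant.elapsed())
}

/// Rebuilds an `Instant` from a persisted UNIX-epoch `Duration`. An `Instant`
/// can only be derived from `Instant::now()` or another `Instant`, so the
/// persisted time's age is subtracted from now. A persisted time that appears
/// to be in the future (wall clock moved backwards) is clamped to now.
/// Returns `None` if the wall clock is before the epoch or the result cannot
/// be represented as an `Instant`.
pub fn unix_to_instant(unix: Duration) -> Option<Instant> {
    let now_unix = SystemTime::now().duration_since(UNIX_EPOCH).ok()?;
    let age = now_unix.saturating_sub(unix);
    Instant::now().checked_sub(age)
}
```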

TheBlueMatt · 2021-10-28T18:10:49Z

lightning-invoice/src/payment.rs


 /// A utility for paying [`Invoice`]s.
-pub struct InvoicePayer<P: Deref, R, L: Deref, E>
+pub struct InvoicePayer<P: Deref, R, S, L: Deref, E>


Should S be a Deref to a LockableScore?

We implement LockableScore for T: Deref<Target=Mutex<S>>, so I don't think we do unless there is another reason you're thinking of? See last commit for use with BackgroundProcessor.

Hmmmm, right, does it work if we implement LockableScore for Mutex<S> instead? It seems like a Deref here is more consistent with our API.

Good point. Done. Also, implemented for RefCell now that it is possible.
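A compact sketch of the shape this exchange arrives at, using hypothetical simplified traits. `LockableScore` is implemented on the concrete `Mutex<S>` and `RefCell<S>` types rather than on `T: Deref<Target = Mutex<S>>`, so the two impls no longer conflict, and the payer is parameterized by any `S: Deref` whose target is lockable (an `Arc`, an `Rc`, or a plain reference). `Payer`, `retry`, and `CountingScorer` are illustrative, not LDK API.

```rust
use std::cell::{RefCell, RefMut};
use std::ops::{Deref, DerefMut};
use std::rc::Rc;
use std::sync::{Arc, Mutex, MutexGuard};

/// Hypothetical, simplified stand-ins for `routing::Score` and `LockableScore`.
pub trait Score {
    fn channel_penalty_msat(&self, short_channel_id: u64) -> u64;
    fn payment_path_failed(&mut self, short_channel_id: u64);
}

impl<S: Score, T: DerefMut<Target = S>> Score for T {
    fn channel_penalty_msat(&self, scid: u64) -> u64 { (**self).channel_penalty_msat(scid) }
    fn payment_path_failed(&mut self, scid: u64) { (**self).payment_path_failed(scid) }
}

pub trait LockableScore<'a> {
    type Locked: 'a + Score;
    fn lock(&'a self) -> Self::Locked;
}

// Implemented on the concrete wrapper types, so these two impls cannot overlap.
impl<'a, S: 'a + Score> LockableScore<'a> for Mutex<S> {
    type Locked = MutexGuard<'a, S>;
    fn lock(&'a self) -> MutexGuard<'a, S> { Mutex::lock(self).unwrap() }
}

impl<'a, S: 'a + Score> LockableScore<'a> for RefCell<S> {
    type Locked = RefMut<'a, S>;
    fn lock(&'a self) -> RefMut<'a, S> { self.borrow_mut() }
}

/// Hypothetical payer generic over any `Deref` to a lockable score.
pub struct Payer<S: Deref>
where
    S::Target: for<'a> LockableScore<'a>,
{
    scorer: S,
}

impl<S: Deref> Payer<S>
where
    S::Target: for<'a> LockableScore<'a>,
{
    pub fn new(scorer: S) -> Self {
        Payer { scorer }
    }

    /// Locks once, records the failure, then picks the cheapest candidate.
    pub fn retry(&self, failed_scid: u64, candidates: &[u64]) -> Option<u64> {
        let mut locked = self.scorer.lock();
        locked.payment_path_failed(failed_scid);
        candidates.iter().copied().min_by_key(|&scid| locked.channel_penalty_msat(scid))
    }
}

/// Toy scorer: 1,000 msat per recorded failure of a channel.
#[derive(Default)]
pub struct CountingScorer {
    failed: Vec<u64>,
}

impl Score for CountingScorer {
    fn channel_penalty_msat(&self, scid: u64) -> u64 {
        1_000 * self.failed.iter().filter(|&&f| f == scid).count() as u64
    }
    fn payment_path_failed(&mut self, scid: u64) {
        self.failed.push(scid);
    }
}
```

Production code would typically use `Arc<Mutex<_>>`, while single-threaded tests can use `RefCell` behind `&` or `Rc`, which is the use case mentioned earlier in the thread.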

TheBlueMatt · 2021-10-28T21:24:26Z

lightning-invoice/src/payment.rs

 	/// Finds a [`Route`] between `payer` and `payee` for a payment with the given values.
-	fn find_route(
-		&self, payer: &PublicKey, params: &RouteParameters, first_hops: Option<&[&ChannelDetails]>
+	fn find_route<S: routing::Score>(


The S bound should probably be on the trait itself, no? I.e., if a user always constructs an InvoicePayer with CustomLocalUserRouter then find_route should take a CustomLocalUserRouter and not an S. If the complexity of the type annotations blows up as a result of this, feel free to ignore :).

Not sure I quite follow. Why would find_route take itself (in addition to self) instead of a Score?

I think you're trying to say that the type S: routing::Score parameter used in Router should be the same Score as required by LockableScore<'a>::Locked. The compiler happily leads me to use the following ugly syntax and to use PhantomData in DefaultRouter.

R: for <'a> Router<<<S as Deref>::Target as routing::LockableScore<'a>>::Locked>,

But it seems I'm just doing what the compiler is already inferring, no? Did I misunderstand? I suppose internally, we could accidentally call find_route with an entirely different Score than what the user parameterized InvoicePayer with. But I don't think a user could do so.

Instead of the current code, it'd be

pub trait Router<S: routing::Score> { fn find_route(&self, ...&S) }
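A minimal sketch of this shape, with hypothetical simplified types (`Route`, `CheapestHopRouter`, `ToyScorer`, `pay_with`, and the reduced `find_route` signature are not LDK's). The score type is a parameter of the `Router` trait rather than of `find_route`, so a caller's bound such as `R: Router<ToyScorer>` pins which scorer `find_route` accepts, while a generic router can still implement `Router<S>` for every `S: Score`.

```rust
/// Hypothetical, simplified stand-in for `routing::Score`.
pub trait Score {
    fn channel_penalty_msat(&self, short_channel_id: u64) -> u64;
}

#[derive(Debug, PartialEq)]
pub struct Route {
    pub short_channel_ids: Vec<u64>,
}

/// The score type is a trait parameter, not a method parameter.
pub trait Router<S: Score> {
    fn find_route(&self, amount_msat: u64, scorer: &S) -> Result<Route, &'static str>;
}

/// A router that works with any scorer: it picks the cheapest single hop.
pub struct CheapestHopRouter {
    pub candidates: Vec<u64>,
}

impl<S: Score> Router<S> for CheapestHopRouter {
    fn find_route(&self, _amount_msat: u64, scorer: &S) -> Result<Route, &'static str> {
        let best = self
            .candidates
            .iter()
            .copied()
            .min_by_key(|&scid| scorer.channel_penalty_msat(scid))
            .ok_or("no route")?;
        Ok(Route { short_channel_ids: vec![best] })
    }
}

/// Toy scorer: the penalty is simply the channel id, so lower ids are cheaper.
pub struct ToyScorer;

impl Score for ToyScorer {
    fn channel_penalty_msat(&self, short_channel_id: u64) -> u64 {
        short_channel_id
    }
}

/// A caller whose bound pins the scorer type to `ToyScorer`.
pub fn pay_with<R: Router<ToyScorer>>(router: &R, amount_msat: u64) -> Result<Route, &'static str> {
    router.find_route(amount_msat, &ToyScorer)
}
```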

Yeah, we're on the same page as discussed offline. Also, turns out I didn't need to use PhantomData. I was being overzealous in where I was adding type parameters.

TheBlueMatt

Other than the above two comments, LGTM.

TheBlueMatt

Note that some of the fixups on "Notify scorer of failing payment path" probably should be on "Parameterize InvoicePayer by routing::Score". I'd be totally fine if those both just get squashed into one commit, though.

TheBlueMatt · 2021-10-29T17:43:18Z

lightning/src/routing/mod.rs

+	}
+}
+
+impl<S: Score, T: Deref<Target=S> + DerefMut<Target=S>> Score for T {


DerefMut extends Deref so you should be able to drop the Deref bound.
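Because `DerefMut` has `Deref` as a supertrait (and `Target` is `Deref`'s associated type), the single bound `T: DerefMut<Target = S>` is enough. A minimal sketch with a hypothetical simplified `Score` trait; `FailureCounter` and `record_failure` are illustrative. The blanket impl lets wrappers such as `Box<_>`, `&mut _`, and `MutexGuard<_>` around a scorer be passed wherever a `Score` is expected.

```rust
use std::ops::DerefMut;
use std::sync::Mutex;

/// Hypothetical, simplified stand-in for `routing::Score`.
pub trait Score {
    fn channel_penalty_msat(&self, short_channel_id: u64) -> u64;
    fn payment_path_failed(&mut self, short_channel_id: u64);
}

/// `DerefMut: Deref`, so no separate `Deref<Target = S>` bound is needed.
impl<S: Score, T: DerefMut<Target = S>> Score for T {
    fn channel_penalty_msat(&self, short_channel_id: u64) -> u64 {
        (**self).channel_penalty_msat(short_channel_id)
    }
    fn payment_path_failed(&mut self, short_channel_id: u64) {
        (**self).payment_path_failed(short_channel_id)
    }
}

/// Toy scorer: every channel costs 100 msat per failure seen so far.
#[derive(Default)]
pub struct FailureCounter {
    pub failures: u64,
}

impl Score for FailureCounter {
    fn channel_penalty_msat(&self, _short_channel_id: u64) -> u64 {
        self.failures * 100
    }
    fn payment_path_failed(&mut self, _short_channel_id: u64) {
        self.failures += 1;
    }
}

/// Generic over any `Score`, including the wrapper types covered by the blanket impl.
pub fn record_failure<S: Score>(scorer: &mut S, short_channel_id: u64) -> u64 {
    scorer.payment_path_failed(short_channel_id);
    scorer.channel_penalty_msat(short_channel_id)
}
```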

valentinewallace

Shaping up IMO! Mostly docs requests

valentinewallace · 2021-10-28T19:15:33Z

lightning/src/routing/scorer.rs

 //!
-//! // Or use a custom channel penalty.
-//! let scorer = Scorer::new(1_000);
+//! // Or use custom channel penalties.


since this says custom channel "penalties," could we change it to either say "penalty" or have multiple penalties be custom?

Added another custom penalty.

valentinewallace · 2021-10-28T19:32:14Z

lightning/src/routing/scorer.rs

+		}
+	}
+
+	/// Creates a new scorer using `penalty_msat` as a fixed channel penalty.


Could specify that this will only have a fixed base channel penalty

Whoops, missed this comment.

Actually, the other penalty is zero so the overall penalty will be fixed.
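A minimal sketch of the two constructors under discussion, assuming a parameters struct with a base penalty, a per-failure penalty, and a decay half-life. The field names, `ScoringParameters`, and `with_fixed_penalty` follow the conversation loosely and should be treated as hypothetical; decay is left out to keep the focus on construction. With the failure penalty set to zero, failures add nothing, so the overall penalty stays fixed.

```rust
use std::time::Duration;

/// Hypothetical parameters: `base_penalty_msat` applies to every channel;
/// `failure_penalty_msat` is added per failure and would decay with
/// `failure_penalty_half_life` (decay is omitted in this sketch).
#[derive(Clone, Debug)]
pub struct ScoringParameters {
    pub base_penalty_msat: u64,
    pub failure_penalty_msat: u64,
    pub failure_penalty_half_life: Duration,
}

pub struct Scorer {
    params: ScoringParameters,
}

impl Scorer {
    /// Custom penalties.
    pub fn new(params: ScoringParameters) -> Self {
        Scorer { params }
    }

    /// Fixed penalty: the failure penalty is zero, so failures never change
    /// the result. The half-life is an arbitrary placeholder here.
    pub fn with_fixed_penalty(penalty_msat: u64) -> Self {
        Scorer::new(ScoringParameters {
            base_penalty_msat: penalty_msat,
            failure_penalty_msat: 0,
            failure_penalty_half_life: Duration::from_secs(3600),
        })
    }

    /// Penalty for a channel after `failures` undecayed failures.
    pub fn penalty_after_failures(&self, failures: u64) -> u64 {
        let failure_msat = failures.saturating_mul(self.params.failure_penalty_msat);
        self.params.base_penalty_msat.saturating_add(failure_msat)
    }
}
```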

valentinewallace · 2021-10-29T16:41:59Z

lightning-invoice/src/payment.rs

 	}

+	#[test]
+	fn scores_failed_channel() {


Could we add a high-level comment? I'm a bit confused what this tests

Added a comment. There is an expectation set on TestScorer upon creation that it is called with a specific short_channel_id. It will fail if it is not called at all or called with a different short_channel_id.
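A minimal sketch of the test double described here, assuming a simplified `Score` trait with only `payment_path_failed`; `TestScorer` and `expect_channel_failure` are hypothetical names. It panics if notified about a different `short_channel_id`, and its `Drop` impl panics if the expected notification never arrived, which covers both failure modes described in the reply.

```rust
/// Hypothetical, simplified stand-in for `routing::Score`.
pub trait Score {
    fn payment_path_failed(&mut self, short_channel_id: u64);
}

/// Hypothetical test double that expects one failure notification.
pub struct TestScorer {
    expected_short_channel_id: Option<u64>,
}

impl TestScorer {
    pub fn new() -> Self {
        TestScorer { expected_short_channel_id: None }
    }

    /// Sets the channel the scorer expects to be told about.
    pub fn expect_channel_failure(mut self, short_channel_id: u64) -> Self {
        self.expected_short_channel_id = Some(short_channel_id);
        self
    }
}

impl Score for TestScorer {
    fn payment_path_failed(&mut self, short_channel_id: u64) {
        // Consume the expectation; a different channel fails the test.
        if let Some(expected) = self.expected_short_channel_id.take() {
            assert_eq!(short_channel_id, expected, "unexpected failed channel");
        }
    }
}

impl Drop for TestScorer {
    fn drop(&mut self) {
        // Avoid a double panic if the test is already failing.
        if std::thread::panicking() {
            return;
        }
        // An expectation that was never consumed means the scorer was not called.
        if let Some(short_channel_id) = self.expected_short_channel_id {
            panic!("payment_path_failed was not called for channel {}", short_channel_id);
        }
    }
}
```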

valentinewallace · 2021-10-29T17:54:11Z

lightning/src/routing/scorer.rs

+}
+
+#[cfg(not(feature = "no-std"))]
+fn decay_from(penalty_msat: u64, last_failure: &Instant, half_life: Duration) -> u64 {


High level doc for the return value?

This is refactored in #1146 into a method called decayed_penalty, so will leave as is to avoid a lengthy rebase process. Happy to address any concerns in that PR. Probably should include units in the name.

Gotcha, just noticed a lot of this is rewritten in #1146. Thanks!

valentinewallace · 2021-10-29T17:59:29Z

lightning/src/routing/scorer.rs

+
+#[cfg(not(feature = "no-std"))]
+fn decay_from(penalty_msat: u64, last_failure: &Instant, half_life: Duration) -> u64 {
+	let decays = last_failure.elapsed().as_secs().checked_div(half_life.as_secs());


Do we test that it won't decay if less than half_life has passed?

Nothing is tested at the moment... 🙂 Will do in a follow-up. Confirmed this results in no shifts because checked_div returns Some(0).

Upon receiving a PaymentPathFailed event, the failing payment may be retried on a different path. To avoid using the channel responsible for the failure, a scorer should be notified of the failure before being used to find a new route. Add a payment_path_failed method to routing::Score and call it in InvoicePayer's event handler. Introduce a LockableScore parameterization to InvoicePayer so the scorer is locked only once before calling find_route.
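A minimal sketch of the flow in this commit message, with hypothetical simplified types (`AvoidFailed`, `find_route`, `retry_after_failure`) rather than LDK's. On a path failure the scorer is notified first, and the same lock guard is then handed to the route search, so the scorer is locked once and that guard serves both steps.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical, simplified stand-in for `routing::Score`.
pub trait Score {
    fn channel_penalty_msat(&self, short_channel_id: u64) -> u64;
    fn payment_path_failed(&mut self, short_channel_id: u64);
}

/// Toy scorer: a channel that has failed gets the maximum penalty.
#[derive(Default)]
pub struct AvoidFailed {
    failed: HashSet<u64>,
}

impl Score for AvoidFailed {
    fn channel_penalty_msat(&self, short_channel_id: u64) -> u64 {
        if self.failed.contains(&short_channel_id) { u64::MAX } else { 0 }
    }
    fn payment_path_failed(&mut self, short_channel_id: u64) {
        self.failed.insert(short_channel_id);
    }
}

/// Toy router: picks the candidate channel with the lowest penalty.
pub fn find_route<S: Score>(candidates: &[u64], scorer: &S) -> Option<u64> {
    candidates.iter().copied().min_by_key(|&scid| scorer.channel_penalty_msat(scid))
}

/// Handles a path failure: notify the scorer first, then re-route using the
/// same guard, so the scorer is locked once for both steps.
pub fn retry_after_failure<S: Score>(
    scorer: &Mutex<S>,
    failed_scid: u64,
    candidates: &[u64],
) -> Option<u64> {
    let mut locked = scorer.lock().unwrap();
    locked.payment_path_failed(failed_scid);
    find_route(candidates, &*locked)
}
```

Penalizing with `u64::MAX` is only a toy; the scorer in this PR adds a configurable penalty that decays over time instead.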

As payments fail, the channel responsible for the failure may be penalized. Implement Scorer::payment_path_failed to penalize the failed channel using a configured penalty. As time passes, the penalty is reduced using exponential decay, though penalties will accumulate if the channel continues to fail. The decay interval is also configurable.
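A minimal sketch of the decay and accumulation described in this commit message, not the PR's code: `Params`, `DecayingScorer`, and `decayed_penalty_msat` are hypothetical, and time is passed in as a `Duration` since an arbitrary origin instead of being read from an `Instant`, so the behavior is deterministic. The failure penalty halves once per full half-life elapsed. `checked_div` handles a zero half-life (the divide-by-zero noted earlier in the review), and less than one half-life leaves the penalty unchanged because the quotient is zero.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical parameters: a base penalty for every channel, a penalty added
/// per failure, and the half-life over which failure penalties decay.
pub struct Params {
    pub base_penalty_msat: u64,
    pub failure_penalty_msat: u64,
    pub half_life: Duration,
}

/// Halves `penalty_msat` once per full `half_life` in `elapsed`.
/// `checked_div` yields `None` for a zero (or sub-second) half-life, treated
/// here as full decay; 64 or more halvings also yield zero, which avoids an
/// overflowing shift.
pub fn decayed_penalty_msat(penalty_msat: u64, elapsed: Duration, half_life: Duration) -> u64 {
    match elapsed.as_secs().checked_div(half_life.as_secs()) {
        Some(halvings) if halvings < 64 => penalty_msat >> halvings,
        _ => 0,
    }
}

/// Sketch scorer; `now` is a `Duration` since an arbitrary origin.
pub struct DecayingScorer {
    params: Params,
    /// short_channel_id -> (failure penalty as of the last failure, time of that failure)
    failures: HashMap<u64, (u64, Duration)>,
}

impl DecayingScorer {
    pub fn new(params: Params) -> Self {
        DecayingScorer { params, failures: HashMap::new() }
    }

    /// Decays whatever remains of the previous failure penalty, then adds a
    /// fresh one, so a channel that keeps failing accumulates penalty.
    pub fn payment_path_failed(&mut self, short_channel_id: u64, now: Duration) {
        let half_life = self.params.half_life;
        let added = self.params.failure_penalty_msat;
        let entry = self.failures.entry(short_channel_id).or_insert((0, now));
        let remaining = decayed_penalty_msat(entry.0, now.saturating_sub(entry.1), half_life);
        *entry = (remaining.saturating_add(added), now);
    }

    /// Base penalty plus the decayed failure penalty, if any.
    pub fn channel_penalty_msat(&self, short_channel_id: u64, now: Duration) -> u64 {
        let failure = self.failures.get(&short_channel_id).map_or(0, |&(penalty, last)| {
            decayed_penalty_msat(penalty, now.saturating_sub(last), self.params.half_life)
        });
        self.params.base_penalty_msat.saturating_add(failure)
    }
}
```

In the review the elapsed time came from `last_failure.elapsed()` on an `Instant`; passing time explicitly here only serves to make the behavior checkable.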

Proof of concept showing InvoicePayer can be used with an Arc<ChannelManager> passed to BackgroundProcessor. Likely do not need to merge this commit.

TheBlueMatt

Squashed without diff from Val's ACK, will land after CI:

$ git diff-tree -U1 5c8466b4 db05a14a
$

jkczyz force-pushed the 2021-10-invoice-payer-scoring branch from 371b971 to 0edc550 October 27, 2021 16:43

TheBlueMatt added this to the 0.0.103 milestone Oct 27, 2021

jkczyz force-pushed the 2021-10-invoice-payer-scoring branch 2 times, most recently from f538e01 to a90a9e5 October 27, 2021 17:07

jkczyz commented Oct 27, 2021

TheBlueMatt reviewed Oct 27, 2021

jkczyz force-pushed the 2021-10-invoice-payer-scoring branch 2 times, most recently from dbedf14 to 1281d49 October 28, 2021 17:31

TheBlueMatt reviewed Oct 28, 2021

lightning/src/routing/scorer.rs (outdated, resolved)

jkczyz force-pushed the 2021-10-invoice-payer-scoring branch 3 times, most recently from 27e7947 to 21cbd03 October 28, 2021 20:46

TheBlueMatt reviewed Oct 28, 2021

lightning/src/routing/scorer.rs (resolved)

TheBlueMatt reviewed Oct 28, 2021

jkczyz force-pushed the 2021-10-invoice-payer-scoring branch from 21cbd03 to 269a85a October 29, 2021 02:47

valentinewallace self-requested a review October 29, 2021 15:50

jkczyz mentioned this pull request Oct 29, 2021

Scorer serialization #1146

Merged

TheBlueMatt approved these changes Oct 29, 2021

valentinewallace reviewed Oct 29, 2021

jkczyz force-pushed the 2021-10-invoice-payer-scoring branch from 269a85a to 5c8466b October 29, 2021 19:04

valentinewallace approved these changes Oct 29, 2021

jkczyz added 3 commits October 29, 2021 14:24

Test InvoicePayer in BackgroundProcessor

db05a14

Proof of concept showing InvoicePayer can be used with an Arc<ChannelManager> passed to BackgroundProcessor. Likely do not need to merge this commit.

jkczyz force-pushed the 2021-10-invoice-payer-scoring branch from 5c8466b to db05a14 October 29, 2021 19:28

TheBlueMatt approved these changes Oct 29, 2021

TheBlueMatt merged commit c53048a into lightningdevkit:main Oct 29, 2021

TheBlueMatt mentioned this pull request Oct 29, 2021

Rewrite InvoicePayer retry to correctly handle MPP partial failures #1141

Merged

jkczyz mentioned this pull request Oct 30, 2021

Add default implementation for tracking metadata #636

Closed
