Conversation

@tnull
Contributor

@tnull tnull commented Aug 19, 2025

Based on #38.

We enable reqwest client-level timeouts:

While the `RetryPolicy` has a `MaxTotalDelayRetryPolicy`, the retry
`loop` only checks this configured delay once the operation future
actually returns a value. However, without client-side timeouts, the
operation isn't actually guaranteed to ever return anything (even an
error, IIUC).

So here, we enable some coarse client-side default timeouts to ensure
the polled futures eventually return either the response *or* an error
we can handle via our retry logic.

~~Additionally, we here rip out our ~broken retry logic and replace it by utilizing reqwest's retry logic that shipped in the recent v0.12.23 release.~~
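
For illustration, a minimal sketch of what these client-level defaults look like on the `reqwest` builder (the 10s value for `DEFAULT_TIMEOUT` is assumed here purely for illustration; the actual constant lives in src/client.rs):

```rust
use std::time::Duration;

// Assumed value, for illustration only.
const DEFAULT_TIMEOUT: Duration = Duration::from_secs(10);

fn build_client() -> reqwest::Result<reqwest::Client> {
	reqwest::Client::builder()
		// Bounds the entire request, from connecting until the response
		// body has been read.
		.timeout(DEFAULT_TIMEOUT)
		// Bounds establishing the connection itself.
		.connect_timeout(DEFAULT_TIMEOUT)
		// Bounds individual socket reads, which also helps detect dropped
		// connections.
		.read_timeout(DEFAULT_TIMEOUT)
		.build()
}
```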

@tnull tnull requested review from jkczyz and tankyleo August 19, 2025 09:45
@ldk-reviews-bot

ldk-reviews-bot commented Aug 19, 2025

👋 Thanks for assigning @tankyleo as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@tnull tnull changed the title 2025 08 enable client side delays Enable client-side delays Aug 19, 2025
@tnull tnull changed the title Enable client-side delays Enable client-side timeouts Aug 19, 2025
src/client.rs Outdated
Comment on lines 36 to 38
.timeout(DEFAULT_TIMEOUT)
.connect_timeout(DEFAULT_TIMEOUT)
.read_timeout(DEFAULT_TIMEOUT)
Contributor


Thank you! Seems like we could do with just the single global timeout here, no need for connect and read?

But we can leave it as is and potentially tweak the inner timeouts later.

Contributor Author


Right, no strong opinion here.

@tankyleo
Contributor

One aspect that might be debatable here is whether we should drop MaxTotalDelayRetryPolicy given it would interact with the client-side default delay. Hence also pinging @jkczyz who reviewed the original retry PR.

For reference, the original PR is #20. On first impression, I'd be in favor of the drop myself.

@jkczyz

jkczyz commented Aug 20, 2025

One aspect that might be debatable here is whether we should drop MaxTotalDelayRetryPolicy given it would interact with the client-side default delay. Hence also pinging @jkczyz who reviewed the original retry PR.

For reference, the original PR is #20. On first impression, I'd be in favor of the drop myself.

Do the added timeouts apply to a single operation? If it is never exceeded, wouldn't we still want MaxTotalDelayRetryPolicy to allow limiting retries to a maximum amount of time?

What is meant by client-side default delay?

@tnull
Contributor Author

tnull commented Aug 20, 2025

Do the added timeouts apply to a single operation?

Yes, they apply to a single read, but also to connecting / detecting dropped connections, AFAIU.

If it is never exceeded, wouldn't we still want MaxTotalDelayRetryPolicy to allow limiting retries to a maximum amount of time?

Yes, it could be useful, but of course it's somewhat redundant if we set a client-side timeout and limit the number of retries. It could therefore be a bit confusing if somebody configures the MaxTotalDelayRetryPolicy, but the other total delay (i.e., number of retries times timeout) still applies if it's lesser.

What is meant by client-side default delay?

Ah, sorry, that was a typo I only corrected in the PR title: should have said default timeout, not delay.

@jkczyz

jkczyz commented Aug 20, 2025

Yes, it could be useful, but of course it's somewhat redundant if we set a client-side timeout and limit the number of retries.

Do you mean MaxTotalDelayRetryPolicy<MaxAttemptsRetryPolicy<R>>?

It could therefore be a bit confusing if somebody configures the MaxTotalDelayRetryPolicy, but the other total delay (i.e., number of retries times timeout) still applies if it's lesser.

Isn't this already the case when configured as I mentioned above? Number of attempts takes priority over total delay given the way MaxTotalDelayRetryPolicy is written.

Maybe I'm confused about what is lesser in that example.
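
To make that precedence concrete, here is a simplified, hypothetical sketch of such a nested policy (the crate's actual `next_delay` signature differs — notably it takes the returned error, as discussed further down):

```rust
use std::time::{Duration, Instant};

// Simplified stand-ins for the crate's policy types, for illustration only.
trait RetryPolicy {
	// Returns the delay before the next attempt, or `None` to give up.
	fn next_delay(&self, attempts_made: usize, started_at: Instant) -> Option<Duration>;
}

struct MaxAttemptsRetryPolicy {
	max_attempts: usize,
	base_delay: Duration,
}

impl RetryPolicy for MaxAttemptsRetryPolicy {
	fn next_delay(&self, attempts_made: usize, _: Instant) -> Option<Duration> {
		(attempts_made < self.max_attempts).then_some(self.base_delay)
	}
}

// Wraps an inner policy and additionally gives up once the total elapsed
// time exceeds `max_total_delay`: whichever limit trips first wins, which
// is why the attempt count can take priority over the total delay.
struct MaxTotalDelayRetryPolicy<P: RetryPolicy> {
	inner: P,
	max_total_delay: Duration,
}

impl<P: RetryPolicy> RetryPolicy for MaxTotalDelayRetryPolicy<P> {
	fn next_delay(&self, attempts_made: usize, started_at: Instant) -> Option<Duration> {
		if started_at.elapsed() >= self.max_total_delay {
			return None;
		}
		self.inner.next_delay(attempts_made, started_at)
	}
}
```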

@tnull
Contributor Author

tnull commented Aug 21, 2025

Do you mean MaxTotalDelayRetryPolicy<MaxAttemptsRetryPolicy<R>>?

Yes, if each client call is also limited by a timeout, then we'd have either the per-call timeout times the MaxAttemptsRetryPolicy attempt count or the MaxTotalDelayRetryPolicy being the limiting factor.

Maybe I'm confused about what is lesser in that example.

Say you configure MaxTotalDelayRetryPolicy<MaxAttemptsRetryPolicy<R>> with 5 retries and a total delay of 100 seconds (for the sake of this example). Then you'd expect the client to return either in case of success, or after it tried 5 times, or after 100s, whichever comes first. Now with the client-side timeouts, we'd also have each retry time out after 10 seconds, so it might already be done after 50s.

Or, maybe even a bit more confusing would be if the user configured a MaxTotalDelayRetryPolicy of less than 10s, say 5s. They would expect a client call to definitely return after those 5s. But, if we have a client-side timeout of 10s (or none, before this PR), even the first attempt could take way longer than the configured total delay (since we don't use a select but rather a loop, we always await the first/current operation to return).
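
In other words, the effective bound becomes min(max total delay, attempts × per-attempt timeout). To illustrate the loop-vs-select point: racing each attempt against the remaining budget, e.g. via `tokio::time::timeout`, would bound the wait even if the operation future never resolves. A hypothetical sketch (`operation` is a stand-in for a single client call):

```rust
use std::time::Duration;
use tokio::time::timeout;

// Hypothetical: unlike plainly `.await`ing the operation inside the retry
// loop, this returns `None` once `total_delay` elapses, even if the
// operation future never resolves on its own.
async fn call_with_total_delay<T, E>(
	total_delay: Duration,
	operation: impl std::future::Future<Output = Result<T, E>>,
) -> Option<Result<T, E>> {
	timeout(total_delay, operation).await.ok()
}
```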

@ldk-reviews-bot

🔔 1st Reminder

Hey @jkczyz! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@jkczyz

jkczyz commented Aug 21, 2025

Say you configure MaxTotalDelayRetryPolicy<MaxAttemptsRetryPolicy<R>> with 5 retries and a total delay of 100 seconds (for the sake of this example). Then you'd expect the client to return either in case of success, or after it tried 5 times, or after 100s, whichever comes first. Now with the client-side timeouts, we'd also have each retry time out after 10 seconds, so it might already be done after 50s.

Isn't that expected? "done after 50s" is really "done after 5 attempts".

Or, maybe even a bit more confusing would be if the user configured a MaxTotalDelayRetryPolicy of less than 10s, say 5s. They would expect a client call to definitely return after those 5s. But, if we have a client-side timeout of 10s (or none, before this PR), even the first attempt could take way longer than the configured total delay (since we don't use a select but rather a loop, we always await the first/current operation to return).

Yeah, though isn't that a good argument to use select? Or does the current design not allow that, given the policy timeout is built into the type rather than the call site being aware of it?

@tnull
Contributor Author

tnull commented Aug 21, 2025

Yeah, though isn't that a good argument to use select? Or does the current design not allow that, given the policy timeout is built into the type rather than the call site being aware of it?

True, seems like we should? And given that we already use tokio with the time feature, it should be straightforward. I think I'll add a commit to this PR.

@tnull
Contributor Author

tnull commented Aug 22, 2025

Yeah, though isn't that a good argument to use select? Or does the current design not allow that, given the policy timeout is built into the type rather than the call site being aware of it?

True, seems like we should? And given that we already use tokio with the time feature, it should be straightforward. I think I'll add a commit to this PR.

Argh, after looking into it for a bit I have to eat my words: it's actually not trivial, as currently RetryPolicy::next_delay requires us to supply the returned error, i.e., we can only calculate the next delay based on the error type (as we use it in FilteredRetryPolicy).

And more generally, with a generic error type, we don't know what error we'd return in case the timeout happens before the operation future resolves.

@jkczyz

jkczyz commented Aug 22, 2025

And more generally, with a generic error type, we don't know what error we'd return in case the timeout happens before the operation future resolves.

Would it help to define an enum parameterized by the error E, where one variant is for a timeout and the other wraps E?
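
A minimal sketch of that suggestion (all names here are hypothetical):

```rust
use std::time::Duration;
use tokio::time::timeout;

// Hypothetical wrapper error: one variant for the client-side timeout
// firing, one wrapping the operation's own error type `E`.
#[derive(Debug)]
enum AttemptError<E> {
	Timeout,
	Op(E),
}

// With this, a timed attempt can always hand *some* error to
// `RetryPolicy::next_delay`, even when the operation future itself never
// resolves within the allotted time.
async fn timed_attempt<T, E>(
	per_attempt_timeout: Duration,
	operation: impl std::future::Future<Output = Result<T, E>>,
) -> Result<T, AttemptError<E>> {
	match timeout(per_attempt_timeout, operation).await {
		Ok(Ok(value)) => Ok(value),
		Ok(Err(e)) => Err(AttemptError::Op(e)),
		Err(_elapsed) => Err(AttemptError::Timeout),
	}
}
```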


@jkczyz jkczyz left a comment


Waiting on @tnull (no rush!). Just want to silence the review bot.

@tnull tnull force-pushed the 2025-08-enable-client-side-delays branch from 5e83c2b to 5670866 on November 4, 2025 13:22
@tnull tnull changed the title Enable client-side timeouts Enable client-side timeouts and replace retry logic with reqwest's Nov 4, 2025
@tnull tnull requested review from jkczyz and tankyleo November 4, 2025 13:23
@tnull tnull force-pushed the 2025-08-enable-client-side-delays branch from 5670866 to 4e360cf on November 4, 2025 13:24
@tnull
Contributor Author

tnull commented Nov 4, 2025

Excuse the delay here. After yet another tokio/vss-client/reqwest odyssey, I now discovered that switching to this branch (or more likely: reqwest >v0.12) got rid of some blocking behavior for ChannelMonitor persistence to VSS. When looking into it, I also realized that reqwest recently shipped retry logic, which (mostly) allows us to remove our home-brewed (and ~broken, as discussed above) retry logic. I therefore now updated this PR to drop crate::utils::retry entirely and switch to the reqwest-provided logic, making this PR's diff net negative 🎉. The only piece missing is really just to classify whether or not to retry by error type, which we hope they will add at some point (seanmonstar/reqwest#2852), but we should probably be good to ship this as is in the meantime. It definitely should be a net improvement.

@tnull tnull force-pushed the 2025-08-enable-client-side-delays branch 2 times, most recently from 9c94320 to 9c3148f on November 5, 2025 12:24
@tnull tnull mentioned this pull request Nov 5, 2025
@tnull tnull force-pushed the 2025-08-enable-client-side-delays branch 4 times, most recently from f926914 to 8b8910d on November 5, 2025 13:16
@tnull
Contributor Author

tnull commented Nov 5, 2025

Now also tacked on a commit that corrects the expected HTTP status codes in our mocked test service. Relatedly, I discovered that the Rust vss-server didn't use the correct codes either (a regression from the Java version); fixed in lightningdevkit/vss-server#65.

@tnull
Contributor Author

tnull commented Nov 5, 2025

Now also tacked on a commit that corrects the expected HTTP status codes in our mocked test service. Relatedly, I discovered that the Rust vss-server didn't use the correct codes either (a regression from the Java version); fixed in lightningdevkit/vss-server#65.

.. and finally, it turns out that we can drop our RetryPolicy in this PR: once corrected, we can deal with all the cases through HTTP status codes, with no need to parse the payload.

@tnull tnull force-pushed the 2025-08-enable-client-side-delays branch from 6bc7bce to b258cec on November 5, 2025 14:04
tnull added 6 commits November 5, 2025 15:22
While the `RetryPolicy` has a `MaxTotalDelayRetryPolicy`, the retry
`loop` only checks this configured delay once the operation future
actually returns a value. However, without client-side timeouts, the
operation isn't actually guaranteed to ever return anything (even an
error, IIUC).

So here, we enable some coarse client-side default timeouts to ensure
the polled futures eventually return either the response *or* an error
we can handle via our retry logic.

.. as some types are part of our API.

We here make use of `reqwest`'s `retry` functionality that was
introduced with its recent v0.12.23 release. As we can't fully replace
the old 'application-level' behavior just yet, we configure it to only
apply to errors that aren't INTERNAL_SERVER_ERROR, which is the status
VSS uses to send error responses.

We set some default values here, but note that these can always be
overridden by the user when using the `from_client` constructor.

We drop our (~broken) `RetryPolicy`, and replace it with the new-ish
retry policy that has shipped in `reqwest` since v0.12.23.
@tnull tnull force-pushed the 2025-08-enable-client-side-delays branch from b258cec to e6d8e57 on November 5, 2025 14:24
@tnull
Contributor Author

tnull commented Nov 5, 2025

Rebased to resolve minor conflicts.

Previously, we'd allow either re-using a `reqwest::Client` or supplying a
header provider. Here we add a new constructor that allows us to do both
at the same time.
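
A hypothetical shape for such a constructor (all names here are assumed stand-ins, not the crate's actual API):

```rust
use std::sync::Arc;

// Hypothetical stand-in for the crate's header-provider trait.
trait HeaderProvider: Send + Sync {}

// Hypothetical client wrapper.
struct VssHttpClient {
	client: reqwest::Client,
	header_provider: Arc<dyn HeaderProvider>,
}

impl VssHttpClient {
	// Accepts a pre-configured `reqwest::Client` *and* a header provider,
	// where previously only one or the other could be supplied.
	fn from_client_with_headers(
		client: reqwest::Client,
		header_provider: Arc<dyn HeaderProvider>,
	) -> Self {
		Self { client, header_provider }
	}
}
```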
@ldk-reviews-bot

🔔 1st Reminder

Hey @jkczyz @tankyleo! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

