Skip to content

Conversation

@afterburn
Copy link
Contributor

This PR introduces adaptive RTT estimator for inflight requests.

Benefits

  1. Fewer false timeouts, timeouts are based on measured round-trip times instead of a fixed value, so peers with higher latency aren’t dropped unnecessarily.
  2. Faster recovery from real packet loss, when the network is healthy, the timeout shrinks, allowing retries sooner if a request truly failed.
  3. Stablility under jitter, conservative learning rates smooth out spikes, so one slow response doesn’t cause large swings in timeout.

How
RTT is updated with exponentially weighted moving averages. Timeout is calculated as estimated_rtt + 4 * deviation, clamped to a minimum (to enforce a safe baseline). This mechanism adapts over time to the actual network conditions.

@SeverinAlexB SeverinAlexB changed the title Use RTT to optimize request timeouts BREAKING: Use RTT to optimize request timeouts Sep 8, 2025
Copy link
Contributor

@SeverinAlexB SeverinAlexB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to prove first that this algorithm is useful in the context of a DHT. RTT estimates are great when you communicate with a single entity. In mainline, you communicate with a different node more or less every time you send a UDP packet.

I run your code and printed out the calculated request_timeout when using put_immutable. It fluctuates widely, has a even wave pattern in it (see image).

CleanShot 2025-09-08 at 14 21 20@2x

Please explain to me how this estimate should be a good fit for mainline.

Comment on lines 42 to 43
pub(crate) fn new(config: &Config) -> Result<Self, std::io::Error> {
let request_timeout = config.request_timeout;
let port = config.port;
Copy link
Contributor

@SeverinAlexB SeverinAlexB Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The request_timeout is still configurable. Is it used somewhere else? Should it be removed? Should the constant timeout time be kept for backward compatability?

Comment on lines +45 to +46
estimated_rtt: Duration::from_millis(INITIAL_ESTIMATED_RTT_MS),
deviation_rtt: Duration::from_millis(DEVIATION_RTT_MS),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we extract all RTT methods/variable in it's own struct so we have a clear separation between the inflight request struct and RTT?

This way, it would also be easier to write tests.

Comment on lines +101 to +106
/// Updates RTT estimates using exponentially weighted moving averages (EWMA).
/// - Estimated RTT = (1-α) * old_estimate + α * sample
/// - Deviation RTT = (1-β) * old_deviation + β * |sample - new_estimate|
///
/// Conservative learning rates (α=0.05, β=0.1) make the algorithm less sensitive to
/// temporary network fluctuations for stable DHT timeout calculations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would likely be great to link a RTT guide so the next developer can easily familiarize themselves with the concept
https://how.dev/answers/how-to-compute-devrtt-estimated-rtt-time-out-interval-in-ccn

Comment on lines +8 to +11
/// Conservative learning rate for estimated RTT (lower = more stable, higher = faster adaptation)
const ALPHA: f64 = 0.05;
/// Conservative learning rate for RTT deviation (lower = more stable, higher = faster adaptation)
const BETA: f64 = 0.1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of calling it ALPHA and BETA which nobody except for devs familiar with the RTT calculation actually understands, why not call it something more reasonable?

What comes to my mind: RTT_LEARNING_RATE_ALPHA, or RTT_LEARNING_RATE_ESTIMATE, or RTT_LEARNING_RATE_DEVIATION.

}

fn request_timeout(&self) -> Duration {
let timeout = self.estimated_rtt + self.deviation_rtt.mul_f64(4.0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4.0 = Magic number

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants