BREAKING: Use RTT to optimize request timeouts #69
base: refactor/blocking-socket
Conversation
SeverinAlexB left a comment
We need to prove first that this algorithm is useful in the context of a DHT. RTT estimates are great when you communicate with a single entity. In mainline, you communicate with a different node more or less every time you send a UDP packet.
I ran your code and printed the calculated request_timeout when using put_immutable. It fluctuates widely and has an even wave pattern (see image).
Please explain to me how this estimate should be a good fit for mainline.
```rust
pub(crate) fn new(config: &Config) -> Result<Self, std::io::Error> {
    let request_timeout = config.request_timeout;
    let port = config.port;
```
The request_timeout is still configurable. Is it used somewhere else? Should it be removed? Or should the constant timeout be kept for backward compatibility?
```rust
estimated_rtt: Duration::from_millis(INITIAL_ESTIMATED_RTT_MS),
deviation_rtt: Duration::from_millis(DEVIATION_RTT_MS),
```
Can we extract all the RTT methods/variables into their own struct so we have a clear separation between the inflight-request struct and the RTT logic?
That would also make it easier to write tests.
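A minimal sketch of what such an extraction could look like, assuming the constants from this PR (α = 0.05, β = 0.1, multiplier 4); the name `RttEstimator` and the constructor shape are hypothetical, not taken from the diff:

```rust
use std::time::Duration;

/// Hypothetical standalone RTT estimator, separated from inflight-request bookkeeping.
struct RttEstimator {
    estimated_rtt: Duration,
    deviation_rtt: Duration,
}

impl RttEstimator {
    fn new(initial_estimate: Duration, initial_deviation: Duration) -> Self {
        Self {
            estimated_rtt: initial_estimate,
            deviation_rtt: initial_deviation,
        }
    }

    /// EWMA update as described in the PR's doc comment.
    fn record_sample(&mut self, sample: Duration) {
        const ALPHA: f64 = 0.05;
        const BETA: f64 = 0.1;
        // Estimated RTT = (1-α) * old_estimate + α * sample
        self.estimated_rtt =
            self.estimated_rtt.mul_f64(1.0 - ALPHA) + sample.mul_f64(ALPHA);
        // |sample - new_estimate|, computed without going negative on Duration
        let deviation = if sample > self.estimated_rtt {
            sample - self.estimated_rtt
        } else {
            self.estimated_rtt - sample
        };
        // Deviation RTT = (1-β) * old_deviation + β * |sample - new_estimate|
        self.deviation_rtt =
            self.deviation_rtt.mul_f64(1.0 - BETA) + deviation.mul_f64(BETA);
    }

    fn request_timeout(&self) -> Duration {
        self.estimated_rtt + self.deviation_rtt.mul_f64(4.0)
    }
}
```

With a struct like this, a unit test can feed in a fixed sample sequence and assert on the resulting timeout, without constructing any socket or request state.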
```rust
/// Updates RTT estimates using exponentially weighted moving averages (EWMA).
/// - Estimated RTT = (1-α) * old_estimate + α * sample
/// - Deviation RTT = (1-β) * old_deviation + β * |sample - new_estimate|
///
/// Conservative learning rates (α=0.05, β=0.1) make the algorithm less sensitive to
/// temporary network fluctuations for stable DHT timeout calculations.
```
It would likely be great to link an RTT guide so the next developer can easily familiarize themselves with the concept:
https://how.dev/answers/how-to-compute-devrtt-estimated-rtt-time-out-interval-in-ccn
```rust
/// Conservative learning rate for estimated RTT (lower = more stable, higher = faster adaptation)
const ALPHA: f64 = 0.05;
/// Conservative learning rate for RTT deviation (lower = more stable, higher = faster adaptation)
const BETA: f64 = 0.1;
```
Instead of calling these ALPHA and BETA, which only devs familiar with the RTT calculation will actually understand, why not use more descriptive names?
What comes to my mind: RTT_LEARNING_RATE_ALPHA, or RTT_LEARNING_RATE_ESTIMATE, or RTT_LEARNING_RATE_DEVIATION.
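A sketch of the suggested renaming, picking one of the proposed name pairs (the values are the PR's existing constants):

```rust
/// EWMA learning rate for the estimated RTT (lower = more stable, higher = faster adaptation).
const RTT_LEARNING_RATE_ESTIMATE: f64 = 0.05;
/// EWMA learning rate for the RTT deviation (lower = more stable, higher = faster adaptation).
const RTT_LEARNING_RATE_DEVIATION: f64 = 0.1;
```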
```rust
}

fn request_timeout(&self) -> Duration {
    let timeout = self.estimated_rtt + self.deviation_rtt.mul_f64(4.0);
```
4.0 is a magic number.
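One way to address this, sketched with a hypothetical constant name (4 is also the multiplier K used in TCP's retransmission-timer formula, RFC 6298, which is worth citing in the doc comment):

```rust
use std::time::Duration;

/// Multiplier applied to the RTT deviation when computing the timeout.
/// 4 matches the K factor in the classic RTO formula (RFC 6298).
const RTT_DEVIATION_MULTIPLIER: f64 = 4.0;

fn request_timeout(estimated_rtt: Duration, deviation_rtt: Duration) -> Duration {
    estimated_rtt + deviation_rtt.mul_f64(RTT_DEVIATION_MULTIPLIER)
}
```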
This PR introduces an adaptive RTT estimator for inflight requests.
Benefits
How
RTT is updated with exponentially weighted moving averages. The timeout is calculated as estimated_rtt + 4 * deviation, clamped to a minimum to enforce a safe baseline. This mechanism adapts to the actual network conditions over time.
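The timeout calculation with clamping described above can be sketched like this; the floor value and the names `MIN_REQUEST_TIMEOUT` / `adaptive_timeout` are assumptions for illustration, not taken from the diff:

```rust
use std::time::Duration;

/// Assumed safe floor for the timeout; the actual constant in the PR may differ.
const MIN_REQUEST_TIMEOUT: Duration = Duration::from_millis(200);

fn adaptive_timeout(estimated_rtt: Duration, deviation_rtt: Duration) -> Duration {
    // timeout = estimated_rtt + 4 * deviation, as described in the PR
    let raw = estimated_rtt + deviation_rtt.mul_f64(4.0);
    // Clamp to a safe baseline so a run of fast responses cannot
    // drive the timeout unrealistically low.
    raw.max(MIN_REQUEST_TIMEOUT)
}
```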