-
Notifications
You must be signed in to change notification settings - Fork 0
RPC Resolver: If all are unhealthy, return the default RPC #77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
|
PASS [ 44.694s] (3/3) lit_node::test toxiproxy::perf_tests::load_with_no_latency |
GTC6244
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
For a future PR, I'd suggest maybe doing away with "convention" for when things are not healthy, and add some sort of indicator to the RpcEntry itself ( like a default_public_endpoint : bool or something... )
|
Using default prio is mainly for effort conservation and backward compatibility because we can just assign 0 to currently existing RPCs, but I agree an explicit (and user-selectable) marker per chain is desirable |
|
RPC selection is still broken for nodes even with this change for some reason, it's still selecting the replica. |
scratch the above, i was observing a stale build - works as advertised. |
GTC6244
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm - minor comment that may or may not be relevant to code.
Merge activity
|
79fe5ae to
a9c2cfa
Compare

What
Improved RPC fallback selection logic to choose default-priority (0) entries when no healthy RPCs are available.
New (not-yet polled) RPCs are now initialized as
Unhealthyinstead of Healthy with a fake latency based on order of definition.This change removes reliance on order of defining RPCs in rpc-config.yaml
followup to #73
Why
Some of our services (prov bootstrap) request RPCs before healthcheck poller has had a chance to run. We thus have no idea which RPC from the list to return and should prefer the one with the highest likelihood of being available.
Previously, new entries that have not yet been polled were initialized as Healthy, which according to the new selection method led to the replica IP being selected (highest priority) before it could be proven to be Unhealthy by the poller, which broke prov bootstrap.
The previous selection algorithm just selected by order of definition in this case (first RPC defined in the file) which is brittle and darkmagic-y as well.
This change ensures we use a more reliable default-priority remote RPC anytime we don't know which RPC is actually available, which is typically a public or private remote RPC to that chain and can be more safely assumed to be available.
This default (priority-0) RPC is now mandatory for every chain that might be used in such way and prio 0 now has a special meaning.
Background
During prov bootstrap, the healthcheck poller requests a yellowstone RPC immediately on boot without waiting for a healthcheck cycle. Previously, a replica URL was selected due to high prio because not-yet polled RPCs were initialized as Healthy, which made prov boot fail.