RPC Resolver: If all are unhealthy, return the default RPC #77

Garandor · 2026-01-15T06:25:18Z

What

Improved RPC fallback selection logic to choose default-priority (0) entries when no healthy RPCs are available.

New (not-yet-polled) RPCs are now initialized as Unhealthy, instead of as Healthy with a fake latency based on order of definition.

This change removes the reliance on the order in which RPCs are defined in rpc-config.yaml.
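
For illustration, the selection rule described above can be sketched roughly as follows. This is a minimal sketch, not the code in this PR: the `RpcEntry` fields, the `Health` enum, the `select_rpc` function, and the example URLs are hypothetical. It assumes that a larger priority number means higher preference and that ties among healthy entries are broken by lower latency.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Health {
    Healthy,
    Unhealthy,
}

// Hypothetical shape of a resolver entry; the real struct may differ.
#[derive(Debug, Clone)]
struct RpcEntry {
    url: String,
    priority: u32,
    health: Health,
    latency: Option<Duration>,
}

impl RpcEntry {
    // New entries start out Unhealthy with no latency until a poll says otherwise.
    fn new(url: &str, priority: u32) -> Self {
        Self {
            url: url.to_string(),
            priority,
            health: Health::Unhealthy,
            latency: None,
        }
    }
}

// Priority 0 marks the default RPC used when nothing is known to be healthy.
const DEFAULT_PRIORITY: u32 = 0;

fn select_rpc(entries: &[RpcEntry]) -> Option<&RpcEntry> {
    // Assumption: prefer the healthy entry with the highest priority,
    // breaking ties by the lowest measured latency.
    let best_healthy = entries
        .iter()
        .filter(|e| e.health == Health::Healthy)
        .max_by(|a, b| {
            a.priority.cmp(&b.priority).then_with(|| {
                b.latency
                    .unwrap_or(Duration::MAX)
                    .cmp(&a.latency.unwrap_or(Duration::MAX))
            })
        });

    // Nothing healthy (or nothing polled yet): fall back to a priority-0 entry,
    // independent of where it appears in the list.
    best_healthy.or_else(|| entries.iter().find(|e| e.priority == DEFAULT_PRIORITY))
}
```

With a high-priority replica and a priority-0 remote that have both never been polled, `select_rpc` returns the remote; once the poller marks the replica Healthy, the replica wins.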

Follow-up to #73

Why

Some of our services (prov bootstrap) request RPCs before the healthcheck poller has had a chance to run. At that point we have no idea which RPC from the list to return, so we should prefer the one with the highest likelihood of being available.

Previously, entries that had not yet been polled were initialized as Healthy. Under the new selection method, this led to the replica IP being selected (highest priority) before the poller could prove it Unhealthy, which broke prov bootstrap.
The previous selection algorithm simply selected by order of definition in this case (the first RPC defined in the file), which is brittle and darkmagic-y as well.

This change ensures that whenever we don't know which RPC is actually available, we use the default-priority RPC. That is typically a public or private remote RPC for the chain, which can more safely be assumed to be available.

This default (priority-0) RPC is now mandatory for every chain that might be used in such a way, and prio 0 now has a special meaning.
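
Because the priority-0 entry is now mandatory, a config check along the following lines could catch a chain that lacks one. This is only a sketch under assumptions: `RpcConfigEntry`, `chains_missing_default`, and the map-of-chains layout are hypothetical and do not reflect the actual schema of rpc-config.yaml.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified view of one RPC definition for a chain.
struct RpcConfigEntry {
    url: String,
    priority: u32,
}

// Returns the names of all chains that have no priority-0 (default) RPC.
fn chains_missing_default(config: &BTreeMap<String, Vec<RpcConfigEntry>>) -> Vec<&str> {
    config
        .iter()
        .filter(|(_, entries)| !entries.iter().any(|e| e.priority == 0))
        .map(|(chain, _)| chain.as_str())
        .collect()
}
```

Running such a check when the config is loaded would surface a missing default before a service hits the fallback path during bootstrap.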

Background

During prov bootstrap, the healthcheck poller requests a yellowstone RPC immediately on boot without waiting for a healthcheck cycle. Previously, a replica URL was selected because of its high prio, since not-yet-polled RPCs were initialized as Healthy; this made prov boot fail.

Garandor · 2026-01-15T06:25:34Z

RPC Resolver: Healthcheck always on #78
RPC Resolver: If all are unhealthy, return the default RPC #77 👈
master

This stack of pull requests is managed by Graphite.

github-actions · 2026-01-15T06:37:09Z

PASS [ 44.694s] (3/3) lit_node::test toxiproxy::perf_tests::load_with_no_latency
PASS [ 44.783s] (2/3) lit_node::test toxiproxy::perf_tests::load_with_50ms_latency_single_link
PASS [ 91.215s] (1/3) lit_node::test toxiproxy::perf_tests::load_with_50ms_latency_all_links

GTC6244

lgtm.

For a future PR, I'd suggest doing away with the "convention" for when things are not healthy and adding some sort of indicator to the RpcEntry itself (like a default_public_endpoint: bool or something...)

Garandor · 2026-01-15T17:43:30Z

Using the default prio is mainly for effort conservation and backward compatibility, because we can just assign 0 to the currently existing RPCs. But I agree that an explicit (and user-selectable) marker per chain is desirable.
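
To make the trade-off in this exchange concrete, here is a rough sketch of what an explicit marker could look like while keeping the prio-0 convention as a backward-compatible fallback. Only the field name `default_public_endpoint` comes from the suggestion above; the struct layout and the `fallback_rpc` function are hypothetical and not part of this PR.

```rust
// Hypothetical entry carrying an explicit default marker next to its priority.
#[derive(Debug)]
struct RpcEntry {
    url: String,
    priority: u32,
    default_public_endpoint: bool,
}

// Picks the RPC to return when no entry is known to be healthy.
fn fallback_rpc(entries: &[RpcEntry]) -> Option<&RpcEntry> {
    entries
        .iter()
        // Prefer an entry that is explicitly marked as the default.
        .find(|e| e.default_public_endpoint)
        // Otherwise keep honoring the existing "priority 0 is the default" convention.
        .or_else(|| entries.iter().find(|e| e.priority == 0))
}
```

Existing configs that only set priority 0 would keep working, while new configs could mark the default explicitly.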

Garandor · 2026-01-15T19:23:39Z

RPC selection is still broken for nodes even with this change; for some reason it's still selecting the replica.
Investigating some more.

lit os guest instance create node \
--net4-ip 23.105.38.175/26 \
--net4-gw 23.105.38.190 \
--subnet-id 149a054CE79A379Ae5E97f5B984B993233b28061 \
--node-staker-address 0x1A3596441024a3CA8b33AD996e50c492aFd42fC7 \
--node-admin-address 0x4C06111c11556284cA3A9660Eae340c6485C2BAD \
--vcpus 16 \
--mem 20G" /dev/null

rc: 101
msg: non-zero return code
start: 2026-01-15 14:10:11.032362
end: 2026-01-15 14:10:11.086831
delta: 0:00:00.054469
results_file: /root/.ansible_async/j838247541378.4112
stderr: (empty)
stdout:

thread 'main' panicked at /opt/assets/lit-assets/rust/lit-os/lit-cli-os/src/guest/instance/release.rs:74:55:
failed to construct ProvApiClient: lit_blockchain::Error { kind: Blockchain, msg: "failed to call contract_resolver on staking contract (subnet id: 149a054ce79a379ae5e97f5b984b993233b28061, contract address: 0x149a…8061, chain_id: 175188, chain_name: yellowstone)", source: MiddlewareError { e: HTTPError(reqwest::Error { kind: Request, url: Url { scheme: "http", cannot_be_a_base: false, username: "", password: None, host: Some(Ipv4(172.30.0.1)), port: Some(8547), path: "/", query: None, fragment: None }, source: hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })) }) }, caller:  { file: "/opt/assets/lit-assets/rust/lit-core/lit-blockchain/src/resolver/contract/mod.rs:189:26" } }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Garandor · 2026-01-16T03:27:57Z

Re: "RPC selection is still broken for nodes even with this change for some reason, it's still selecting the replica. Investigating some more"

Scratch the above: I was observing a stale build. Works as advertised.

GTC6244

lgtm - minor comment that may or may not be relevant to code.

rust/lit-core/lit-blockchain/src/resolver/rpc/mod.rs

Garandor · 2026-01-22T13:43:58Z

Merge activity

Jan 22, 1:43 PM UTC: A user started a stack merge that includes this pull request via Graphite.
Jan 22, 1:44 PM UTC: Graphite rebased this pull request as part of a merge.
Jan 22, 2:26 PM UTC: Graphite couldn't merge this PR because it failed for an unknown reason (GitHub threw an unexpected error that did not resolve after multiple retries. Please try again later or contact Graphite support if this continues.).
Jan 22, 3:21 PM UTC: A user started a stack merge that includes this pull request via Graphite.
Jan 22, 3:22 PM UTC: @Garandor merged this pull request with Graphite.

Garandor requested review from GTC6244, glitch003 and kapoorabhishek24 January 15, 2026 06:25

Garandor marked this pull request as ready for review January 15, 2026 06:25

GTC6244 approved these changes Jan 15, 2026

Garandor mentioned this pull request Jan 16, 2026

RPC Resolver: Healthcheck always on #78

Merged

Garandor requested review from DashKash54 and GTC6244 January 21, 2026 05:59

Garandor changed the title ~~If all are unhealthy, return the default RPC~~ RPC Resolver: If all are unhealthy, return the default RPC Jan 21, 2026

GTC6244 approved these changes Jan 21, 2026

rust/lit-core/lit-blockchain/src/resolver/rpc/mod.rs (resolved)

Garandor added 4 commits January 22, 2026 13:44

If all are unhealthy, return the default RPC

6acaeac

Initialize RPCs as Unhealthy

c447f1c

fmt

825fc16

cleanup unused code

a9c2cfa

Garandor force-pushed the rework_default_rpc branch from 79fe5ae to a9c2cfa January 22, 2026 13:44

Garandor merged commit 09fcd78 into master Jan 22, 2026
85 of 95 checks passed

Garandor self-assigned this Jan 22, 2026

Garandor deleted the rework_default_rpc branch January 22, 2026 15:22

3 participants