Non-validator node stuck in bootstrap loop — P2P state transfer always times out (829MB, 60s timeout) #143
Open
Description
Problem
My non-validator node crashed with a "too many blocks to request" panic and lost visor_abci_state.json. It is now stuck in
an infinite bootstrap loop: the transfer from every peer times out before the 829MB state download completes.
Environment
- Ubuntu 24.04 LTS, 64GB RAM, NVMe SSD
- EU-based server (94.x.x.x)
- Internet: >100 Mbps (not the bottleneck)
- hl-visor: 4a5921208e0c8d246d26aeaa3958557ee538d48b (2026-02-28)
- hl-node: 105cb1dc0129b99e8a81d16b8cc4118749a0ed37 (2026-03-21)
- try_new_peers: true, 63 peers configured
What happens
- Node connects to a peer and starts downloading abci_state (829MB)
- Transfer speed is ~1-1.3 MB/s from all peers
- After ~60 seconds → tcp_read_exact: deadline has elapsed (downloaded only ~60MB of 829MB)
- Next attempt → same peer says Rate limited by peer
- Moves to next peer → same timeout
- Loop continues indefinitely across 100+ peers
Logs
WARN >>> hl-node @@ considering local abci state as stale @@ [stale_reason: "local height is too far behind"]
WARN >>> hl-node @@ connecting to peer: Ip(x.x.x.x)
WARN >>> hl-node @@ reading bytes for abci_stream recv greeting: 52000000/829967179
WARN >>> hl-node @@ could not read abci state from x.x.x.x: lu::timeout tcp_read_exact: deadline has elapsed
WARN >>> hl-node @@ Rate limited by peer
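To measure how far each attempt gets before the deadline, the progress lines can be parsed. This is a minimal sketch, assuming every progress line uses the received/total format shown in the excerpt above. The function name parse_progress is hypothetical and not part of hl-node.

```python
import re

# Matches the "<received>/<total>" byte counts in an abci_stream progress line.
# The pattern is inferred from the single log excerpt in this issue (assumption).
PROGRESS_RE = re.compile(r"abci_stream recv greeting: (\d+)/(\d+)")


def parse_progress(line: str):
    """Return (received_bytes, total_bytes, fraction), or None if the line has no progress info."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    received, total = int(m.group(1)), int(m.group(2))
    return received, total, received / total


if __name__ == "__main__":
    line = "WARN >>> hl-node @@ reading bytes for abci_stream recv greeting: 52000000/829967179"
    received, total, frac = parse_progress(line)
    print(f"{received} of {total} bytes ({frac:.1%})")
```

For the line above, this reports roughly 6% of the state received when the deadline hit.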
What I've tried
- 63 different peers (EU, US, JP, SG): all give ~1 MB/s, all time out
- Helsinki Hetzner peers (65.109.x.x): always early eof, never serve state
- Restoring from periodic_abci_states (height 939600000): node says "too far behind", forces full bootstrap
- override_gossip_config.json with try_new_peers: true: discovers 100+ peers, none fast enough
- Running for 10+ hours: no successful bootstrap
The math
829MB ÷ 60s timeout = ~14 MB/s required. All peers serve at ~1 MB/s. This means bootstrap is physically impossible
from my location unless:
- A peer serves faster than 14 MB/s, or
- The timeout is increased, or
- State can be downloaded via HTTP/S3 instead of P2P
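The arithmetic above can be checked with a short sketch. The byte count comes from the log line (829967179), and the 60-second deadline and ~1-1.3 MB/s rates are the values observed in this report. The function names are hypothetical, and nothing here reflects hl-node internals.

```python
# Back-of-the-envelope check of the bootstrap math.
# Inputs are the figures reported in this issue; names are illustrative only.

STATE_BYTES = 829_967_179  # total size from the abci_stream log line
TIMEOUT_S = 60             # observed deadline (assumed fixed; not confirmed to be configurable)
MB = 1_000_000             # decimal megabyte


def required_rate_mb_s(size_bytes: int, timeout_s: float) -> float:
    """Minimum sustained throughput (MB/s) needed to finish before the deadline."""
    return size_bytes / timeout_s / MB


def required_timeout_s(size_bytes: int, rate_mb_s: float) -> float:
    """Deadline (seconds) needed for the transfer to finish at a given throughput."""
    return size_bytes / (rate_mb_s * MB)


if __name__ == "__main__":
    print(f"required rate: {required_rate_mb_s(STATE_BYTES, TIMEOUT_S):.1f} MB/s")
    for rate in (1.0, 1.3):
        needed = required_timeout_s(STATE_BYTES, rate)
        print(f"at {rate} MB/s the transfer needs ~{needed:.0f} s")
```

At the observed rates the deadline would have to be roughly 640 to 830 seconds, more than ten times the current 60 seconds, for a single transfer to complete.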
Questions
- Is there a way to download abci_state.rmp directly (HTTP/S3 snapshot)?
- Can the 60-second stream timeout be configured?
- Is there a plan to provide state snapshots for bootstrap (like other L1 chains do)?
Related issues
- [hl-node][Mainnet] Panic: too many blocks to request in gossip forward_client_blocks after starting bootstrap; visor restarts child #141 — same "too many blocks to request" panic
- Could not read abci state from peer ips and raise early eof error #73 — same "early eof" during bootstrap (resolved by moving to Tokyo)