Summary
Following the recent relay node infrastructure changes, users whose browser has a now-offline node cached in beacon:matrix-selected-node (localStorage) are permanently deadlocked on page load. The SDK attempts to reach the dead node with no timeout and no fallback to server discovery. The only recovery is manually clearing localStorage.
Octez.connect v4.8.1 removes decommissioned nodes from the default list, which prevents new connections from hitting dead servers. However, it does not address users who already have a dead node cached from a previous session. With 4 of 12 relay nodes taken offline, roughly a third of existing P2P users are affected.
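Until a patched SDK is deployed, a dApp could apply the manual recovery itself before initialising Beacon. The following is a minimal sketch, assuming that removing only the `beacon:matrix-selected-node` key is enough to unblock the page (clearing all of localStorage is the recovery we have confirmed). `KeyValueStore` and `clearCachedRelayNode` are hypothetical names for illustration, not part of the Beacon SDK.

```typescript
// Minimal shape of the Web Storage API used here, so the sketch
// also type-checks outside a browser. Hypothetical helper type.
interface KeyValueStore {
  getItem(key: string): string | null;
  removeItem(key: string): void;
}

// Key named in this report; assumed to be the only entry that needs removing.
const CACHED_NODE_KEY = "beacon:matrix-selected-node";

// Removes the cached relay node and returns the previous value (or null),
// so the caller can log which node was dropped.
function clearCachedRelayNode(store: KeyValueStore): string | null {
  const previous = store.getItem(CACHED_NODE_KEY);
  store.removeItem(CACHED_NODE_KEY);
  return previous;
}
```

In a browser this would be called as `clearCachedRelayNode(window.localStorage)`. Note that it drops the cached node unconditionally, so it is a stopgap rather than a substitute for the fix described below.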
Reproduction
- Connect a dApp to any P2P/Matrix wallet (e.g., Kukai)
- In DevTools, set `beacon:matrix-selected-node` in localStorage to an unreachable URL
- Refresh the page
- The page hangs indefinitely
Originally reported by Klas Harrysson (Kukai). Confirmed to affect objkt.com and any dApp that auto-reconnects from cached session state.
Root cause
P2PCommunicationClient.getRelayServer() in beacon-transport-matrix:
- `getBeaconInfo()` has no timeout on the axios call. A dead server hangs the request indefinitely.
- When a stored node is read from `MATRIX_SELECTED_NODE`, `getBeaconInfo(node)` is called without try/catch. On failure, the existing `findBestRegionAndGetServer()` discovery path is never reached.
- Same issue on the stale-timestamp refresh path when a cached `relayServer` needs to re-validate.
Fix
We've implemented and shipped a fix in our Beacon SDK patches fork (a fork we maintain for collecting fixes not yet merged upstream) and submitted it upstream:
- Upstream PR: airgap-it/beacon-sdk#965
- Commits: `9397dff` (deadlock recovery), `517771c` (offline guard)
The changes:
- Add 10s timeout to `getBeaconInfo()` axios call
- Wrap stored-node checks in try/catch; on failure, delete stale node from storage and fall through to `findBestRegionAndGetServer()`
- Check `navigator.onLine` before deleting a stored node (mobile devices in transient offline states shouldn't lose their pairing)
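
The control flow of the three changes can be sketched roughly as follows. This is an illustration and not the code from the PR: `ResolverDeps`, `NodeStorage`, `withTimeout`, and `resolveRelayServer` are hypothetical names, the dependencies are injected so the sketch runs outside a browser, and the timeout is a generic promise wrapper rather than the axios `timeout` option. What happens to the stored node's request when the device is offline is also an assumption here: the sketch keeps the stored node and rethrows.

```typescript
// Hypothetical storage shape; stands in for the SDK's storage layer.
interface NodeStorage {
  get(key: string): Promise<string | undefined>;
  delete(key: string): Promise<void>;
}

// Hypothetical dependency bundle so the flow can be exercised in isolation.
interface ResolverDeps {
  storage: NodeStorage;
  getBeaconInfo: (node: string) => Promise<unknown>; // health check of a node
  discover: () => Promise<string>; // stands in for findBestRegionAndGetServer()
  isOnline: () => boolean; // e.g. () => navigator.onLine in a browser
  timeoutMs?: number; // defaults to 10 s, as in the fix
}

const SELECTED_NODE_KEY = "beacon:matrix-selected-node";

// Rejects if `work` has not settled within `ms` milliseconds.
// It does not cancel the underlying request; it only stops waiting for it.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${ms} ms`)),
      ms
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); }
    );
  });
}

async function resolveRelayServer(deps: ResolverDeps): Promise<string> {
  const timeoutMs = deps.timeoutMs ?? 10000;
  const stored = await deps.storage.get(SELECTED_NODE_KEY);

  if (stored !== undefined) {
    try {
      // Changes 1 and 2: bounded wait, wrapped in try/catch.
      await withTimeout(deps.getBeaconInfo(stored), timeoutMs);
      return stored;
    } catch (error) {
      // Change 3: while offline the failure says nothing about the node,
      // so keep the pairing. Rethrowing here is an assumption of this sketch.
      if (!deps.isOnline()) {
        throw error;
      }
      // Online and unreachable: drop the stale node, then rediscover.
      await deps.storage.delete(SELECTED_NODE_KEY);
    }
  }

  return deps.discover();
}
```

With a stored node that never answers, `resolveRelayServer` stops waiting after `timeoutMs`, removes the key, and returns whatever `discover()` yields, which is the path that was previously unreachable.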
Both commits include tests. We've shipped these fixes in Taquito v24.1.0-beta.1 via a patched Beacon SDK build. We have not done extensive real-device testing across the full mobile matrix (Android versions, iOS, low power modes, idle persistence, etc.). We have the capability and tooling for that kind of testing but the resources aren't there for unplanned emergency work on a dependency we don't maintain.
We'd recommend porting these fixes and running them through your own test process before releasing.
Related
- ecadlabs/taquito#3332 - Bug report on the Taquito side
- airgap-it/beacon-sdk#965 - Upstream PR with the fix