
Users with cached dead relay node are permanently stuck (timeout + fallback missing in P2PCommunicationClient) #12

@jevonearth


Summary

Following the recent relay node infrastructure changes, users whose browser has a now-offline node cached in beacon:matrix-selected-node (localStorage) hang permanently on page load. The SDK tries to reach the dead node with no timeout and no fallback to server discovery. The only recovery is manually clearing localStorage.

Octez.connect v4.8.1 removes decommissioned nodes from the default list, which prevents new connections from hitting dead servers. However, it does not address users who already have a dead node cached from a previous session. With 4 of 12 relay nodes taken offline, roughly a third of existing P2P users are affected.

Reproduction

  1. Connect a dApp to any P2P/Matrix wallet (e.g., Kukai)
  2. In DevTools, set beacon:matrix-selected-node in localStorage to an unreachable URL
  3. Refresh the page
  4. The page hangs indefinitely

Originally reported by Klas Harrysson (Kukai). Confirmed to affect objkt.com and any dApp that auto-reconnects from cached session state.

Root cause

P2PCommunicationClient.getRelayServer() in beacon-transport-matrix:

  1. getBeaconInfo() has no timeout on the axios call. A dead server hangs the request indefinitely.
  2. When a stored node is read from MATRIX_SELECTED_NODE, getBeaconInfo(node) is called without try/catch. On failure, the existing findBestRegionAndGetServer() discovery path is never reached.
  3. The same issue affects the stale-timestamp refresh path, when a cached relayServer needs to be re-validated.
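A simplified model of the pre-fix control flow may help show why points 1 and 2 combine into a permanent hang. The function names mirror the issue text, but this is a hypothetical sketch, not the actual beacon-transport-matrix source; `settlesWithin` is an illustrative helper added only to observe the hang.

```typescript
// Hypothetical model of the pre-fix getRelayServer() flow described above.
// Not the real beacon-transport-matrix code; names are borrowed from the issue.
type InfoFetcher = (node: string) => Promise<unknown>;

async function getRelayServerUnpatched(
  stored: string | undefined,
  getBeaconInfo: InfoFetcher,
  findBestRegionAndGetServer: () => Promise<string>,
): Promise<string> {
  if (stored) {
    // No timeout and no try/catch: a request that never settles parks this
    // await forever, and a rejection propagates straight to the caller.
    await getBeaconInfo(stored);
    return stored;
  }
  // Discovery is only reached when nothing is cached.
  return findBestRegionAndGetServer();
}

// Hypothetical helper: resolves to "pending" if `p` has not settled within `ms`.
async function settlesWithin(p: Promise<unknown>, ms: number): Promise<"settled" | "pending"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const pending = new Promise<"pending">((resolve) => {
    timer = setTimeout(() => resolve("pending"), ms);
  });
  const settled = p.then(() => "settled" as const, () => "settled" as const);
  return Promise.race([settled, pending]).finally(() => clearTimeout(timer));
}

// A dead relay that accepts the request but never answers.
const deadNode: InfoFetcher = () => new Promise(() => {});
settlesWithin(getRelayServerUnpatched("dead-node", deadNode, async () => "fresh-node"), 100)
  .then((state) => console.log(state)); // prints "pending": discovery never runs
```

With a cached node present, the discovery branch is unreachable whether the request hangs or rejects, which matches the behavior seen in the reproduction.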

Fix

We've implemented and shipped a fix in our Beacon SDK patches fork (a fork we maintain for collecting fixes not yet merged upstream) and submitted it upstream:

The changes:

  1. Add 10s timeout to getBeaconInfo() axios call
  2. Wrap stored-node checks in try/catch; on failure, delete stale node from storage and fall through to findBestRegionAndGetServer()
  3. Check navigator.onLine before deleting a stored node (mobile devices in transient offline states shouldn't lose their pairing)
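The three changes can be sketched roughly as follows. This is a minimal illustration, not the code in the patches fork: `KeyValueStore`, `RelayDeps`, `withTimeout` and the injected `isOnline` hook are hypothetical names chosen so the flow runs without a browser or axios. What happens when the device is offline (keep the node and rethrow) is also an assumption; the issue only says the node should not be deleted in that case.

```typescript
// Hypothetical stand-ins for Beacon internals; not the real SDK types.
interface KeyValueStore {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
  delete(key: string): Promise<void>;
}

const MATRIX_SELECTED_NODE = "beacon:matrix-selected-node";
const BEACON_INFO_TIMEOUT_MS = 10_000; // change 1: 10 s bound on the info request

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

interface RelayDeps {
  store: KeyValueStore;
  getBeaconInfo: (node: string) => Promise<unknown>; // may hang or throw on a dead node
  findBestRegionAndGetServer: () => Promise<string>;
  isOnline: () => boolean; // in a browser, e.g. () => navigator.onLine
  timeoutMs?: number;
}

async function getRelayServer(deps: RelayDeps): Promise<string> {
  const timeoutMs = deps.timeoutMs ?? BEACON_INFO_TIMEOUT_MS;
  const stored = await deps.store.get(MATRIX_SELECTED_NODE);
  if (stored) {
    try {
      await withTimeout(deps.getBeaconInfo(stored), timeoutMs);
      return stored; // cached node answered in time
    } catch (err) {
      if (!deps.isOnline()) {
        // Change 3 (assumed behavior): transient offline state, keep the pairing.
        throw err;
      }
      // Change 2: online but the node is unreachable, so forget it.
      await deps.store.delete(MATRIX_SELECTED_NODE);
    }
  }
  // Fall through to normal discovery; persisting the result here is an assumption.
  const node = await deps.findBestRegionAndGetServer();
  await deps.store.set(MATRIX_SELECTED_NODE, node);
  return node;
}
```

`Promise.race` only stops waiting; it does not cancel the underlying request. In the real axios call, passing the request config option `{ timeout: 10_000 }` is the more direct route, since axios then aborts the request itself.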

Both commits include tests. We've shipped these fixes in Taquito v24.1.0-beta.1 via a patched Beacon SDK build. We have not done extensive real-device testing across the full mobile matrix (Android versions, iOS, low-power modes, idle persistence, etc.). We have the capability and tooling for that kind of testing, but not the resources for unplanned emergency work on a dependency we don't maintain.

We'd recommend porting these fixes and running them through your own test process before releasing.
