[feature]: Add backoff retry for authmailbox subscription handshake #1967

@ffranr

Description

The authmailbox client retries initial connection establishment, but does not retry the subscription/authentication handshake. Add a backoff retry loop around the subscription handshake so transient auth/stream failures don’t permanently fail StartAccountSubscription.

Background / context

  • The connection retry logic exists in authmailbox/receive_subscription.go (connectServerStream uses MinBackoff/MaxBackoff and MaxConnectAttempts around MailboxInfo).
  • The subscription/auth handshake happens in authmailbox/receive_subscription.go (connectAndAuthenticate sends InitReceive, waits for challenge, sends AuthSig, then waits for AuthSuccess).
  • Proof couriers already implement backoff-retry for delivery/receive in proof/courier.go (BackoffHandler.Exec) and the generic retry helper lives in fn/retry.go.

Problem

If the subscription/auth handshake fails (e.g. stream error, server closes due to auth timeout, transient RPC error), connectAndAuthenticate returns an error and the subscription attempt stops without retrying. This makes startup/subscribe brittle even when the server is reachable and would succeed on a subsequent attempt.

Observed flow (client):

  1. connectServerStream succeeds.
  2. Client sends InitReceive.
  3. Handshake fails before AuthSuccess (timeout, stream error, etc.).
  4. StartAccountSubscription returns error; no backoff retry is attempted.

Proposed change

Add an exponential backoff retry loop around the subscription/auth handshake.

High-level behavior:

  • On handshake failure, close/cancel the stream, wait with backoff, then retry the whole subscription handshake.
  • Respect context cancellation and client shutdown.
  • Bound attempts using existing config (or add a dedicated auth retry config if necessary).
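The behavior above could look roughly like the following. This is a minimal, standalone sketch rather than the actual authmailbox code: retryConfig, retryHandshake, errShuttingDown and the quit channel are hypothetical names, and the attempt callback stands in for "connect, handshake, clean up the stream on failure".

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryConfig is a hypothetical stand-in for the client's backoff settings.
type retryConfig struct {
	MinBackoff  time.Duration
	MaxBackoff  time.Duration
	MaxAttempts int
}

// errShuttingDown is a hypothetical sentinel for client shutdown.
var errShuttingDown = errors.New("client shutting down")

// retryHandshake calls attempt until it succeeds, the attempt budget is
// used up, ctx is done, or quit is closed. The wait between attempts
// starts at MinBackoff and doubles up to MaxBackoff.
func retryHandshake(ctx context.Context, quit <-chan struct{},
	cfg retryConfig, attempt func(context.Context) error) error {

	if cfg.MaxAttempts < 1 {
		return errors.New("MaxAttempts must be at least 1")
	}

	backoff := cfg.MinBackoff
	var lastErr error
	for i := 1; i <= cfg.MaxAttempts; i++ {
		lastErr = attempt(ctx)
		if lastErr == nil {
			return nil
		}

		// Don't wait after the final attempt.
		if i == cfg.MaxAttempts {
			break
		}

		select {
		case <-time.After(backoff):
		case <-ctx.Done():
			return ctx.Err()
		case <-quit:
			return errShuttingDown
		}

		backoff *= 2
		if backoff > cfg.MaxBackoff {
			backoff = cfg.MaxBackoff
		}
	}

	return fmt.Errorf("handshake failed after %d attempts: %w",
		cfg.MaxAttempts, lastErr)
}

// demo simulates a handshake that fails twice and then succeeds.
func demo() (int, error) {
	calls := 0
	cfg := retryConfig{
		MinBackoff:  time.Millisecond,
		MaxBackoff:  4 * time.Millisecond,
		MaxAttempts: 5,
	}
	err := retryHandshake(
		context.Background(), make(chan struct{}), cfg,
		func(context.Context) error {
			calls++
			if calls < 3 {
				return errors.New("transient stream error")
			}
			return nil
		},
	)
	return calls, err
}

func main() {
	calls, err := demo()
	fmt.Println(calls, err) // prints "3 <nil>"
}
```

Keeping the wait in a single select means context cancellation and shutdown interrupt a pending backoff immediately instead of after the timer fires.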

Implementation sketch

Suggested approach (minimal config changes):

  • Extend receiveSubscription.connectAndAuthenticate to loop for up to MaxConnectAttempts attempts (or a new MaxAuthAttempts if separation is preferred).
  • For each attempt:
    • Call connectServerStream (keep its own connection retry/backoff).
    • Perform the auth handshake (InitReceive -> wait for AuthSuccess).
    • If handshake fails:
      • Call closeStream (or cancel stream ctx) so serverStream is nil and the read goroutine can exit.
      • Backoff (start at MinBackoff, double to MaxBackoff).
      • Retry unless context is done or client is shutting down.
  • Consider using fn.RetryFuncN for the retry loop, or keep local backoff logic like connectServerStream.
  • Ensure authOkChan and errChan don’t leak state across attempts (e.g., drain/reset per attempt or re-create per attempt).
  • Optional: add a client-side auth timeout (<= server AuthTimeout) to avoid indefinite waits if the stream stays open but no challenge arrives.
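Two of the points above (no channel state leaking across attempts, and a client-side auth timeout) can be illustrated in isolation. The sketch below is an assumption about one possible shape, not the existing code: attemptState, newAttemptState, waitForAuth and errAuthTimeout are hypothetical names, and only the channel names authOkChan and errChan are borrowed from the issue text.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errAuthTimeout is a hypothetical sentinel for the client-side timeout.
var errAuthTimeout = errors.New("auth handshake timed out")

// attemptState holds the channels for exactly one handshake attempt, so a
// late message from a previous attempt cannot be read by the next one.
type attemptState struct {
	authOkChan chan struct{}
	errChan    chan error
}

// newAttemptState creates fresh, buffered channels. The buffer of one lets
// a read goroutine report a result even if nobody is waiting any more.
func newAttemptState() *attemptState {
	return &attemptState{
		authOkChan: make(chan struct{}, 1),
		errChan:    make(chan error, 1),
	}
}

// waitForAuth blocks until the attempt succeeds, fails, exceeds timeout,
// or ctx is done.
func waitForAuth(ctx context.Context, st *attemptState,
	timeout time.Duration) error {

	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case <-st.authOkChan:
		return nil
	case err := <-st.errChan:
		return err
	case <-timer.C:
		return errAuthTimeout
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demo runs two attempts: the first times out because nothing arrives, and
// a stale error is then delivered to the first attempt's channel. The
// second attempt uses new channels and is unaffected by that stale error.
func demo() (first, second error) {
	ctx := context.Background()

	st1 := newAttemptState()
	first = waitForAuth(ctx, st1, 10*time.Millisecond)
	st1.errChan <- errors.New("stale stream error from attempt 1")

	st2 := newAttemptState()
	st2.authOkChan <- struct{}{}
	second = waitForAuth(ctx, st2, 10*time.Millisecond)

	return first, second
}

func main() {
	first, second := demo()
	fmt.Println(first)
	fmt.Println(second)
}
```

Re-creating the channels per attempt avoids having to reason about draining; the trade-off is that the read goroutine must be handed the state for its own attempt rather than reading shared fields.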

Files to touch

  • authmailbox/receive_subscription.go (main change: add handshake retry/backoff)
  • authmailbox/client.go (if config changes or helper methods are added)
  • authmailbox/mock.go / authmailbox/client_test.go (tests)
  • fn/retry.go (only if choosing to reuse RetryFuncN and need config/plumbing)

Test plan

Add/extend tests in authmailbox/client_test.go or a new test file:

  • Simulate a server that fails the first auth handshake (e.g., reject signature or force auth timeout) and then succeeds; verify client eventually becomes subscribed.
  • Verify that transient handshake failures trigger retries with backoff (use short backoff in tests).
  • Ensure IsSubscribed stays false until AuthSuccess is received, and becomes true after a successful retry.
  • Confirm Stop() cancels any pending retry loop without leaks.
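As a rough illustration of the checks listed above, here is a self-contained sketch using an in-memory double. fakeSub and its fields are hypothetical and do not reflect the existing mock in authmailbox/mock.go; a real test would use the testing package and the mock server instead. For brevity the double retries with a fixed wait and no attempt limit.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

var errShutdown = errors.New("client shutting down")

// fakeSub is a hypothetical double whose handshake fails failFirst times.
type fakeSub struct {
	failFirst  int           // handshakes that fail before one succeeds
	backoff    time.Duration // fixed wait between attempts
	attempts   atomic.Int32
	subscribed atomic.Bool
	quit       chan struct{}
	stopOnce   sync.Once
}

func (s *fakeSub) handshake() error {
	if int(s.attempts.Add(1)) <= s.failFirst {
		return errors.New("auth rejected")
	}
	return nil
}

// Start retries the handshake until it succeeds or Stop is called.
func (s *fakeSub) Start() error {
	for {
		if err := s.handshake(); err == nil {
			s.subscribed.Store(true)
			return nil
		}
		select {
		case <-time.After(s.backoff):
		case <-s.quit:
			return errShutdown
		}
	}
}

func (s *fakeSub) Stop()              { s.stopOnce.Do(func() { close(s.quit) }) }
func (s *fakeSub) IsSubscribed() bool { return s.subscribed.Load() }

// checkRecovers: two failed handshakes, then success.
func checkRecovers() (int32, bool, error) {
	s := &fakeSub{
		failFirst: 2, backoff: time.Millisecond,
		quit: make(chan struct{}),
	}
	err := s.Start()
	return s.attempts.Load(), s.IsSubscribed(), err
}

// checkStop: the handshake never succeeds and the backoff is long, so
// Start can only return promptly if Stop interrupts the pending wait.
func checkStop() (bool, error) {
	s := &fakeSub{
		failFirst: 1 << 30, backoff: time.Hour,
		quit: make(chan struct{}),
	}
	done := make(chan error, 1)
	go func() { done <- s.Start() }()
	s.Stop()

	select {
	case err := <-done:
		return s.IsSubscribed(), err
	case <-time.After(time.Second):
		return s.IsSubscribed(), errors.New("Start did not return")
	}
}

func main() {
	attempts, subscribed, err := checkRecovers()
	fmt.Println(attempts, subscribed, err)

	subscribed, err = checkStop()
	fmt.Println(subscribed, err)
}
```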

Acceptance criteria

  • Subscription/auth handshake failures are retried with exponential backoff.
  • Retries stop on context cancel or client shutdown.
  • Stream is cleaned up between attempts (no goroutine leaks; IsSubscribed reports true only after a successful auth).
  • Existing connection retry and server-restart reconnect logic continue to work.
