reconnect loops are not cancellation-aware and can block shutdown #129

@maxcountryman

Description

Summary

Listener reconnect logic retries forever with backoff but never checks for shutdown cancellation while it is reconnecting.

Why this is a problem

If PostgreSQL is unavailable, the worker, scheduler, and activity worker can each remain stuck in reconnect retries and ignore runtime shutdown requests until connectivity returns.

Evidence

  • src/queue.rs:2155 (retry_with_backoff loops forever)
  • src/worker.rs:587 (connect_listeners_with_retry awaited before entering shutdown-aware select)
  • src/scheduler.rs:371 (same pattern for scheduler listener)
  • src/activity_worker.rs:162 (same pattern for activity call listener)

Expected behavior

Shutdown cancellation should interrupt reconnect sleeps and exit loops promptly.

Proposed direction

  • Thread CancellationToken into retry helpers.
  • Use tokio::select! around sleep/retry boundaries to abort on cancellation.
  • Ensure all listener connect paths are covered.

Acceptance criteria

  • During DB outage, calling runtime shutdown exits worker/scheduler/activity worker in bounded time.
