Description:
api/chain/notification services should not crash or become unavailable if NATS is down or unreachable. They must continue to function, and a backup solution for fan-out messaging (e.g., TCP-based or similar) should be researched and implemented so that notifications can still be delivered when NATS is unavailable.
Requirements:
- Update NATS client usage so that:
  - Service startup does not crash if NATS cannot be reached.
  - Connection loss triggers graceful degradation instead of process exit (e.g., stop publishing to NATS, but keep HTTP/API endpoints up).
- Implement reconnection and health-handling logic:
  - Use automatic reconnection with backoff and unlimited retries where appropriate.
  - Log NATS connection errors and status changes clearly for operations.
- Research and design a backup fan-out mechanism when NATS is unavailable:
  - Options could include direct TCP connections, database-backed outbox pattern, or another message transport that can be enabled as a fallback.
  - Define how messages are queued and delivered when NATS returns.
- Define behavior for fan-out when NATS is down:
  - How to avoid message loss.
  - How to avoid duplicate delivery when NATS comes back and both paths might send.
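As a starting point for the design discussion, the outbox option and the duplicate-delivery concern could be sketched roughly as follows. This is only an illustration under assumptions, not the agreed design: `FanOutPublisher`, `DedupingConsumer`, and the `send` callable are hypothetical names, `send` stands in for a real NATS publish and is assumed to raise `ConnectionError` when the broker is unreachable, and the in-memory queue would need to be a database table (or similar durable store) to avoid message loss across restarts.

```python
import uuid
from collections import deque


class FanOutPublisher:
    """Publishes via a primary transport and falls back to an outbox.

    `send` is a hypothetical callable standing in for a NATS publish;
    it is assumed to raise ConnectionError when the broker is unreachable.
    """

    def __init__(self, send):
        self._send = send
        self._outbox = deque()  # in-memory here; a DB table would survive restarts

    def publish(self, subject, payload):
        # Every message gets a stable ID so consumers can drop duplicates.
        msg = {"id": str(uuid.uuid4()), "subject": subject, "payload": payload}
        # Preserve ordering: if older messages are still queued, queue behind them.
        if self._outbox or not self._try_send(msg):
            self._outbox.append(msg)
        return msg["id"]

    def flush(self):
        """Call when the connection is reported healthy again."""
        sent = 0
        while self._outbox:
            if not self._try_send(self._outbox[0]):
                break  # still down; keep the remainder queued
            self._outbox.popleft()  # remove only after a successful send
            sent += 1
        return sent

    def pending(self):
        return len(self._outbox)

    def _try_send(self, msg):
        try:
            self._send(msg)
            return True
        except ConnectionError:
            return False


class DedupingConsumer:
    """Ignores message IDs it has already handled."""

    def __init__(self):
        self._seen = set()
        self.handled = []

    def handle(self, msg):
        if msg["id"] in self._seen:
            return False  # duplicate, e.g. sent by both the fallback and NATS
        self._seen.add(msg["id"])
        self.handled.append(msg["payload"])
        return True
```

The point of the sketch is that a failed publish never raises into the caller, queued messages are removed only after a successful send, and the message ID gives consumers a way to discard anything delivered twice. A real consumer would also need to bound or expire the set of seen IDs.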
Acceptance criteria:
- api/chain/notification services remain up and responsive when NATS is unavailable (no crashes or failed container restarts).
- NATS outages are handled via reconnection logic and clear logs, without blocking core APIs.
- A documented backup fan-out design is agreed upon and an initial implementation is in place (or a concrete plan exists if phased).
- Fan-out behavior is verified under simulated NATS outage (e.g., stopping the NATS cluster) and recovery, with no data loss and no service crashes.
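One possible shape for the "reconnection with backoff, without blocking core APIs" behavior is a background supervisor that keeps retrying while the service starts and serves requests normally. The sketch below is a minimal illustration, not a proposal for the final implementation: `ConnectionSupervisor`, `backoff_delays`, and the `connect` callable are hypothetical, `connect` is assumed to raise `ConnectionError` on failure, and the NATS client libraries have their own reconnect options that may make part of this unnecessary.

```python
import logging
import random
import threading

log = logging.getLogger("nats-supervisor")


def backoff_delays(base=0.5, cap=30.0):
    """Yield an unbounded sequence of exponentially growing delays, capped at `cap`."""
    delay = base
    while True:
        yield delay
        delay = min(cap, delay * 2)


class ConnectionSupervisor:
    """Retries `connect` in a daemon thread so startup never blocks or exits.

    `connect` is a hypothetical callable assumed to raise ConnectionError on failure.
    """

    def __init__(self, connect, base=0.5, cap=30.0, jitter=0.1):
        self._connect = connect
        self._base, self._cap, self._jitter = base, cap, jitter
        self._stop = threading.Event()
        self.connected = threading.Event()  # health checks can read this flag

    def start(self):
        thread = threading.Thread(target=self._run, daemon=True)
        thread.start()
        return thread

    def stop(self):
        self._stop.set()

    def _run(self):
        delays = backoff_delays(self._base, self._cap)
        for attempt, delay in enumerate(delays, start=1):
            if self._stop.is_set():
                return
            try:
                self._connect()
            except ConnectionError as exc:
                wait = delay * (1 + random.uniform(0, self._jitter))
                log.warning(
                    "NATS connect attempt %d failed: %s; retrying in %.2fs",
                    attempt, exc, wait,
                )
                self._stop.wait(wait)  # sleeps, but wakes early on stop()
                continue
            log.info("NATS connected after %d attempt(s)", attempt)
            self.connected.set()
            return
```

For outage verification, a test could pass in a `connect` that fails a fixed number of times and assert that `connected` is eventually set while the main thread stays free. The sketch covers only the initial connection; handling a later connection loss would mean clearing `connected` and starting the loop again.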