| 1 | +rules: |
| 2 | + - metadata: |
| 3 | + kind: prequel |
| 4 | + id: 43LNwPunkRCSovrjPyoxpukWVtnU |
| 5 | + gen: 1 |
| 6 | + cre: |
| 7 | + id: CRE-2025-0103 |
| 8 | + severity: 2 |
| 9 | + title: NATS Connection Failures and Network Partitions |
| 10 | + category: message-queue-problem |
| 11 | + tags: |
| 12 | + - nats |
| 13 | + - connectivity |
| 14 | + author: Prequel |
| 15 | + description: | |
| 16 | + Detects repeated NATS connection failures followed by a client disconnect, a pattern that points to a network partition or server outage and can interrupt message delivery. |
| 17 | + cause: | |
| 18 | + - Network connectivity issues between NATS clients and servers |
| 19 | + - NATS server crashes or restarts |
| 20 | + - Network partitions causing client disconnections |
| 21 | + - Connection timeouts due to network latency or server overload |
| 22 | + impact: | |
| 23 | + - Message delivery failures |
| 24 | + - Service disruptions |
| 25 | + - Increased latency |
| 26 | + - System instability |
| 27 | + mitigation: | |
| 28 | + IMMEDIATE ACTIONS: |
| 29 | + 1. Check NATS server health and logs |
| 30 | + 2. Verify network connectivity between clients and servers |
| 31 | + 3. Check for network partition events |
| 32 | + 4. Check CPU, memory, and open connection counts on the NATS server hosts |
| 33 | + |
| 34 | + RECOVERY: |
| 35 | + 1. Restore network connectivity if partitioned |
| 36 | + 2. Restart affected NATS clients that do not reconnect on their own |
| 37 | + 3. Verify message delivery resumes |
| 38 | + 4. Monitor reconnection attempts |
| 39 | + |
| 40 | + PREVENTION: |
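| 41 | + 1. Alert on NATS client disconnect and reconnect events |
| 42 | + 2. Run a cluster of redundant NATS servers and list every server URL in the client configuration |
| 43 | + 3. Configure client connect timeouts and reconnect settings (maximum attempts, wait between attempts) |
| 44 | + 4. Run regular network health checks between clients and servers |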
| 45 | + references: |
| 46 | + - https://docs.nats.io/running-a-nats-service/configuration |
| 47 | + - https://docs.nats.io/running-a-nats-service/configuration/sys_accounts |
| 48 | + applications: |
| 49 | + - name: "nats" |
| 50 | + version: ">=2.0.0" |
| 51 | + rule: |
| 52 | + sequence: |
| 53 | + window: 30s |
| 54 | + event: |
| 55 | + source: cre.log.nats |
| 56 | + order: |
| 57 | + - regex: ".*ERROR connection failed: (nats: connection closed|NATS server unreachable.*)" |
| 58 | + count: 5 |
| 59 | + - regex: ".*ERROR NATS client disconnected.*" |
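
To sanity-check the two patterns before merging, the sequence can be approximated offline. The sketch below is a rough stand-in, not the rule engine's actual algorithm: it assumes `count: 5` means five matching failure lines must precede the disconnect line, and that all six lines must fall within the 30s window. The function name `sequence_matches`, the timestamps, and the sample log lines are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Patterns copied verbatim from the rule's `order` list.
FAILED = re.compile(
    r".*ERROR connection failed: (nats: connection closed|NATS server unreachable.*)"
)
DISCONNECTED = re.compile(r".*ERROR NATS client disconnected.*")

WINDOW = timedelta(seconds=30)  # rule: window: 30s
FAILED_COUNT = 5                # rule: count: 5


def sequence_matches(events):
    """Return True if at least FAILED_COUNT failure lines are followed by a
    disconnect line, all inside one WINDOW.

    `events` is an iterable of (datetime, str) pairs in time order.
    This is an assumed reading of the rule's semantics.
    """
    failures = []  # timestamps of failure lines still inside the window
    for ts, line in events:
        # Drop failures too old to share a window with the current line.
        failures = [t for t in failures if ts - t <= WINDOW]
        if FAILED.match(line):
            failures.append(ts)
        elif DISCONNECTED.match(line) and len(failures) >= FAILED_COUNT:
            return True
    return False


# Hypothetical sample: five failures one second apart, then a disconnect.
start = datetime(2025, 1, 1, 12, 0, 0)
events = [
    (start + timedelta(seconds=i), "ERROR connection failed: nats: connection closed")
    for i in range(5)
]
events.append((start + timedelta(seconds=5), "ERROR NATS client disconnected"))

print(sequence_matches(events))  # True
```

Feeding real log excerpts through a check like this shows whether the regexes match the exact wording the NATS clients emit, which is the most likely place for the rule to miss.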