Commit 95fc5b5

dhvll and tonymeehan authored
Add NATS connection failure detection rule and associated test log (#78)
* Add NATS connection failure detection rule and associated test log

- Introduced a new rule for detecting NATS connection failures and network partitions, including detailed metadata, causes, impacts, and mitigation strategies.
- Added a test log file containing various error messages related to NATS connectivity issues.
- Updated tags.yaml to include relevant tags for NATS and connectivity issues.

* fix few changes
* Update severity level for NATS connection failure rule and remove timestamp format
* fix merge issue
* fix space
* conflict issue
* fix err
* added category
* remove high security category

---------

Co-authored-by: Tony Meehan <[email protected]>
1 parent e887286 commit 95fc5b5

3 files changed: +305 -12 lines changed

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+rules:
+  - metadata:
+      kind: prequel
+      id: 43LNwPunkRCSovrjPyoxpukWVtnU
+      gen: 1
+    cre:
+      id: CRE-2025-0103
+      severity: 2
+      title: NATS Connection Failures and Network Partitions
+      category: message-queue-problem
+      tags:
+        - nats
+        - connectivity
+      author: Prequel
+      description: |
+        Detects NATS connection failures and network partitions that can impact message delivery and system reliability.
+      cause: |
+        - Network connectivity issues between NATS clients and servers
+        - NATS server crashes or restarts
+        - Network partitions causing client disconnections
+        - Connection timeouts due to network latency or server overload
+      impact: |
+        - Message delivery failures
+        - Service disruptions
+        - Increased latency
+        - System instability
+      mitigation: |
+        IMMEDIATE ACTIONS:
+        1. Check NATS server health and logs
+        2. Verify network connectivity between clients and servers
+        3. Check for network partition events
+        4. Monitor system resources
+
+        RECOVERY:
+        1. Restore network connectivity if partitioned
+        2. Restart affected NATS clients
+        3. Verify message delivery resumes
+        4. Monitor reconnection attempts
+
+        PREVENTION:
+        1. Implement proper monitoring and alerting
+        2. Use redundant NATS servers
+        3. Configure appropriate timeouts and retry policies
+        4. Regular network health checks
+      references:
+        - https://docs.nats.io/running-a-nats-service/configuration
+        - https://docs.nats.io/running-a-nats-service/configuration/sys_accounts
+      applications:
+        - name: "nats"
+          version: ">=2.0.0"
+    rule:
+      sequence:
+        window: 30s
+        event:
+          source: cre.log.nats
+        order:
+          - regex: ".*ERROR connection failed: (nats: connection closed|NATS server unreachable.*)"
+            count: 5
+          - regex: ".*ERROR NATS client disconnected.*"
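
For reviewers who want to sanity-check the `sequence` block, the following Python sketch approximates how it might be read: five lines matching the first regex, followed by one line matching the second, all inside the 30s window. The two patterns, the window, and the count are copied from the rule. Everything else is an assumption made for illustration: the `rule_fires` and `sample_events` helpers, the sample log lines, and in particular the windowing logic, which may differ from how the Prequel engine actually evaluates sequences.

```python
import re
from datetime import datetime, timedelta

# Patterns, window and count copied from the rule's `sequence` block.
FAILED = re.compile(
    r".*ERROR connection failed: (nats: connection closed|NATS server unreachable.*)"
)
DISCONNECTED = re.compile(r".*ERROR NATS client disconnected.*")
WINDOW = timedelta(seconds=30)
FAILED_COUNT = 5


def rule_fires(events):
    """Approximate the sequence rule (assumed semantics, not the real engine).

    `events` is an iterable of (timestamp, line) pairs sorted by timestamp.
    Returns True when a disconnect line is preceded by at least FAILED_COUNT
    connection-failed lines that all fall within WINDOW of that disconnect.
    """
    failures = []  # timestamps of connection-failed lines seen so far
    for ts, line in events:
        if FAILED.match(line):
            failures.append(ts)
        elif DISCONNECTED.match(line):
            recent = [t for t in failures if ts - t <= WINDOW]
            if len(recent) >= FAILED_COUNT:
                return True
    return False


def sample_events(start, gap_seconds):
    """Build hypothetical log lines spaced `gap_seconds` apart."""
    lines = ["ERROR connection failed: nats: connection closed"] * FAILED_COUNT
    lines.append("ERROR NATS client disconnected from server")
    return [
        (start + timedelta(seconds=i * gap_seconds), line)
        for i, line in enumerate(lines)
    ]


if __name__ == "__main__":
    t0 = datetime(2025, 1, 1)
    print(rule_fires(sample_events(t0, 2)))   # all six lines within 10 seconds
    print(rule_fires(sample_events(t0, 10)))  # lines spread over 50 seconds
```

Under these assumptions, a burst spaced 2 seconds apart satisfies the rule, while the same lines spaced 10 seconds apart do not, because only three failures fall within 30 seconds of the disconnect. Note also that the first regex only matches the two listed failure reasons, so a line such as `ERROR connection failed: timeout` would not count toward the five.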
