Commit 35ae470

Add MongoDB primary election failure detection rule (CRE-2025-0108)
- Introduced a new rule to detect high-severity MongoDB replica set primary election failures that lead to service unavailability.
- The rule includes detailed metadata, causes, impacts, and mitigation strategies.
- Added a test log file to simulate various scenarios related to primary election failures in MongoDB.
1 parent 6dd45d4 commit 35ae470

2 files changed: +1700 -0 lines changed

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+rules:
+  - metadata:
+      kind: prequel
+      id: 5UD1RZxGC5LJQnVmAkV11B
+      gen: 1
+    cre:
+      id: CRE-2025-0108
+      severity: 1
+      title: "MongoDB Replica Set Primary Election Failure"
+      category: "mongodb-ha"
+      author: Prequel
+      description: |
+        Detects high-severity MongoDB replica set primary election failures that result in no primary node being available,
+        causing complete service unavailability. This rule targets catastrophic conditions that break replica set consensus:
+        - Primary node failures followed by election timeouts where no secondary can become primary
+        - Network partitions isolating replica set members and preventing quorum formation
+        - Heartbeat failures and connectivity issues leading to election failures
+        - Replica set state transitions indicating election problems
+      cause: |
+        - Primary node crashes or becomes unreachable due to hardware/network issues
+        - Network partitions isolate replica set members, preventing quorum formation
+        - Insufficient voting members available to elect a new primary (split-brain scenarios)
+        - Election timeout settings too aggressive for network conditions
+        - MongoDB configuration issues affecting election processes
+        - System resource constraints (CPU, memory, disk) causing node failures
+        - Firewall or security group rules blocking inter-node communication
+      tags:
+        - mongodb
+        - replica-set
+        - primary-election
+        - availability
+        - database
+        - ha
+        - quorum
+        - heartbeat
+        - network-partition
+        - election-timeout
+        - crash
+        - data-loss
+      mitigation: |
+        PREVENTION:
+        - Monitor replica set member health and network connectivity
+        - Set appropriate election timeout values for network conditions
+        - Ensure sufficient replica set members for quorum formation
+        - Monitor resource usage (CPU, memory, disk) on all nodes
+        RESPONSE:
+        - Check replica set status: rs.status()
+        - Restart failed replica set members
+        - Reconnect isolated network segments
+        - Force replica set reconfiguration if needed
+        - Consider adding additional replica set members
+      references:
+        - https://docs.mongodb.com/manual/core/replica-set-elections/
+        - https://docs.mongodb.com/manual/tutorial/troubleshoot-replica-sets/
+        - https://docs.mongodb.com/manual/core/replica-set-high-availability/
+      applications:
+        - name: mongodb
+      impact: |
+        - Complete write unavailability (no primary node)
+        - Potential read issues depending on read preference settings
+        - Application downtime and service disruption
+        - Risk of data inconsistency in split-brain scenarios
+      impactScore: 10
+      mitigationScore: 7
+      reports: 1
+    rule:
+      set:
+        event:
+          source: cre.log.mongodb
+        match:
+          - regex: "No primary exists currently|PrimarySteppedDown: No primary exists currently|Failed to refresh query analysis configurations.*No primary exists currently|Starting an election, since we have seen no PRIMARY in election timeout period|election timeout period|Election timeout|ShutdownInProgress|In the process of shutting down|received an invalid response|heartbeat.*timeout|heartbeat.*failed|network.*partition|connection.*refused|connection.*timeout|HostUnreachable|Replica set state transition.*SECONDARY|Member is in new state.*SECONDARY|stepping up to primary|stepping down from primary"
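
The match regex above is a plain alternation, so its behavior can be probed outside the rule engine. The following is a minimal Python sketch and is not part of the commit. It assumes the engine applies the pattern as an unanchored, case-sensitive search (as Python's re.search does). The sample log lines are hypothetical and are not taken from the commit's test log file, and ALTERNATIVES, PATTERN, and matched_alternative are illustrative names.

```python
import re

# The alternatives of the rule's regex, in the order they appear in the commit.
ALTERNATIVES = [
    "No primary exists currently",
    "PrimarySteppedDown: No primary exists currently",
    "Failed to refresh query analysis configurations.*No primary exists currently",
    "Starting an election, since we have seen no PRIMARY in election timeout period",
    "election timeout period",
    "Election timeout",
    "ShutdownInProgress",
    "In the process of shutting down",
    "received an invalid response",
    "heartbeat.*timeout",
    "heartbeat.*failed",
    "network.*partition",
    "connection.*refused",
    "connection.*timeout",
    "HostUnreachable",
    "Replica set state transition.*SECONDARY",
    "Member is in new state.*SECONDARY",
    "stepping up to primary",
    "stepping down from primary",
]

# Joining with "|" reproduces the single pattern string used in the rule.
PATTERN = re.compile("|".join(ALTERNATIVES))


def matched_alternative(line):
    """Return the first listed alternative found in the line, or None."""
    for alt in ALTERNATIVES:
        if re.search(alt, line):
            return alt
    return None


# Hypothetical log fragments, for illustration only.
SAMPLE_LINES = [
    "PrimarySteppedDown: No primary exists currently",
    "Starting an election, since we have seen no PRIMARY in election timeout period",
    "stepping down from primary",
    "Heartbeat failed",
    "Connection accepted from 10.0.0.5:51234",
]

if __name__ == "__main__":
    for line in SAMPLE_LINES:
        hit = matched_alternative(line)
        status = "match" if PATTERN.search(line) else "no match"
        print(f"{status:9} | {line} | {hit}")
```

Under these assumptions, a line such as "Heartbeat failed" does not match, because the pattern spells heartbeat in lowercase, while "stepping down from primary" matches even when it comes from a planned step-down. Several longer alternatives are also already covered by shorter ones (for example, anything containing "No primary exists currently"). Whether that breadth is intended is a question for the rule author.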
