+rules:
+  - metadata:
+      kind: prequel
+      id: 5UD1RZxGC5LJQnVmAkV11B
+      gen: 1
+    cre:
+      id: CRE-2025-0108
+      severity: 1
+      title: "MongoDB Replica Set Primary Election Failure"
+      category: "database-problem"
+      author: Prequel
+      description: |
+        Detects MongoDB replica set election failures that leave the set without a primary,
+        making it unavailable for writes. This rule targets conditions that break replica set consensus:
+        - Primary node failures followed by election timeouts where no secondary can become primary
+        - Network partitions isolating replica set members and preventing quorum formation
+        - Heartbeat failures and connectivity issues leading to election failures
+        - Replica set state transitions indicating election problems
+      cause: |
+        - Primary node crashes or becomes unreachable due to hardware or network issues
+        - Network partitions isolate replica set members, preventing quorum formation
+        - Too few voting members reachable to form the majority needed to elect a new primary
+        - Election timeout (settings.electionTimeoutMillis) set too low for network conditions
+        - Replica set configuration issues affecting elections (for example, remaining members with priority 0)
+        - System resource constraints (CPU, memory, disk) causing node failures
+        - Firewall or security group rules blocking inter-node communication
+      tags:
+        - ha
+        - quorum
+        - leader-election
+        - network
+        - timeout
+        - crash
+        - data-loss
+      mitigation: |
+        PREVENTION:
+        - Monitor replica set member health and network connectivity
+        - Set election timeout values (settings.electionTimeoutMillis) appropriate for network conditions
+        - Run enough voting members that a majority survives a node failure (at least three)
+        - Monitor resource usage (CPU, memory, disk) on all nodes
+        RESPONSE:
+        - Check replica set status: rs.status()
+        - Restart failed replica set members
+        - Reconnect isolated network segments
+        - If a majority cannot be restored, force a reconfiguration: rs.reconfig(cfg, { force: true })
+        - Consider adding replica set members
+      references:
+        - https://docs.mongodb.com/manual/core/replica-set-elections/
+        - https://docs.mongodb.com/manual/tutorial/troubleshoot-replica-sets/
+        - https://docs.mongodb.com/manual/core/replica-set-high-availability/
+      applications:
+        - name: mongodb
+      impact: |
+        - Complete write unavailability (no primary node)
+        - Potential read issues depending on read preference settings
+        - Application downtime and service disruption
+        - Risk of rollback of writes that were not replicated to a majority before the primary failed
+      impactScore: 10
+      mitigationScore: 7
+      reports: 1
+    rule:
+      set:
+        event:
+          source: cre.log.mongodb
+        match:
+          - regex: "No primary exists currently|PrimarySteppedDown: No primary exists currently|Failed to refresh query analysis configurations.*No primary exists currently|Starting an election, since we have seen no PRIMARY in election timeout period|election timeout period|Election timeout|ShutdownInProgress|In the process of shutting down|received an invalid response|heartbeat.*timeout|heartbeat.*failed|network.*partition|connection.*refused|connection.*timeout|HostUnreachable|Replica set state transition.*SECONDARY|Member is in new state.*SECONDARY|stepping up to primary|stepping down from primary"
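The match expression above is a single alternation, so it may help to check locally which lines it would fire on. The following is a minimal sketch, not part of the rule: it assumes the rule engine applies the pattern as an unanchored search, and it uses Python's `re` module as a stand-in for the engine's own regex implementation. The sample lines are synthetic, assembled from literals in the pattern itself, and the helper name `matches` is hypothetical.

```python
import re

# Pattern copied from the rule's match.regex field, split across
# adjacent string literals purely for readability.
PATTERN = re.compile(
    "No primary exists currently"
    "|PrimarySteppedDown: No primary exists currently"
    "|Failed to refresh query analysis configurations.*No primary exists currently"
    "|Starting an election, since we have seen no PRIMARY in election timeout period"
    "|election timeout period|Election timeout"
    "|ShutdownInProgress|In the process of shutting down"
    "|received an invalid response"
    "|heartbeat.*timeout|heartbeat.*failed"
    "|network.*partition"
    "|connection.*refused|connection.*timeout"
    "|HostUnreachable"
    "|Replica set state transition.*SECONDARY|Member is in new state.*SECONDARY"
    "|stepping up to primary|stepping down from primary"
)


def matches(line: str) -> bool:
    """Return True if the pattern is found anywhere in the line.

    Assumption: the engine searches within the line rather than
    requiring a full-line match.
    """
    return PATTERN.search(line) is not None


# Synthetic sample lines (not real MongoDB output).
samples = [
    "PrimarySteppedDown: No primary exists currently",
    "Starting an election, since we have seen no PRIMARY in election timeout period",
    "connection refused",
    "Slow query",
]

for line in samples:
    print(f"{matches(line)!s:5}  {line}")
```

Under the unanchored-search assumption, the shorter alternatives already cover the longer ones that contain them: any line containing `PrimarySteppedDown: No primary exists currently` also contains `No primary exists currently`. The broad alternatives such as `connection.*refused` match any line in which those words appear in that order.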
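The rule's cause list refers to quorum formation without stating the arithmetic. As a rough sketch: an election needs votes from a strict majority of the voting members, so the number of failures a set can tolerate follows directly from its size. The function names below are hypothetical and exist only for illustration.

```python
def majority_needed(voting_members: int) -> int:
    """Votes required to elect a primary: a strict majority of voting members."""
    if voting_members < 1:
        raise ValueError("a replica set needs at least one voting member")
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """Voting members that can be lost while a primary can still be elected."""
    return voting_members - majority_needed(voting_members)


def can_elect_primary(voting_members: int, reachable: int) -> bool:
    """True if the reachable voting members form a majority."""
    if not 0 <= reachable <= voting_members:
        raise ValueError("reachable must be between 0 and voting_members")
    return reachable >= majority_needed(voting_members)


for n in (2, 3, 4, 5):
    print(n, majority_needed(n), fault_tolerance(n))
```

A four-member set tolerates no more failures than a three-member set, which is why an odd number of voting members is the usual recommendation.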