Commit d53a7bd

Redis Comprehensive Troubleshooting CRE - GitHub Issue #132 (#133)
* Add comprehensive Redis troubleshooting rules and log examples
* Add comprehensive Redis troubleshooting rules and log examples
* Add individual Redis CRE rules split from comprehensive troubleshooting rule

  Split CRE-2025-0135 comprehensive Redis troubleshooting rule into 9 specific rules addressing GitHub issue #132. All rules configured with critical severity (0).

  New CRE rules added:
  - CRE-2025-0136: Redis OOM Errors - Maxmemory limit exceeded
  - CRE-2025-0173: Redis Connection Timeout - Network connectivity issues
  - CRE-2025-0174: Redis Authentication Failures - Password/ACL denials
  - CRE-2025-0175: Redis Master-Replica Sync Failure - Replication issues
  - CRE-2025-0176: Redis Persistence Failures - MISCONF disk write errors
  - CRE-2025-0177: Redis Slow Query Performance - Latency degradation
  - CRE-2025-0178: Redis Read-Only Replica Writes - Incorrect client routing
  - CRE-2025-0179: Redis Client Connection Limit - Max clients exceeded
  - CRE-2025-0180: Redis AOF Corruption - Recovery failures

  Each rule includes comprehensive troubleshooting guidance, test cases, and proper regex patterns for detection. All rules tested and validated with preq CLI.
* merge conflict fixed
* merge conflict fixed
* solved the conflict
* solveds
* fix
1 parent 9864baf commit d53a7bd

22 files changed: +1127 -103 lines changed
Lines changed: 87 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,87 @@
1+
rules:
2+
- cre:
3+
id: CRE-2025-0173
4+
severity: 0
5+
title: "Redis Connection Timeout and Connectivity Issues"
6+
category: "in-memory-database-problem"
7+
author: Prequel Community
8+
description: |
9+
Detects Redis connection timeout errors and connectivity failures that prevent clients from establishing or maintaining connections to the Redis server. These issues commonly occur during high load, network problems, or server resource exhaustion.
10+
cause: |
11+
- Network latency or packet loss between client and Redis server
12+
- Redis server CPU overload causing slow response times
13+
- Client connection pool exhaustion or misconfiguration
14+
- Firewall or security group blocking connections
15+
- Redis server reached max clients limit
16+
- DNS resolution failures
17+
- Redis server process crashed or unresponsive
18+
impact: |
19+
- Application unable to read/write cache data
20+
- Increased latency for user requests
21+
- Potential data inconsistency if writes fail silently
22+
- Backend database overload due to cache unavailability
23+
- Service degradation or complete outage
24+
- Connection pool exhaustion leading to thread blocking
25+
impactScore: 10
26+
tags:
27+
- redis
28+
- connection
29+
- timeout
30+
- connectivity
31+
- network
32+
mitigation: |
33+
IMMEDIATE ACTIONS:
34+
- Verify Redis server is running: `systemctl status redis`
35+
- Test connectivity: `redis-cli -h <host> -p <port> ping`
36+
- Check current connections: `redis-cli CLIENT LIST | wc -l`
37+
- Review max clients setting: `redis-cli CONFIG GET maxclients`
38+
39+
RECOVERY:
40+
- Restart Redis service if unresponsive:
41+
`systemctl restart redis`
42+
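- Close idle client connections server-side after 300 seconds (`timeout` in redis.conf is the server's idle-client timeout; connect and read timeouts are configured in the client library):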
43+
`redis.conf: timeout 300`
44+
- Disconnect all normal client connections (this is not limited to idle ones):
45+
`redis-cli CLIENT KILL TYPE normal`
46+
- Increase max clients limit:
47+
`redis-cli CONFIG SET maxclients 10000`
48+
49+
NETWORK TROUBLESHOOTING:
50+
- Check firewall rules: `iptables -L -n`
51+
- Test network connectivity: `telnet redis-host 6379`
52+
- Verify DNS resolution: `nslookup redis-host`
53+
- Check for packet loss: `ping -c 100 redis-host`
54+
55+
PREVENTION:
56+
- Implement connection pooling with proper sizing
57+
- Configure appropriate timeout values
58+
- Monitor connection metrics and set alerts
59+
- Use Redis Sentinel or Cluster for high availability
60+
- Implement circuit breaker pattern in clients
61+
- Regular load testing and capacity planning
62+
mitigationScore: 7
63+
references:
64+
- https://redis.io/docs/latest/operate/oss_and_stack/management/troubleshooting/#latency-issues
65+
- https://redis.io/commands/client-list/
66+
- https://redis.io/docs/latest/develop/clients/
67+
applications:
68+
- name: redis
69+
version: "*"
70+
- name: redis-cli
71+
version: "*"
72+
reports: 89
73+
metadata:
74+
kind: prequel
75+
id: Hf8NpQr4VxKmLw9TbYaZe6
76+
gen: 1
77+
rule:
78+
set:
79+
window: 180s
80+
event:
81+
source: cre.log.redis
82+
match:
83+
- regex: "Connection timeout"
84+
- regex: "Unable to connect to Redis"
85+
- regex: "Could not connect to Redis"
86+
- regex: "redis connection timeout"
87+
- regex: "Connection pool.*exhausted"
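
The PREVENTION list in this rule mentions a circuit breaker pattern in clients. A minimal sketch of that idea follows, assuming the wrapped call raises `ConnectionError` or `TimeoutError` on failure. The class name, thresholds, and the injectable `clock` are illustrative choices, not part of the rule or of any Redis client library.

```python
import time


class CircuitBreaker:
    """Open after N consecutive failures; allow a trial call after a cool-down."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of piling more work onto a struggling server.
                raise RuntimeError("circuit open: skipping call")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except (ConnectionError, TimeoutError):
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        return result
```

In use, a client would route each Redis call through `breaker.call(...)` and treat `RuntimeError` as "cache unavailable", falling back to the backing store or a default value.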

rules/cre-2025-0173/test.log

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
[2024-01-15 11:00:01,123] ERROR [RedisClient] Connection timeout errors
2+
Connection timeout while connecting to redis server at 192.168.1.100:6379
3+
Unable to connect to Redis server
4+
Could not connect to Redis at localhost:6379: Connection refused
5+
redis connection timeout
6+
Timeout connecting to redis://cache.example.com:6379
7+
Failed to connect to Redis
8+
Redis is not reachable
9+
Connection pool exhausted for redis server
10+
Connection reset by peer while communicating with redis server
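
To see which of these sample lines the CRE-2025-0173 patterns actually hit, here is a rough check using Python's `re.search`. It assumes the rule engine performs case-sensitive, unanchored matching much like `re.search`, which this page does not confirm. The patterns and log lines are copied from the rule and test log above.

```python
import re

# Patterns copied from the CRE-2025-0173 rule.
PATTERNS = [
    r"Connection timeout",
    r"Unable to connect to Redis",
    r"Could not connect to Redis",
    r"redis connection timeout",
    r"Connection pool.*exhausted",
]

# Sample lines copied from rules/cre-2025-0173/test.log.
LOG_LINES = [
    "[2024-01-15 11:00:01,123] ERROR [RedisClient] Connection timeout errors",
    "Connection timeout while connecting to redis server at 192.168.1.100:6379",
    "Unable to connect to Redis server",
    "Could not connect to Redis at localhost:6379: Connection refused",
    "redis connection timeout",
    "Timeout connecting to redis://cache.example.com:6379",
    "Failed to connect to Redis",
    "Redis is not reachable",
    "Connection pool exhausted for redis server",
    "Connection reset by peer while communicating with redis server",
]


def matching_patterns(line):
    """Return the patterns that match somewhere in the line."""
    return [p for p in PATTERNS if re.search(p, line)]


# Lines that no pattern matches, and the set of patterns hit at least once.
unmatched = [line for line in LOG_LINES if not matching_patterns(line)]
covered = {p for line in LOG_LINES for p in matching_patterns(line)}

for line in unmatched:
    print("no pattern matches:", line)
```

Under that assumption every pattern is exercised at least once, while four of the ten sample lines (for example "Failed to connect to Redis") match none of them and act as unmatched noise.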
Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
1+
rules:
2+
- metadata:
3+
kind: prequel
4+
id: Bx5MnWq8TdRpLk3YfNvGa7
5+
hash: Jk9Pf4XsNmRw2QbVtHeLy6
6+
cre:
7+
id: CRE-2025-0174
8+
severity: 0
9+
title: "Redis Authentication Failures and ACL Permission Denials"
10+
category: "in-memory-database-problem"
11+
author: Prequel Community
12+
description: |
13+
Detects Redis authentication failures including wrong passwords, missing authentication, and ACL permission denials. These errors prevent legitimate clients from accessing Redis and may indicate security misconfigurations or attempted unauthorized access.
14+
cause: |
15+
- Incorrect password provided by client
16+
- Redis requirepass configured but client not sending auth
17+
- ACL user lacks required permissions for commands
18+
- Password rotation without updating client configs
19+
- Disabled or deleted ACL user accounts
20+
- Misconfigured Redis AUTH settings
21+
impact: |
22+
- Complete inability to access Redis cache/data
23+
- Application features dependent on Redis fail
24+
- Service outages if Redis is critical infrastructure
25+
- Security risk if authentication is bypassed
26+
- Potential data exposure if misconfigured
27+
tags:
28+
- redis
29+
- authentication
30+
- security
31+
- acl
32+
- wrongpass
33+
mitigation: |
34+
IMMEDIATE ACTIONS:
35+
- Verify Redis auth configuration: `redis-cli CONFIG GET requirepass`
36+
- Test authentication: `redis-cli -a <password> ping`
37+
- Check ACL users: `redis-cli ACL LIST`
38+
- Review client connection strings for correct credentials
39+
40+
RECOVERY:
41+
- Update client password configuration
42+
- Reset Redis password if needed:
43+
`redis-cli CONFIG SET requirepass newpassword`
44+
- Fix ACL permissions for user:
45+
`redis-cli ACL SETUSER username +@all`
46+
- Disable auth temporarily (UNSAFE):
47+
`redis-cli CONFIG SET requirepass ""`
48+
49+
ACL TROUBLESHOOTING:
50+
- List user permissions: `redis-cli ACL GETUSER username`
51+
- Grant specific command access:
52+
`redis-cli ACL SETUSER username +get +set +del`
53+
- Create new user with full access:
54+
`redis-cli ACL SETUSER newuser on '>password' '~*' +@all`
55+
56+
PREVENTION:
57+
- Use environment variables for passwords
58+
- Implement proper secret management
59+
- Regular password rotation with coordination
60+
- Monitor authentication failure rates
61+
- Use ACL for fine-grained access control
62+
- Document authentication requirements
63+
references:
64+
- https://redis.io/docs/latest/operate/oss_and_stack/management/security/acl/
65+
- https://redis.io/commands/auth/
66+
- https://redis.io/docs/latest/operate/oss_and_stack/management/security/
67+
applications:
68+
- name: redis
69+
version: ">=6.0.0"
70+
impactScore: 7
71+
mitigationScore: 8
72+
reports: 156
73+
rule:
74+
set:
75+
window: 120s
76+
event:
77+
source: cre.log.redis
78+
match:
79+
- regex: "WRONGPASS invalid username-password pair"
80+
- regex: "NOAUTH Authentication required"
81+
- regex: "ERR invalid password"
82+
- regex: "ERR wrong password"
83+
- regex: "NOPERM.*has no permissions to run"
84+
- regex: "ERR ACL.*permission denied"
85+
- regex: "AUTH failed.*invalid.*credentials"

rules/cre-2025-0174/test.log

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
2024-01-15 12:00:01.123 [ERROR] Redis authentication failed: WRONGPASS invalid username-password pair or user is disabled.
2+
2024-01-15 12:00:02.234 [ERROR] (error) NOAUTH Authentication required.
3+
2024-01-15 12:00:03.345 [ERROR] redis.exceptions.ResponseError: ERR invalid password
4+
2024-01-15 12:00:04.456 [ERROR] Command rejected: ERR wrong password provided
5+
2024-01-15 12:00:05.567 [ERROR] ACL violation: NOPERM User 'readonly' has no permissions to run the 'SET' command
6+
2024-01-15 12:00:06.678 [ERROR] ERR ACL permission denied for user 'app_user' on command 'FLUSHDB'
7+
2024-01-15 12:00:07.789 [ERROR] AUTH failed: invalid username/password credentials
8+
2024-01-15 12:00:08.890 [WARN] Redis server returned: NOAUTH Authentication required for this operation
Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
1+
rules:
2+
- metadata:
3+
kind: prequel
4+
id: Qm7WxPr3NbKfLs9YhVaEz2
5+
hash: Td5Gn8XqPmWsRf4BkLyVe3
6+
cre:
7+
id: CRE-2025-0175
8+
severity: 0
9+
title: "Redis Master-Replica Synchronization Failure"
10+
category: "in-memory-database-problem"
11+
author: Prequel Community
12+
description: |
13+
Detects failures in Redis master-replica synchronization including broken replication links, sync timeouts, and full resync loops. These issues compromise data consistency and high availability in Redis deployments.
14+
cause: |
15+
- Network partition between master and replica
16+
- Replica unable to keep up with master write load
17+
- Insufficient replica output buffer size
18+
- Master performing an AOF rewrite or RDB save during sync
19+
- Replica disk I/O too slow for sync
20+
- Version incompatibility between master and replica
21+
- Replication backlog size too small
22+
impact: |
23+
- Replicas serve stale or inconsistent data
24+
- Failover capability compromised
25+
- Read scaling degraded with out-of-sync replicas
26+
- Full resync causing performance impact
27+
- Potential data loss during failover
28+
- Increased load on master during resync attempts
29+
tags:
30+
- redis
31+
- replication
32+
- master-replica
33+
- sync
34+
- psync
35+
mitigation: |
36+
IMMEDIATE ACTIONS:
37+
- Check replication status: `redis-cli INFO replication`
38+
- Verify replica connectivity: `redis-cli -h replica ping`
39+
- Monitor sync progress: `redis-cli INFO | grep master_sync`
40+
- Check replication lag (compare the master offset with each replica's offset): `redis-cli INFO replication | grep -E 'master_repl_offset|offset='`
41+
42+
RECOVERY:
43+
- Restart replication on replica:
44+
```
45+
redis-cli REPLICAOF NO ONE
46+
redis-cli REPLICAOF master-host master-port
47+
```
48+
- Increase replication backlog:
49+
`redis-cli CONFIG SET repl-backlog-size 256mb`
50+
- Adjust replica output buffer:
51+
`redis-cli CONFIG SET client-output-buffer-limit "replica 256mb 64mb 60"`
52+
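- Force full resync if partial sync fails (note: `PSYNC` is an internal replication command that replicas send during the handshake and is not normally run by hand; `replicationid` is a placeholder):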
53+
`redis-cli PSYNC replicationid -1`
54+
55+
TROUBLESHOOTING:
56+
- Check network latency: `ping -c 100 master-host`
57+
- Monitor disk I/O: `iostat -x 1`
58+
- Review Redis logs: `tail -f /var/log/redis/redis-server.log`
59+
- Verify firewall rules allow port 6379
60+
61+
PREVENTION:
62+
- Size replication backlog appropriately
63+
- Monitor replication lag metrics
64+
- Use dedicated network for replication
65+
- Optimize disk I/O on replicas
66+
- Regular testing of failover procedures
67+
- Keep master and replica versions in sync
68+
references:
69+
- https://redis.io/docs/latest/operate/oss_and_stack/management/replication/
70+
- https://redis.io/commands/psync/
71+
- https://redis.io/topics/persistence
72+
applications:
73+
- name: redis
74+
version: ">=2.8.0"
75+
impactScore: 8
76+
mitigationScore: 6
77+
reports: 67
78+
rule:
79+
set:
80+
window: 300s
81+
event:
82+
source: cre.log.redis
83+
match:
84+
- regex: "Unable to connect to MASTER"
85+
- regex: "MASTER.*sync.*timeout"
86+
- regex: "Partial resynchronization not accepted"
87+
- regex: "SYNC failed.*Cannot allocate memory"
88+
- regex: "Full resync.*aborted"
89+
- regex: "Replication.*broken.*disconnected"
90+
- regex: "Error condition on socket for SYNC"
91+
- regex: "master_link_status:down"
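
The mitigation text suggests checking replication lag via `master_repl_offset`. That value only becomes a lag figure once it is compared with each replica's acknowledged offset. The sketch below parses master-side `INFO replication` text and computes the byte difference per replica. The function name and the sample output (addresses, offsets) are hypothetical; the `slaveN:ip=...,offset=...` and `master_repl_offset:` field formats are those Redis reports on a master.

```python
import re


def replica_lag_bytes(info_text):
    """Return {replica_name: bytes behind master} from master-side INFO replication text."""
    master_offset = None
    replica_offsets = {}
    for raw in info_text.splitlines():
        line = raw.strip()
        if line.startswith("master_repl_offset:"):
            master_offset = int(line.split(":", 1)[1])
        else:
            # Replica lines look like: slave0:ip=...,port=...,state=online,offset=N,lag=S
            m = re.match(r"(slave\d+):.*\boffset=(\d+)", line)
            if m:
                replica_offsets[m.group(1)] = int(m.group(2))
    if master_offset is None:
        raise ValueError("master_repl_offset not found")
    return {name: master_offset - off for name, off in replica_offsets.items()}


# Hypothetical sample, shaped like `redis-cli INFO replication` run on a master.
SAMPLE = """\
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.2,port=6379,state=online,offset=1000,lag=0
slave1:ip=10.0.0.3,port=6379,state=online,offset=400,lag=3
master_repl_offset:1000
"""

print(replica_lag_bytes(SAMPLE))
```

A difference that keeps growing for one replica is a hint that it cannot keep up with the write load, or that the link is degraded.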

rules/cre-2025-0175/test.log

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
2024-01-15 13:00:01.123 [ERROR] Unable to connect to MASTER: connection refused
2+
2024-01-15 13:00:02.234 [ERROR] MASTER <-> REPLICA sync: timeout in receiving data from master
3+
2024-01-15 13:00:03.345 [WARN] Partial resynchronization not accepted: full resync required
4+
2024-01-15 13:00:04.456 [ERROR] SYNC failed: Cannot allocate memory for replication backlog
5+
2024-01-15 13:00:05.567 [ERROR] Full resync from master aborted: read error
6+
2024-01-15 13:00:06.678 [CRITICAL] Replication link broken: disconnected from master
7+
2024-01-15 13:00:07.789 [ERROR] Error condition on socket for SYNC: Connection reset by peer
8+
2024-01-15 13:00:08.890 [INFO] master_link_status:down master_link_down_since_seconds:45
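
The CRE-2025-0175 rule wraps its patterns in a `set` with a 300s window. The sketch below is one reading of that, assuming a `set` fires only when every listed pattern has been seen at least once and those sightings fall within the window. That semantics is an assumption about the rule engine, not something this commit states. The helper names are hypothetical; the patterns and log lines are copied from the rule and test log above.

```python
import re
from datetime import datetime

PATTERNS = [
    r"Unable to connect to MASTER",
    r"MASTER.*sync.*timeout",
    r"Partial resynchronization not accepted",
    r"SYNC failed.*Cannot allocate memory",
    r"Full resync.*aborted",
    r"Replication.*broken.*disconnected",
    r"Error condition on socket for SYNC",
    r"master_link_status:down",
]

LOG = """\
2024-01-15 13:00:01.123 [ERROR] Unable to connect to MASTER: connection refused
2024-01-15 13:00:02.234 [ERROR] MASTER <-> REPLICA sync: timeout in receiving data from master
2024-01-15 13:00:03.345 [WARN] Partial resynchronization not accepted: full resync required
2024-01-15 13:00:04.456 [ERROR] SYNC failed: Cannot allocate memory for replication backlog
2024-01-15 13:00:05.567 [ERROR] Full resync from master aborted: read error
2024-01-15 13:00:06.678 [CRITICAL] Replication link broken: disconnected from master
2024-01-15 13:00:07.789 [ERROR] Error condition on socket for SYNC: Connection reset by peer
2024-01-15 13:00:08.890 [INFO] master_link_status:down master_link_down_since_seconds:45
"""


def parse(line):
    """Split a line into (timestamp, message); the timestamp is the first 23 characters."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f"), line[24:]


def set_fires(lines, patterns, window_seconds):
    """True once all patterns have matched and their latest sightings fit in the window."""
    last_seen = {}
    for line in lines:
        ts, text = parse(line)
        for p in patterns:
            if re.search(p, text):
                last_seen[p] = ts
        if len(last_seen) == len(patterns):
            span = (max(last_seen.values()) - min(last_seen.values())).total_seconds()
            if span <= window_seconds:
                return True
    return False


lines = LOG.splitlines()
print(set_fires(lines, PATTERNS, 300))
```

Under this reading the test log needs at least one line per pattern, which it has, and dropping any single line would stop the rule from firing.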
Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
1+
rules:
2+
- metadata:
3+
kind: prequel
4+
id: Yx4NmQp7RdWfKs8LbHaVt9
5+
hash: Pm3Xk6WsNbRq5TfGeLyVn2
6+
cre:
7+
id: CRE-2025-0176
8+
severity: 0
9+
title: "Redis Persistence Failure - MISCONF Disk Write Errors"
10+
category: "in-memory-database-problem"
11+
author: Prequel Community
12+
description: |
13+
Detects Redis MISCONF errors when the server cannot persist data to disk due to RDB/AOF write failures. This critical error prevents Redis from saving snapshots and may lead to data loss on restart.
14+
cause: |
15+
- Disk full or insufficient space for RDB/AOF files
16+
- File system permissions preventing writes
17+
- Disk I/O errors or hardware failures
18+
- AOF file corruption
19+
- Background save process (BGSAVE) failures
20+
- Operating system resource limits reached
21+
- File system mounted read-only
22+
impact: |
23+
- Redis stops accepting write commands (by default)
24+
- Loss of all data written since the last successful save if the server restarts
25+
- Inability to create backups
26+
- Replication to replicas may fail
27+
- Application write operations blocked
28+
- Service degradation or outage
29+
tags:
30+
- redis
31+
- persistence
32+
- misconf
33+
- rdb
34+
- aof
35+
- disk
36+
mitigation: |
37+
IMMEDIATE ACTIONS:
38+
- Check disk space: `df -h /var/lib/redis`
39+
- Review Redis persistence status: `redis-cli INFO persistence`
40+
- Check time of last successful save (Unix timestamp): `redis-cli LASTSAVE`
41+
- Verify file permissions: `ls -la /var/lib/redis/`
42+
43+
RECOVERY:
44+
- Free disk space:
45+
```
46+
# Clean old logs
47+
find /var/log -name "*.gz" -delete
48+
# Remove old backups
49+
rm /var/lib/redis/dump.rdb.old
50+
```
51+
- Fix permissions:
52+
`chown -R redis:redis /var/lib/redis`
53+
- Temporarily disable persistence (RISKY):
54+
```
55+
redis-cli CONFIG SET save ""
56+
redis-cli CONFIG SET stop-writes-on-bgsave-error no
57+
```
58+
- Force manual save after fixing:
59+
`redis-cli BGSAVE`
60+
61+
DISK TROUBLESHOOTING:
62+
- Check disk errors: `dmesg | grep -i error`
63+
- Verify filesystem (unmount it first; `/dev/sda1` is an example device): `fsck /dev/sda1`
64+
- Monitor I/O: `iostat -x 1`
65+
- Check mount options: `mount | grep redis`
66+
67+
PREVENTION:
68+
- Monitor disk usage with alerts at 80% capacity
69+
- Regular disk cleanup automation
70+
- Separate partition for Redis data
71+
- Configure appropriate save intervals
72+
- Use both RDB and AOF for redundancy
73+
- Regular backup verification
74+
references:
75+
- https://redis.io/docs/latest/operate/oss_and_stack/management/persistence/
76+
- https://redis.io/commands/bgsave/
77+
- https://redis.io/topics/problems#background-saving-fails-with-a-fork-error
78+
applications:
79+
- name: redis
80+
version: "*"
81+
impactScore: 9
82+
mitigationScore: 7
83+
reports: 234
84+
rule:
85+
set:
86+
window: 180s
87+
event:
88+
source: cre.log.redis
89+
match:
90+
- regex: "MISCONF Redis is configured to save RDB snapshots.*unable to persist.*disk"
91+
- regex: "Can't save in background"
92+
- regex: "Failed opening.*rdb for saving"
93+
- regex: "Write error saving DB on disk"
94+
- regex: "AOF.*write error"
95+
- regex: "Error moving temp.*file.*final destination"
96+
- regex: "BGSAVE.*failed.*No space"
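
The PREVENTION list recommends alerting at 80% disk capacity. A minimal sketch of such a check follows, using the standard library's `shutil.disk_usage`. The function names are illustrative, and `/var/lib/redis` is simply the data directory used in the mitigation text; it may differ on a given host.

```python
import shutil


def disk_usage_percent(path):
    """Percentage of the filesystem holding `path` that is currently used."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def needs_alert(percent_used, threshold=80.0):
    """True when usage has reached the alert threshold (80% by default)."""
    return percent_used >= threshold


if __name__ == "__main__":
    # Assumed data directory; adjust to the `dir` setting of the Redis instance.
    percent = disk_usage_percent("/var/lib/redis")
    if needs_alert(percent):
        print(f"WARNING: Redis data volume is {percent:.1f}% full")
```

Running this from cron or a monitoring agent gives early warning before a background save fails for lack of space.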
