-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Open
Description
Problem
KubeClientCertificateExpiration alert triggers false positives with cloud providers that automatically renew certificates 7 days before expiration (DigitalOcean).
Current Behavior
- Cloud providers initiate certificate renewal exactly 7 days (604,800 seconds) before expiration
- Alert rule threshold matches exactly 7 days (< 604800) with for: 5m duration
- Alert triggers during auto-renewal operations and resolves within ~5 minutes
- Operations team receives false warnings without actual certificate issues
Creates unnecessary noise and alert fatigue
Expected Behavior
- Buffer should prevent warnings during planned auto-renewal
- Real expiring certificates should still trigger warnings with at least 24h advance
- Minimal changes to existing chart configuration
Proposed Solution
Change warning threshold from 7 days (604800 seconds) to 6 days 22 hours (601200 seconds).
Why This Works
- Creates 2-hour buffer between cloud provider operations and alerting
- Prevents false positives while maintaining security monitoring
- Critical alert (<24 hours) remains unchanged for real emergencies
- Minimal impact - still catches genuine certificate issues with 6+ days warning
Impact
- Reduces operational noise from planned maintenance
- Maintains security - real certificate failures still detected
- Cloud-agnostic - benefits all users of cloud providers with 7-day renewal policies
Additional Context
- Chart: kube-prometheus-stack
- Rule: KubeClientCertificateExpiration (warning severity)
baznikin
Metadata
Metadata
Assignees
Labels
No labels