Skip to content

Conversation

@anders-elastisys
Copy link
Contributor

@anders-elastisys anders-elastisys commented Jan 3, 2025

Warning

This is a public repository, ensure not to disclose:

  • personal data beyond what is necessary for interacting with this pull request, nor
  • business confidential information, such as customer names.

What kind of PR is this?

Required: Mark one of the following that is applicable:

  • kind/feature
  • kind/improvement
  • kind/deprecation
  • kind/documentation
  • kind/clean-up
  • kind/bug
  • kind/other

Optional: Mark one or more of the following that are applicable:

Important

Breaking changes should be marked kind/admin-change or kind/dev-change depending on type
Critical security fixes should be marked with kind/security

  • kind/admin-change
  • kind/dev-change
  • kind/security
  • [kind/adr](set-me)

What does this PR do / why do we need this PR?

Was looking through some of the alerts and when comparing them to upstream noticed some deviations:

  • Some expressions in our alerts are not computer per cluster which is what we want.
  • Noticed that the alertmanager alerts we have did not have the runbookURL configured, this PR adds this (the URLs seem to point to actual runbooks).
  • We have alerts for kube-state-metrics but the metrics used in the queries are missing due to the self monitoring option for kube-state-metrics being disabled by default. This PR enables it, with it enabled the issue encountered in this PR fires the kube-state-metrics alerts which it did not previously do. The metrics were exposed on port 8081 so the networkpolicies needed to be updated.
  • Fixes #

Information to reviewers

Checklist

  • Proper commit message prefix on all commits
  • Change checks:
    • The change is transparent
    • The change is disruptive
    • The change requires no migration steps
    • The change requires migration steps
    • The change updates CRDs
    • The change updates the config and the schema
  • Documentation checks:
  • Metrics checks:
    • The metrics are still exposed and present in Grafana after the change
    • The metrics names didn't change (Grafana dashboards and Prometheus alerts required no updates)
    • The metrics names did change (Grafana dashboards and Prometheus alerts required an update)
  • Logs checks:
    • The logs do not show any errors after the change
  • PodSecurityPolicy checks:
    • Any changed Pod is covered by Kubernetes Pod Security Standards
    • Any changed Pod is covered by Gatekeeper Pod Security Policies
    • The change does not cause any Pods to be blocked by Pod Security Standards or Policies
  • NetworkPolicy checks:
    • Any changed Pod is covered by Network Policies
    • The change does not cause any dropped packets in the NetworkPolicy Dashboard
  • Audit checks:
    • The change does not cause any unnecessary Kubernetes audit events
    • The change requires changes to Kubernetes audit policy
  • Falco checks:
    • The change does not cause any alerts to be generated by Falco
  • Bug checks:
    • The bug fix is covered by regression tests

@anders-elastisys anders-elastisys requested review from a team as code owners January 3, 2025 12:07
Copy link
Contributor

@aarnq aarnq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@anders-elastisys anders-elastisys merged commit e370e99 into main Jan 15, 2025
13 checks passed
@anders-elastisys anders-elastisys deleted the anders-elastisys/kube-stack-prometheus-alerts-fixes branch January 15, 2025 08:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants