Conversation

@Gabbe16 Gabbe16 commented Nov 7, 2025

Added CPU/Memory requested panels for the capacity management dashboard

Warning

This is a public repository, ensure not to disclose:

  • personal data beyond what is necessary for interacting with this pull request, nor
  • business confidential information, such as customer names.

What kind of PR is this?

Required: Mark one of the following that is applicable:

  • kind/feature
  • kind/improvement
  • kind/deprecation
  • kind/documentation
  • kind/clean-up
  • kind/bug
  • kind/other

Optional: Mark one or more of the following that are applicable:

Important

Breaking changes should be marked kind/admin-change or kind/dev-change depending on type
Critical security fixes should be marked with kind/security

  • kind/admin-change
  • kind/dev-change
  • kind/security
  • kind/adr

What does this PR do / why do we need this PR?

This PR adds new panels to the CPU/Memory usage rows of the capacity management dashboard. The new panels are: requested CPU/Memory per node, average requested CPU/Memory per node group, CPU/Memory limits per node, and average CPU/Memory limits per node group. This makes it possible to compare usage, requests, and limits in one place, as requested in the issue this PR resolves. Below are two example images of how this looks on the Grafana capacity management dashboard.

New CPU dashboards
(screenshot: 2025-11-25 13-18-15)

New Memory dashboards
(screenshot: 2025-11-25 13-18-23)

Information to reviewers

The PromQL queries behind these panels are listed below, in case you would like to give feedback or test them. I'm only including the CPU queries, since the memory queries are identical except that "resource" is set to memory in each selector; an annotated sketch of the memory variant follows the CPU queries.

CPU requested

sum by (node) (
  kube_pod_container_resource_requests{cluster=~"$cluster", resource="cpu"}
  and on (pod, namespace, cluster) kube_pod_info{cluster=~"$cluster"}
  and on (pod, namespace, cluster) kube_pod_status_phase{phase="Running", cluster=~"$cluster"} == 1
)
/ sum by (node) (kube_node_status_allocatable{cluster=~"$cluster", resource="cpu"})
* on (node) group_left (label_elastisys_io_node_group)
  label_replace(kube_node_labels{label_elastisys_io_node_group=~"$NodeGroup"}, "instance", "$1", "node", "(.*)")

Average CPU requested

avg(
  sum by (node) (
    kube_pod_container_resource_requests{cluster=~"$cluster", resource="cpu"}
    and on (pod, namespace, cluster) kube_pod_info{cluster=~"$cluster"}
    and on (pod, namespace, cluster) kube_pod_status_phase{phase="Running", cluster=~"$cluster"} == 1
  )
  / sum by (node) (kube_node_status_allocatable{cluster=~"$cluster", resource="cpu"})
  * on (node) group_left (label_elastisys_io_node_group)
    label_replace(kube_node_labels{label_elastisys_io_node_group=~"$NodeGroup"}, "instance", "$1", "node", "(.*)")
)

CPU limits

sum by (node) (
  kube_pod_container_resource_limits{cluster=~"$cluster", resource="cpu"}
  and on (pod, namespace, cluster) kube_pod_info{cluster=~"$cluster"}
  and on (pod, namespace, cluster) kube_pod_status_phase{phase="Running", cluster=~"$cluster"} == 1
)
/ sum by (node) (kube_node_status_allocatable{cluster=~"$cluster", resource="cpu"})
* on (node) group_left (label_elastisys_io_node_group)
  label_replace(kube_node_labels{label_elastisys_io_node_group=~"$NodeGroup"}, "instance", "$1", "node", "(.*)")

Average CPU limits

avg(
  sum by (node) (
    kube_pod_container_resource_limits{cluster=~"$cluster", resource="cpu"}
    and on (pod, namespace, cluster) kube_pod_info{cluster=~"$cluster"}
    and on (pod, namespace, cluster) kube_pod_status_phase{phase="Running", cluster=~"$cluster"} == 1
  )
  / sum by (node) (kube_node_status_allocatable{cluster=~"$cluster", resource="cpu"})
  * on (node) group_left (label_elastisys_io_node_group)
    label_replace(kube_node_labels{label_elastisys_io_node_group=~"$NodeGroup"}, "instance", "$1", "node", "(.*)")
)
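
Memory requested (sketch)

As noted above, the memory panels use the same queries with "resource" set to memory. For reviewers' convenience, here is a sketch of the memory-requested query, derived from the CPU query by that substitution; the inline comments are annotations for this description, not part of the dashboard itself.

# Numerator: memory requests summed per node, counting only Running pods
sum by (node) (
  kube_pod_container_resource_requests{cluster=~"$cluster", resource="memory"}
  and on (pod, namespace, cluster) kube_pod_info{cluster=~"$cluster"}
  and on (pod, namespace, cluster) kube_pod_status_phase{phase="Running", cluster=~"$cluster"} == 1
)
# Denominator: allocatable memory per node, yielding a requested/allocatable ratio
/ sum by (node) (kube_node_status_allocatable{cluster=~"$cluster", resource="memory"})
# Join on node to attach the node-group label and honor the $NodeGroup variable
* on (node) group_left (label_elastisys_io_node_group)
  label_replace(kube_node_labels{label_elastisys_io_node_group=~"$NodeGroup"}, "instance", "$1", "node", "(.*)")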

Checklist

  • Proper commit message prefix on all commits
  • Change checks:
    • The change is transparent
    • The change is disruptive
    • The change requires no migration steps
    • The change requires migration steps
    • The change updates CRDs
    • The change updates the config and the schema
  • Documentation checks:
  • Metrics checks:
    • The metrics are still exposed and present in Grafana after the change
    • The metrics names didn't change (Grafana dashboards and Prometheus alerts required no updates)
    • The metrics names did change (Grafana dashboards and Prometheus alerts required an update)
  • Logs checks:
    • The logs do not show any errors after the change
  • PodSecurityPolicy checks:
    • Any changed Pod is covered by Kubernetes Pod Security Standards
    • Any changed Pod is covered by Gatekeeper Pod Security Policies
    • The change does not cause any Pods to be blocked by Pod Security Standards or Policies
  • NetworkPolicy checks:
    • Any changed Pod is covered by Network Policies
    • The change does not cause any dropped packets in the NetworkPolicy Dashboard
  • Audit checks:
    • The change does not cause any unnecessary Kubernetes audit events
    • The change requires changes to Kubernetes audit policy
  • Falco checks:
    • The change does not cause any alerts to be generated by Falco
  • Bug checks:
    • The bug fix is covered by regression tests

@Gabbe16 Gabbe16 requested a review from a team as a code owner November 7, 2025 10:22
@Gabbe16 Gabbe16 added the kind/improvement Improvement of existing features, e.g. code cleanup or optimizations. label Nov 7, 2025
@Zash Zash left a comment

Queries look sensible.

@Gabbe16 Gabbe16 force-pushed the gabbe16/add-requests-panels-capacity-management-dashboard branch from c519b9a to 0bc2b95 Compare November 10, 2025 12:15
@Gabbe16 Gabbe16 requested a review from Zash November 10, 2025 12:19
@Gabbe16 Gabbe16 force-pushed the gabbe16/add-requests-panels-capacity-management-dashboard branch 2 times, most recently from f19a99e to f15c184 Compare November 14, 2025 10:10

Gabbe16 commented Nov 14, 2025

The PR description has now been updated to reflect the newest implementation suggested by @viktor-f. If you already read the previous version, please reread it and give more feedback, thanks!

@Gabbe16 Gabbe16 requested a review from viktor-f November 14, 2025 10:31
…oard

fix: set panel collapse to false

fix: panels being broken and not showing up

apps: restructured panels and added new panels for cpu/memory regarding the average usage and limits

apps: added average CPU/Memory limits dashboards to complete the set + dashboard name changes
@Gabbe16 Gabbe16 force-pushed the gabbe16/add-requests-panels-capacity-management-dashboard branch from f15c184 to ebbeb0c Compare November 25, 2025 12:30

@viktor-f viktor-f left a comment

Looks good, thanks for doing the extra suggestions as well 🚀

@Gabbe16 Gabbe16 merged commit 5fa2791 into main Dec 1, 2025
13 checks passed
@Gabbe16 Gabbe16 deleted the gabbe16/add-requests-panels-capacity-management-dashboard branch December 1, 2025 08:59

Labels

kind/improvement Improvement of existing features, e.g. code cleanup or optimizations.

Development

Successfully merging this pull request may close these issues.

Add requests panels in Capacity Management Dashboard

3 participants