Conversation

SoumyaRaikwar

What this PR does / why we need it

This PR adds the kube_deployment_spec_topology_spread_constraints metric that counts the number of topology spread constraints defined in a deployment's pod template specification.

This PR addresses the topology spread constraints monitoring requirement from issue #2701, which specifically requested visibility into scheduling primitives, including "pod topology spread constraints", for monitoring workload pod distribution.

Which issue(s) this PR fixes

Addresses the topology spread constraints monitoring portion of #2701 (Add schedule spec and status for workload)

Problem Solved

Issue #2701 identified that operators need to monitor various scheduling primitives to detect when "break variation may happen because pod priority preemption or node pressure eviction."

- Adds new metric to count topology spread constraints in deployment pod templates
- Includes comprehensive test coverage for both cases (with/without constraints)
- Follows existing patterns and stability guidelines
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes and needs-triage labels Aug 10, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If kube-state-metrics contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: SoumyaRaikwar
Once this PR has been reviewed and has the lgtm label, please assign rexagod for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XXL label Aug 10, 2025
@SoumyaRaikwar SoumyaRaikwar changed the title from "Add kube_deployment_spec_topology_spread_constraints metric for issue #2701" to "feat: Add kube_deployment_spec_topology_spread_constraints metric for issue #2701" Aug 10, 2025
@k8s-ci-robot k8s-ci-robot added the size/XL label and removed the size/XXL label Aug 10, 2025
@k8s-ci-robot k8s-ci-robot added the size/L label and removed the size/XL label Aug 10, 2025
@mrueg
Member

mrueg commented Aug 13, 2025

How would you use this metric for alerting or to provide info about the deployment?

@SoumyaRaikwar
Author

How would you use this metric for alerting or to provide info about the deployment?

These topology spread constraint metrics enable alerting on workload distribution policies: `kube_deployment_spec_topology_spread_constraints > 0` identifies deployments that declare spread constraints, which you can then correlate with pod distribution metrics when workloads become unevenly spread across zones/nodes during resource pressure.

You can alert on missing distribution policies with `(kube_deployment_spec_replicas > 1) and (kube_deployment_spec_topology_spread_constraints == 0)` to identify multi-replica deployments lacking a spread configuration.

For dashboards, `count(kube_deployment_spec_topology_spread_constraints > 0)` shows cluster-wide adoption of topology spread policies, complementing the pod affinity/anti-affinity metrics I implemented in PR #2733.

During incidents, these metrics help correlate why workloads became concentrated in specific topology domains or why pods failed to schedule due to overly restrictive spread policies.

This completes the scheduling observability suite from issue #2701: together with my pod affinity/anti-affinity metrics (PR #2733), operators now have full visibility into both co-location/separation rules and even-distribution policies across cluster topology. Thanks @mrueg!

@k8s-ci-robot k8s-ci-robot added the needs-rebase label Aug 14, 2025
@mrueg
Member

mrueg commented Aug 14, 2025

The same comment as in the other PR #2733 applies here: we should have explicit metrics per constraint, i.e. kube_deployment_topology_spread_constraint{}, rather than simply counting a length.

Replace kube_deployment_spec_topology_spread_constraints count metric
with kube_deployment_spec_topology_spread_constraint that exposes
detailed information for each constraint including topology_key,
max_skew, when_unsatisfiable, min_domains, and label_selector labels.

This enables more granular monitoring and alerting on topology spread
constraint configurations across deployments.

- Add explicit per-constraint metric generation
- Handle nil MinDomains with default value of 1
- Add proper LabelSelector string conversion
- Update tests with constraint details
- Mark metric as ALPHA stability
- Update documentation tables

Closes: #<issue-number-if-applicable>
… for topology spread constraints in deployments.
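To make the commit's bullet points concrete, here is a minimal sketch of how such a per-constraint family generator could look. It is written against helpers that kube-state-metrics' internal/store package already uses (wrapDeploymentFunc, the metric and metric_generator packages, and the ALPHA stability level from component-base), but the function name and HELP text are placeholders and assumptions, not the code actually merged in this PR.

```go
package store

import (
	"strconv"

	v1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	basemetrics "k8s.io/component-base/metrics"

	"k8s.io/kube-state-metrics/v2/pkg/metric"
	generator "k8s.io/kube-state-metrics/v2/pkg/metric_generator"
)

// topologySpreadConstraintFamily (hypothetical name) emits one series per
// topology spread constraint in the deployment's pod template.
func topologySpreadConstraintFamily() generator.FamilyGenerator {
	return *generator.NewFamilyGeneratorWithStability(
		"kube_deployment_spec_topology_spread_constraint",
		"Topology spread constraints defined in the deployment's pod template.",
		metric.Gauge,
		basemetrics.ALPHA,
		"",
		wrapDeploymentFunc(func(d *v1.Deployment) *metric.Family {
			ms := []*metric.Metric{}
			for _, c := range d.Spec.Template.Spec.TopologySpreadConstraints {
				// MinDomains is optional; a nil value behaves like 1, so expose that default.
				minDomains := int32(1)
				if c.MinDomains != nil {
					minDomains = *c.MinDomains
				}
				// Render the optional label selector as a string label (empty if unset).
				selector := ""
				if c.LabelSelector != nil {
					selector = metav1.FormatLabelSelector(c.LabelSelector)
				}
				ms = append(ms, &metric.Metric{
					LabelKeys: []string{"topology_key", "max_skew", "when_unsatisfiable", "min_domains", "label_selector"},
					LabelValues: []string{
						c.TopologyKey,
						strconv.FormatInt(int64(c.MaxSkew), 10),
						string(c.WhenUnsatisfiable),
						strconv.FormatInt(int64(minDomains), 10),
						selector,
					},
					Value: 1,
				})
			}
			// wrapDeploymentFunc adds the namespace and deployment labels to every series.
			return &metric.Family{Metrics: ms}
		}),
	)
}
```

Because every constraint becomes its own series with value 1, summing per deployment (for example sum by (namespace, deployment) in PromQL) recovers the old count metric while keeping the per-constraint details queryable.
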
@SoumyaRaikwar
Author

Add explicit topology spread constraint metric for deployments
Replaces kube_deployment_spec_topology_spread_constraints (count-only) with kube_deployment_spec_topology_spread_constraint that exposes detailed labels per constraint: topology_key, max_skew, when_unsatisfiable, min_domains, and label_selector.

This enables precise Prometheus queries and better observability of constraint configurations across deployments.
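
For illustration, a deployment with a single zone-spread constraint might then be exposed roughly as in the sketch below; the namespace, deployment name, HELP text, and constraint values are made-up examples, not output captured from this PR.

```go
// exampleTopologySpreadConstraintSeries holds a hypothetical sample of the new
// per-constraint exposition; namespace, deployment name, and constraint values
// are illustrative only.
const exampleTopologySpreadConstraintSeries = `# HELP kube_deployment_spec_topology_spread_constraint Topology spread constraints defined in the deployment's pod template.
# TYPE kube_deployment_spec_topology_spread_constraint gauge
kube_deployment_spec_topology_spread_constraint{namespace="default",deployment="web",topology_key="topology.kubernetes.io/zone",max_skew="1",when_unsatisfiable="DoNotSchedule",min_domains="1",label_selector="app=web"} 1`
```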

Thanks @mrueg for the suggestion to make this metric explicit! ✓ Tests updated ✓ Documentation updated

@k8s-ci-robot k8s-ci-robot added and removed the needs-rebase label Aug 19, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase label Aug 30, 2025
@SoumyaRaikwar SoumyaRaikwar force-pushed the add-deployment-topology-spread-constraints-metric branch from a96beb0 to 679d205 on August 30, 2025 22:51
@k8s-ci-robot k8s-ci-robot added the needs-rebase label Aug 30, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase label Aug 30, 2025
SoumyaRaikwar and others added 3 commits September 1, 2025 13:29
- Add document start markers (---) to all YAML files
- Fix indentation errors throughout manifests
- Resolve line length violations
- Correct YAML list formatting
- Update image name to remove -amd64 suffix
@k8s-ci-robot k8s-ci-robot added the needs-rebase and size/XL labels and removed the size/L label Sep 1, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase label Sep 1, 2025