Commit a70b72d

Multiple Components: Remove vector(0) from memory hpa query (#12386)
#### What this PR does

Similar to #12384. If we're not able to compute the value, we have to return no series at all. If we return 0, it means "scale down to HPA min".

Consider the following comparison of the metric with and without `vector(0)`. With the fallback in place (yellow), a failure to scrape from 14:22 to 14:32 (second graph, y-axis == `count(up{...})`) yields a much lower metric, which may allow the deployment to scale down unexpectedly.

<img width="1205" height="479" alt="Screenshot 2025-08-13 at 8 07 22 PM" src="https://github.com/user-attachments/assets/22cd545e-593d-4210-97ea-30cebb7ce9e0" />

<img width="1230" height="441" alt="Screenshot 2025-08-13 at 8 08 22 PM" src="https://github.com/user-attachments/assets/f7ce2e70-3e67-456c-b385-4181e058dce1" />

#### Which issue(s) this PR fixes or relates to

Fixes #<issue number>

#### Checklist

- [ ] Tests updated.
- [ ] Documentation added.
- [ ] `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry is not needed, please add the `changelog-not-needed` label to the PR.
- [ ] [`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md) updated with experimental features.
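To make the "0 means scale down to HPA min" point concrete, here is a minimal Python sketch of the failure mode. It is an illustration under stated assumptions, not the autoscaler's actual code: the names `promql_or`, `total`, and `desired_replicas`, and all numbers, are hypothetical. It assumes the autoscaler holds the current replica count when the query returns no series, and it ignores the `[15m:]` `max_over_time` window and the OOM term of the real query.

```python
import math
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical model: an instant vector maps a label set to a sample value.
Labels = FrozenSet[Tuple[str, str]]
Vector = Dict[Labels, float]

# vector(0) yields a single sample with an empty label set.
VECTOR_0: Vector = {frozenset(): 0.0}


def promql_or(lhs: Vector, rhs: Vector) -> Vector:
    """PromQL `or`: keep all of lhs, add rhs elements whose label set is not in lhs."""
    out = dict(lhs)
    for labels, value in rhs.items():
        out.setdefault(labels, value)
    return out


def total(v: Vector) -> Optional[float]:
    """Like PromQL sum(): no input series means no output (None), not 0."""
    return sum(v.values()) if v else None


def desired_replicas(metric: Optional[float], target_per_replica: float,
                     current: int, min_replicas: int, max_replicas: int) -> int:
    """Assumed HPA-style rule: no data holds current; else ceil(metric / target), clamped."""
    if metric is None:
        return current
    want = math.ceil(metric / target_per_replica)
    return max(min_replicas, min(max_replicas, want))


if __name__ == "__main__":
    scrape_failed: Vector = {}  # every pod filtered out, e.g. all scrapes failing
    with_fallback = total(promql_or(scrape_failed, VECTOR_0))  # 0.0
    without_fallback = total(scrape_failed)                    # None
    print(desired_replicas(with_fallback, 2.0, 4, 2, 10))      # 2: drops to the minimum
    print(desired_replicas(without_fallback, 2.0, 4, 2, 10))   # 4: holds current replicas
```

With the fallback, a total scrape failure is indistinguishable from "zero memory in use", so the computed replica count is clamped to the minimum. Without it, the empty result carries the information that the value is unknown.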
1 parent acd9215 commit a70b72d

7 files changed, +68 -133 lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@
 * [CHANGE] Distributor: Reduce calculated `GOMAXPROCS` to be closer to the requested number of CPUs. #12150
 * [CHANGE] Query-scheduler: The query-scheduler is now a required component that is always used by queriers and query-frontends. #12187
 * [CHANGE] Use `irate()` when calculating CPU scaling metric. Using `irate()` prevents underestimating CPU utilization when scraping fails. #12384
+* [CHANGE] Remove `vector(0)` when calculating memory scaling metric to prevent underestimating memory usage when scraping fails. #12386

 ### Documentation

operations/mimir-tests/test-autoscaling-custom-target-utilization-generated.yaml

Lines changed: 6 additions & 12 deletions
@@ -1995,7 +1995,7 @@ spec:
 sum by (pod) (container_memory_working_set_bytes{container="alertmanager",namespace="default"})
 and
 max by (pod) (up{container="alertmanager",namespace="default"}) > 0
-) or vector(0)
+)
 )[15m:]
 )
 +
@@ -2005,7 +2005,6 @@ spec:
 max by (pod) (changes(kube_pod_container_status_restarts_total{container="alertmanager", namespace="default"}[15m]) > 0)
 and
 max by (pod) (kube_pod_container_status_last_terminated_reason{container="alertmanager", namespace="default", reason="OOMKilled"})
-or vector(0)
 )
 serverAddress: http://prometheus.default:9090/prometheus
 threshold: "9556302233"
@@ -2067,7 +2066,7 @@ spec:
 sum by (pod) (container_memory_working_set_bytes{container="distributor",namespace="default"})
 and
 max by (pod) (min_over_time(kube_pod_status_ready{namespace="default",condition="true"}[1m])) > 0
-) or vector(0)
+)
 )[15m:]
 )
 +
@@ -2077,7 +2076,6 @@ spec:
 max by (pod) (changes(kube_pod_container_status_restarts_total{container="distributor", namespace="default"}[15m]) > 0)
 and
 max by (pod) (kube_pod_container_status_last_terminated_reason{container="distributor", namespace="default", reason="OOMKilled"})
-or vector(0)
 )
 serverAddress: http://prometheus.default:9090/prometheus
 threshold: "3058016714"
@@ -2182,7 +2180,7 @@ spec:
 sum by (pod) (container_memory_working_set_bytes{container="query-frontend",namespace="default"})
 and
 max by (pod) (up{container="query-frontend",namespace="default"}) > 0
-) or vector(0)
+)
 )[15m:]
 )
 +
@@ -2192,7 +2190,6 @@ spec:
 max by (pod) (changes(kube_pod_container_status_restarts_total{container="query-frontend", namespace="default"}[15m]) > 0)
 and
 max by (pod) (kube_pod_container_status_last_terminated_reason{container="query-frontend", namespace="default", reason="OOMKilled"})
-or vector(0)
 )
 serverAddress: http://prometheus.default:9090/prometheus
 threshold: "559939584"
@@ -2244,7 +2241,7 @@ spec:
 sum by (pod) (container_memory_working_set_bytes{container="ruler",namespace="default"})
 and
 max by (pod) (up{container="ruler",namespace="default"}) > 0
-) or vector(0)
+)
 )[15m:]
 )
 +
@@ -2254,7 +2251,6 @@ spec:
 max by (pod) (changes(kube_pod_container_status_restarts_total{container="ruler", namespace="default"}[15m]) > 0)
 and
 max by (pod) (kube_pod_container_status_last_terminated_reason{container="ruler", namespace="default", reason="OOMKilled"})
-or vector(0)
 )
 serverAddress: http://prometheus.default:9090/prometheus
 threshold: "5733781340"
@@ -2306,7 +2302,7 @@ spec:
 sum by (pod) (container_memory_working_set_bytes{container="ruler-querier",namespace="default"})
 and
 max by (pod) (up{container="ruler-querier",namespace="default"}) > 0
-) or vector(0)
+)
 )[15m:]
 )
 +
@@ -2316,7 +2312,6 @@ spec:
 max by (pod) (changes(kube_pod_container_status_restarts_total{container="ruler-querier", namespace="default"}[15m]) > 0)
 and
 max by (pod) (kube_pod_container_status_last_terminated_reason{container="ruler-querier", namespace="default", reason="OOMKilled"})
-or vector(0)
 )
 serverAddress: http://prometheus.default:9090/prometheus
 threshold: "955630223"
@@ -2376,7 +2371,7 @@ spec:
 sum by (pod) (container_memory_working_set_bytes{container="ruler-query-frontend",namespace="default"})
 and
 max by (pod) (up{container="ruler-query-frontend",namespace="default"}) > 0
-) or vector(0)
+)
 )[15m:]
 )
 +
@@ -2386,7 +2381,6 @@ spec:
 max by (pod) (changes(kube_pod_container_status_restarts_total{container="ruler-query-frontend", namespace="default"}[15m]) > 0)
 and
 max by (pod) (kube_pod_container_status_last_terminated_reason{container="ruler-query-frontend", namespace="default", reason="OOMKilled"})
-or vector(0)
 )
 serverAddress: http://prometheus.default:9090/prometheus
 threshold: "559939584"

operations/mimir-tests/test-autoscaling-generated.yaml

Lines changed: 6 additions & 12 deletions
@@ -1995,7 +1995,7 @@ spec:
 sum by (pod) (container_memory_working_set_bytes{container="alertmanager",namespace="default"})
 and
 max by (pod) (up{container="alertmanager",namespace="default"}) > 0
-) or vector(0)
+)
 )[15m:]
 )
 +
@@ -2005,7 +2005,6 @@ spec:
 max by (pod) (changes(kube_pod_container_status_restarts_total{container="alertmanager", namespace="default"}[15m]) > 0)
 and
 max by (pod) (kube_pod_container_status_last_terminated_reason{container="alertmanager", namespace="default", reason="OOMKilled"})
-or vector(0)
 )
 serverAddress: http://prometheus.default:9090/prometheus
 threshold: "10737418240"
@@ -2067,7 +2066,7 @@ spec:
 sum by (pod) (container_memory_working_set_bytes{container="distributor",namespace="default"})
 and
 max by (pod) (min_over_time(kube_pod_status_ready{namespace="default",condition="true"}[1m])) > 0
-) or vector(0)
+)
 )[15m:]
 )
 +
@@ -2077,7 +2076,6 @@ spec:
 max by (pod) (changes(kube_pod_container_status_restarts_total{container="distributor", namespace="default"}[15m]) > 0)
 and
 max by (pod) (kube_pod_container_status_last_terminated_reason{container="distributor", namespace="default", reason="OOMKilled"})
-or vector(0)
 )
 serverAddress: http://prometheus.default:9090/prometheus
 threshold: "3435973836"
@@ -2182,7 +2180,7 @@ spec:
 sum by (pod) (container_memory_working_set_bytes{container="query-frontend",namespace="default"})
 and
 max by (pod) (up{container="query-frontend",namespace="default"}) > 0
-) or vector(0)
+)
 )[15m:]
 )
 +
@@ -2192,7 +2190,6 @@ spec:
 max by (pod) (changes(kube_pod_container_status_restarts_total{container="query-frontend", namespace="default"}[15m]) > 0)
 and
 max by (pod) (kube_pod_container_status_last_terminated_reason{container="query-frontend", namespace="default", reason="OOMKilled"})
-or vector(0)
 )
 serverAddress: http://prometheus.default:9090/prometheus
 threshold: "629145600"
@@ -2244,7 +2241,7 @@ spec:
 sum by (pod) (container_memory_working_set_bytes{container="ruler",namespace="default"})
 and
 max by (pod) (up{container="ruler",namespace="default"}) > 0
-) or vector(0)
+)
 )[15m:]
 )
 +
@@ -2254,7 +2251,6 @@ spec:
 max by (pod) (changes(kube_pod_container_status_restarts_total{container="ruler", namespace="default"}[15m]) > 0)
 and
 max by (pod) (kube_pod_container_status_last_terminated_reason{container="ruler", namespace="default", reason="OOMKilled"})
-or vector(0)
 )
 serverAddress: http://prometheus.default:9090/prometheus
 threshold: "6442450944"
@@ -2306,7 +2302,7 @@ spec:
 sum by (pod) (container_memory_working_set_bytes{container="ruler-querier",namespace="default"})
 and
 max by (pod) (up{container="ruler-querier",namespace="default"}) > 0
-) or vector(0)
+)
 )[15m:]
 )
 +
@@ -2316,7 +2312,6 @@ spec:
 max by (pod) (changes(kube_pod_container_status_restarts_total{container="ruler-querier", namespace="default"}[15m]) > 0)
 and
 max by (pod) (kube_pod_container_status_last_terminated_reason{container="ruler-querier", namespace="default", reason="OOMKilled"})
-or vector(0)
 )
 serverAddress: http://prometheus.default:9090/prometheus
 threshold: "1073741824"
@@ -2376,7 +2371,7 @@ spec:
 sum by (pod) (container_memory_working_set_bytes{container="ruler-query-frontend",namespace="default"})
 and
 max by (pod) (up{container="ruler-query-frontend",namespace="default"}) > 0
-) or vector(0)
+)
 )[15m:]
 )
 +
@@ -2386,7 +2381,6 @@ spec:
 max by (pod) (changes(kube_pod_container_status_restarts_total{container="ruler-query-frontend", namespace="default"}[15m]) > 0)
 and
 max by (pod) (kube_pod_container_status_last_terminated_reason{container="ruler-query-frontend", namespace="default", reason="OOMKilled"})
-or vector(0)
 )
 serverAddress: http://prometheus.default:9090/prometheus
 threshold: "629145600"

operations/mimir-tests/test-multi-zone-distributor-generated.yaml

Lines changed: 2 additions & 4 deletions
@@ -2610,7 +2610,7 @@ spec:
 sum by (pod) (container_memory_working_set_bytes{container="distributor",namespace="default",pod=~"distributor-zone-a.*"})
 and
 max by (pod) (min_over_time(kube_pod_status_ready{namespace="default",condition="true",pod=~"distributor-zone-a.*"}[1m])) > 0
-) or vector(0)
+)
 )[15m:]
 )
 +
@@ -2620,7 +2620,6 @@ spec:
 max by (pod) (changes(kube_pod_container_status_restarts_total{container="distributor", namespace="default",pod=~"distributor-zone-a.*"}[15m]) > 0)
 and
 max by (pod) (kube_pod_container_status_last_terminated_reason{container="distributor", namespace="default", reason="OOMKilled",pod=~"distributor-zone-a.*"})
-or vector(0)
 )
 serverAddress: http://prometheus.default:9090/prometheus
 threshold: "2147483648"
@@ -2682,7 +2681,7 @@ spec:
 sum by (pod) (container_memory_working_set_bytes{container="distributor",namespace="default",pod=~"distributor-zone-b.*"})
 and
 max by (pod) (min_over_time(kube_pod_status_ready{namespace="default",condition="true",pod=~"distributor-zone-b.*"}[1m])) > 0
-) or vector(0)
+)
 )[15m:]
 )
 +
@@ -2692,7 +2691,6 @@ spec:
 max by (pod) (changes(kube_pod_container_status_restarts_total{container="distributor", namespace="default",pod=~"distributor-zone-b.*"}[15m]) > 0)
 and
 max by (pod) (kube_pod_container_status_last_terminated_reason{container="distributor", namespace="default", reason="OOMKilled",pod=~"distributor-zone-b.*"})
-or vector(0)
 )
 serverAddress: http://prometheus.default:9090/prometheus
 threshold: "2147483648"
