Commit 4cb8fdd

query rejection blog post improvement
Signed-off-by: Erlan Zholdubai uulu <[email protected]>
1 parent cc96eac commit 4cb8fdd

1 file changed

+4
-5
lines changed

website/content/en/blog/2025/query-rejection.md

Lines changed: 4 additions & 5 deletions
@@ -12,7 +12,7 @@ author: Erlan Zholdubai uulu ([@erlan-z](https://github.com/erlan-z))

 # Introduction

-We had events where a set of seemingly **harmless-looking** dashboard queries kept slipping just under our limits yet repeatedly **OOM-killing the querier pods**. Our safeguard mechanisms weren’t enough, and the only hope was that the tenant would either stop those queries or that we’d have to throttle all traffic from that tenant. Usually it wasn’t all traffic causing trouble—it was a small set of queries coming from a specific dashboard or some query with specific characteristics. We wished there was a way to manually specify query characteristics and reject them without throttling everything. **This inspired us to build query rejection**, a last-resort safety net for operators running multi-tenant Cortex clusters.
+Although Cortex includes various safeguards to protect against overload, they can’t prevent every failure scenario. In some environments, a small set of seemingly harmless-looking dashboard queries have repeatedly slipped just under the limits yet still OOM-killed the querier pods. Built-in protections weren’t enough, and the only available option was to throttle all incoming traffic. These queries often came from a specific dashboard or followed a predictable pattern. There was no way to block just those without affecting everything else. This inspired the introduction of query rejection, a last-resort safety net for operators running multi-tenant Cortex clusters.

 ## Why Limits Aren’t Enough

@@ -88,7 +88,7 @@ Imagine a dashboard panel that repeatedly hits your cluster with a query like th

 ```bash
 curl \
-'http://localhost:8005/prometheus/api/v1/query?query=customALERTquery&start=1718383304&end=1718386904&step=7s' \
+'http://localhost:8005/prometheus/api/v1/query_range?query=customALERTquery&start=1718383304&end=1718386904&step=7s' \
 -H "User-Agent: other" \
 -H "X-Dashboard-Uid: dash123"
 ```
@@ -105,7 +105,6 @@ Because this request matches all the configured attributes, it will be blocked.

 ## Conclusion

-When traditional safeguards fall short, query rejection gives operators precise control to block only what’s harmful—without slowing down everything else.
-
-If you operate a shared Cortex environment, consider learning how to use query rejection effectively. It might just save you from the next incident—by preventing OOM kills, degraded performance, or disruption to other tenants.
+When traditional safeguards fall short, query rejection gives operators precise control to block only what’s harmful; without slowing down everything else.

+If you operate a shared Cortex environment, consider learning how to use query rejection effectively. It might just save you from the next incident; by preventing OOM kills, degraded performance, or disruption to other tenants.
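
Note on the `curl` change in this commit: in the Prometheus-compatible HTTP API, `start`, `end`, and `step` are range-query parameters, so they belong with `/api/v1/query_range`, while the instant-query endpoint `/api/v1/query` takes a single `time` instead. The sketch below rebuilds the corrected URL from its parts. The helper name `range_query_url` is hypothetical (it is not part of Cortex or the post), the host, port, and values are copied from the example, and the query string is assumed to need no URL-encoding.

```bash
#!/bin/sh
# Hypothetical helper, for illustration only: assemble a range-query URL.
# Arguments: BASE_URL QUERY START END STEP
# Assumption: QUERY contains only URL-safe characters (no encoding is done).
range_query_url() {
  printf '%s/prometheus/api/v1/query_range?query=%s&start=%s&end=%s&step=%s\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Same values as the corrected example in the post.
url="$(range_query_url 'http://localhost:8005' 'customALERTquery' 1718383304 1718386904 7s)"
echo "$url"
```

The resulting string could then be passed to `curl` together with the `User-Agent` and `X-Dashboard-Uid` headers from the example.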
