
Commit cfce88f

Blog: Dynamic Query Splitting (#6965)
* publish query dynamic splitting blog post
* fix author id
* update config doc
* blog post small edits

Signed-off-by: Ahmed Hassan <[email protected]>
1 parent 52b9672 commit cfce88f

File tree

6 files changed (+188 / -1 lines changed)


docs/configuration/config-file-reference.md

Lines changed: 5 additions & 0 deletions
@@ -4114,6 +4114,11 @@ The `limits_config` configures default and per-tenant limits imposed by Cortex s
 # CLI flag: -frontend.max-queriers-per-tenant
 [max_queriers_per_tenant: <float> | default = 0]
 
+# [Experimental] Number of shards to use when distributing shardable PromQL
+# queries.
+# CLI flag: -frontend.query-vertical-shard-size
+[query_vertical_shard_size: <int> | default = 0]
+
 # Enable to allow queries to be evaluated with data from a single zone, if other
 # zones are not available.
 [query_partial_data: <boolean> | default = false]

pkg/util/validation/limits.go

Lines changed: 1 addition & 1 deletion
@@ -184,7 +184,7 @@ type Limits struct {
 MaxQueryResponseSize int64 `yaml:"max_query_response_size" json:"max_query_response_size"`
 MaxCacheFreshness model.Duration `yaml:"max_cache_freshness" json:"max_cache_freshness"`
 MaxQueriersPerTenant float64 `yaml:"max_queriers_per_tenant" json:"max_queriers_per_tenant"`
-QueryVerticalShardSize int `yaml:"query_vertical_shard_size" json:"query_vertical_shard_size" doc:"hidden"`
+QueryVerticalShardSize int `yaml:"query_vertical_shard_size" json:"query_vertical_shard_size"`
 QueryPartialData bool `yaml:"query_partial_data" json:"query_partial_data" doc:"nocli|description=Enable to allow queries to be evaluated with data from a single zone, if other zones are not available.|default=false"`
 
 // Parquet Queryable enforced limits.
Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
---
date: 2025-08-12
title: "Efficient Query Parallelism in Cortex with Dynamic Query Splitting"
linkTitle: Dynamic Query Splitting
tags: [ "blog", "cortex", "query", "optimization" ]
categories: [ "blog" ]
projects: [ "cortex" ]
description: >
  This article explores the motivations behind adopting dynamic query splitting in Cortex, how the dynamic model works, and how to configure it for more efficient, scalable PromQL query execution.
author: Ahmed Hassan ([@afhassan](https://github.com/afhassan))
---

## Introduction

Cortex traditionally relied on **static query splitting** and **static vertical sharding** to optimize the execution of long-range PromQL queries. Static query splitting divides a query into fixed time intervals, while vertical sharding—when applicable—splits the query across subsets of time series. These techniques offered improved parallelism and reduced query latency but were limited by their one-size-fits-all approach. They did not account for differences in query range, lookback behavior, and cardinality—leading to inefficiencies like over-sharding, redundant data fetches, and storage pressure in large or complex queries.

To address those gaps, Cortex introduced **dynamic query splitting** and **dynamic vertical sharding**—two adaptive mechanisms that intelligently adjust how queries are broken down based on query semantics.

This article explores the motivations behind this evolution, how the dynamic model works, and how to configure it for more efficient, scalable PromQL query execution.

### Query Splitting

Query splitting breaks a single long-range query into smaller subqueries based on a configured time interval. For example, given this configuration, a 30-day range query will be split into 30 individual 1-day subqueries:

```
query_range:
  split_queries_by_interval: 24h
```

These subqueries are processed in parallel by different queriers, and the results are merged at the query-frontend before being returned to the client. This improves performance for long-range queries and helps prevent timeouts and resource exhaustion.
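
To make the splitting mechanics concrete, here is a minimal Go sketch of cutting a query range into interval-aligned sub-ranges. It is an illustration of the idea rather than Cortex's actual implementation, and the function name `splitByInterval` is made up for this example.

```go
package main

import (
	"fmt"
	"time"
)

// splitByInterval cuts [start, end) into sub-ranges aligned to multiples of
// the split interval, mirroring the idea behind split_queries_by_interval.
func splitByInterval(start, end time.Time, interval time.Duration) [][2]time.Time {
	var splits [][2]time.Time
	for cur := start; cur.Before(end); {
		// Advance to the next interval-aligned boundary (or the end of the range).
		next := cur.Truncate(interval).Add(interval)
		if next.After(end) {
			next = end
		}
		splits = append(splits, [2]time.Time{cur, next})
		cur = next
	}
	return splits
}

func main() {
	end := time.Now().UTC().Truncate(time.Hour)
	start := end.Add(-30 * 24 * time.Hour) // a 30-day range query
	for _, s := range splitByInterval(start, end, 24*time.Hour) {
		fmt.Println(s[0].Format(time.RFC3339), "->", s[1].Format(time.RFC3339))
	}
}
```

Each sub-range can then be executed as an independent subquery and the partial results merged, which is what the query-frontend does with the configuration above.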

### Vertical Sharding

Unlike query splitting, which divides a query over time intervals, vertical sharding divides a query across shards of the time series data itself. Each shard processes only a portion of the matched series, reducing memory usage and computational load per querier. This is especially useful for high-cardinality queries where the number of series can reach hundreds of thousands or more.

```
limits:
  query_vertical_shard_size: 4
```

For example, suppose the label selector `http_requests_total{job="api"}` in the following query matches 500,000 distinct time series, each corresponding to a different combination of labels like `instance`, `path`, `status`, and `method`.

```
sum(rate(http_requests_total{job="api"}[5m])) by (instance)
```

Without vertical sharding, a single querier must fetch and aggregate all 500,000 series. With vertical sharding enabled and configured to 4, the query is split into 4 shards, each processing ~125,000 series. The results are finally merged at the query-frontend before being returned to the client.
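
The sketch below illustrates how matched series could be partitioned across a fixed number of vertical shards by hashing their label sets. It is only a conceptual illustration under assumed names (`shardForSeries` is not a Cortex function); the real shard assignment happens inside Cortex's query engine and store layer.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// shardForSeries deterministically maps a series (its label set) to one of
// n vertical shards. Sorting the label names keeps the hash stable.
func shardForSeries(labels map[string]string, n uint32) uint32 {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)

	h := fnv.New32a()
	for _, name := range names {
		h.Write([]byte(name))
		h.Write([]byte{0xff}) // separator between names and values
		h.Write([]byte(labels[name]))
		h.Write([]byte{0xff})
	}
	return h.Sum32() % n
}

func main() {
	series := map[string]string{
		"__name__": "http_requests_total",
		"job":      "api",
		"instance": "10.0.0.1:8080",
		"path":     "/checkout",
	}
	// With query_vertical_shard_size: 4, each querier would only fetch and
	// aggregate the series that land in its shard.
	fmt.Println("series assigned to shard", shardForSeries(series, 4))
}
```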

![QuerySplittingAndVerticalSharding](/images/blog/2025/query-splitting-and-vertical-sharding.png)

## Introducing Dynamic Query Splitting

Cortex **dynamic query splitting** was introduced to address the limitations of static configurations. Instead of applying a fixed split interval uniformly across all queries, the dynamic logic computes an optimal split interval per query—as a multiple of the configured base interval—based on query semantics and configurable constraints. If **dynamic vertical sharding** is also enabled, then both split interval and vertical shard size will be dynamically adjusted for every query.

The goal is to maximize query parallelism through both horizontal splitting and vertical sharding, while staying within safe and configurable limits that prevent system overload or inefficiency. This is best explained by how dynamic splitting solves two problems:

### 1. **Queuing and Merge Bottlenecks from Over-Splitting**

While increasing parallelism through splitting and sharding is generally beneficial, it becomes counterproductive when the number of subqueries or shards far exceeds the number of available queriers.

The number of splits when using a fixed interval increases with the query's time range, which is under the user's control. For example:

* With a static split interval of 24 hours, a 7-day query results in 7 horizontal splits.
* If the user increases the query range to 100 days, the query is split into 100 horizontal splits.
* If vertical sharding is also enabled and configured to use 5 shards (`query_vertical_shard_size: 5`), each split is further divided—leading to a total of 500 individual shards.

When the number of shards grows too large:

* Backend workers become saturated, causing subqueries to queue and increasing total latency.
* The query-frontend merges hundreds of partial results, which introduces overhead that can outweigh the benefits of parallelism.
* Overall system throughput degrades, especially when multiple large queries are executed concurrently.

To solve this issue, a new configuration, `max_shards_per_query`, was introduced to cap the maximum parallelism per query. Given the same example above:

* With a static split interval of 24 hours and `max_shards_per_query` set to 75, a 7-day query still results in 7 splits.
* If the user increases the query range to 100 days, the dynamic splitting algorithm adjusts the split interval to 48 hours, producing 50 horizontal splits—keeping the total within the target of 75 shards.
* If vertical sharding is enabled and configured to use up to 5 shards, the dynamic logic selects the optimal combination of split interval and vertical shards to maximize parallelism without exceeding 75 shards.
* In this case, a 96-hour (4-day) split interval with 3 vertical shards yields exactly 75 total shards—the most efficient combination.
* Note: `enable_dynamic_vertical_sharding` must be set to true; otherwise, only the split interval will be adjusted.

In summary, with dynamic splitting enabled, you can define a target total number of shards, and Cortex will automatically adjust time splitting and vertical sharding to maximize parallelism without crossing that limit.
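
The combination search can be pictured with a small Go sketch. This is a simplified model of the idea (day-granularity arithmetic only, hypothetical function name), not the code Cortex ships; the real implementation also honors day alignment and the fetched-data limit described next.

```go
package main

import "fmt"

// chooseSharding picks the split interval (as a multiple of the base interval)
// and the vertical shard count that give the most total shards without
// exceeding the per-query shard budget.
func chooseSharding(rangeDays, baseIntervalDays, maxVerticalShards, maxShardsPerQuery int) (intervalDays, verticalShards, totalShards int) {
	for v := 1; v <= maxVerticalShards; v++ {
		for m := 1; m*baseIntervalDays <= rangeDays; m++ {
			interval := m * baseIntervalDays
			splits := (rangeDays + interval - 1) / interval // ceiling division
			if total := splits * v; total <= maxShardsPerQuery && total > totalShards {
				intervalDays, verticalShards, totalShards = interval, v, total
			}
		}
	}
	return
}

func main() {
	// 100-day query, 24h base interval, up to 5 vertical shards, budget of 75 shards.
	i, v, t := chooseSharding(100, 1, 5, 75)
	fmt.Printf("interval=%dd, vertical shards=%d, total shards=%d\n", i, v, t)
	// Prints: interval=4d, vertical shards=3, total shards=75
}
```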

![QuerySplittingAndVerticalSharding](/images/blog/2025/query-static-and-dynamic-splitting.png)

### 2. **Parallelism Cost with Query Lookback**

In PromQL, some functions like `rate()`, `increase()`, or `max_over_time()` use a **lookback window**, meaning each query must fetch samples from before the evaluation timestamp to execute.

Consider the following query that calculates the maximum container memory usage over a 90-day lookback window:

```
max_over_time(container_memory_usage_bytes{cluster="prod", namespace="payments"}[90d])
```

Suppose this query is evaluated over a 30-day range and static query splitting is configured to split it into 1-day intervals. This produces 30 subqueries, each corresponding to a single day. However, due to the `[90d]` range vector:

* Each subquery must fetch the full 90 days of historical data to evaluate correctly.
* The same data blocks are repeatedly fetched across all subqueries.
* The total duration of data fetched is `query range + (lookback window x total shards)`, which results in 30 + 90 x 30 = 2,730 days.

As a result, store-gateways must handle a large amount of mostly redundant reads, repeatedly fetching data blocks for each subquery. This puts additional pressure on the storage layer, and the cumulative effect slows down query execution and degrades overall system performance. In this case, splitting the query further doesn't reduce backend load—it amplifies it.

With dynamic splitting, you can define a target `max_fetched_data_duration_per_query`—the maximum cumulative duration of historical data that a single query is allowed to fetch. If the lookback window is long, the algorithm automatically increases the split interval or reduces the vertical shard size to lower the shard count and protect the storage layer.

For example, with `max_fetched_data_duration_per_query` set to 500d:

* A larger split interval of 144 hours (6 days) is used to split the query into 5 splits.
* This is the optimal split interval that results in the highest parallelism without crossing the limit of 500 days fetched.
* The total duration fetched from the storage layer becomes 30 + 90 x 5 = 480 days—lower than the target of 500 days.
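
Extending the earlier sketch with this second constraint shows how the fetched-duration target prunes combinations; again, this is only a simplified model with hypothetical names, not Cortex's implementation.

```go
package main

import "fmt"

// fetchedDays estimates the cumulative duration of data a split query fetches:
// every split re-reads the lookback window, so the total is roughly
// query range + lookback x number of splits (vertical shards multiply it further).
func fetchedDays(rangeDays, intervalDays, lookbackDays, verticalShards int) int {
	splits := (rangeDays + intervalDays - 1) / intervalDays
	return (intervalDays + lookbackDays) * splits * verticalShards
}

func main() {
	const rangeDays, lookbackDays, maxFetchedDays = 30, 90, 500

	// Walk the candidate split intervals (multiples of a 1-day base interval)
	// and keep the smallest interval whose fetched-data estimate fits the target.
	for interval := 1; interval <= rangeDays; interval++ {
		if fetched := fetchedDays(rangeDays, interval, lookbackDays, 1); fetched <= maxFetchedDays {
			splits := (rangeDays + interval - 1) / interval
			fmt.Printf("interval=%dd, splits=%d, fetched=%dd\n", interval, splits, fetched)
			break
		}
	}
	// Prints: interval=6d, splits=5, fetched=480d
}
```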

![QuerySplittingAndVerticalSharding](/images/blog/2025/query-static-and-dynamic-splitting-lookback.png)

## How to Configure Dynamic Query Splitting

Dynamic query splitting is configured under the `dynamic_query_splits` section in the `query_range` block of Cortex's configuration. Keep in mind that it works in conjunction with static `split_queries_by_interval` and `query_vertical_shard_size`, which are required to be configured as well.

Dynamic query splitting considers the following configurations:

* `max_shards_per_query:` Defines the maximum number of total shards (horizontal splits × vertical shards) that a single query can generate. If `enable_dynamic_vertical_sharding` is set, the dynamic logic will adjust both the split interval and vertical shard size to find the most effective combination that results in the highest degree of sharding without exceeding this limit.

* `max_fetched_data_duration_per_query:` Sets a target for the maximum total time duration of data that can be fetched across all subqueries. To keep the duration fetched below this target, a larger split interval and/or less vertical sharding is used. This is especially important for queries with long lookback windows, where excessive splitting can lead to redundant block reads, putting pressure on store-gateways and the storage layer.

* `enable_dynamic_vertical_sharding:` When enabled, vertical sharding becomes dynamic per query. Instead of using a fixed shard count, an optimal vertical shard size ranging from 1 (no sharding) to the tenant's configured `query_vertical_shard_size` will be used.

## Example Configuration

Let's explore how two different queries will be handled given the following configuration:

```
query_range:
  split_queries_by_interval: 24h
  dynamic_query_splits:
    max_shards_per_query: 100
    max_fetched_data_duration_per_query: 8760h # 365 days
    enable_dynamic_vertical_sharding: true

limits:
  query_vertical_shard_size: 4
```

### Query #1

```
sum by (pod) (
  rate(container_cpu_usage_seconds_total{namespace="prod"}[1m])
)
```

* **Query time range:** 60 days
* **Lookback window:** 1 minute

Since the query has a short lookback window of 1 minute, the total duration of data fetched by each shard is not going to be the limiting factor. The limiting factor to consider here is keeping the total number of shards within the limit of 100. Both dynamic splitting and dynamic vertical sharding are enabled, so Cortex finds the optimal combination that results in the highest number of shards within that limit. In this case:

* **Number of splits by time:** 30 (2-day interval)
* **Vertical shard size:** 3
* **Total shards:** 90
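
Running the same kind of exhaustive check as in the earlier sketches confirms these numbers; the snippet below (illustrative only) enumerates the allowed combinations for a 60-day range, a 24h base interval, and up to 4 vertical shards. Since the 1-minute lookback makes the fetched-duration limit irrelevant here, only the shard budget is checked.

```go
package main

import "fmt"

func main() {
	const rangeDays, maxVertical, maxShards = 60, 4, 100

	bestTotal, bestInterval, bestVertical := 0, 0, 0
	for v := 1; v <= maxVertical; v++ {
		for interval := 1; interval <= rangeDays; interval++ {
			splits := (rangeDays + interval - 1) / interval // ceiling division
			if total := splits * v; total <= maxShards && total > bestTotal {
				bestTotal, bestInterval, bestVertical = total, interval, v
			}
		}
	}
	fmt.Printf("interval=%dd, vertical shards=%d, total shards=%d\n",
		bestInterval, bestVertical, bestTotal)
	// Prints: interval=2d, vertical shards=3, total shards=90
}
```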

### Query #2

```
sum by (pod) (
  max_over_time(container_memory_usage_bytes{namespace="prod"}[30d])
)
```

* **Query time range:** 14 days
* **Lookback window:** 30 days

This query could be split into 14 splits and sharded vertically by 4, resulting in a total of 56 shards, which is below the limit of 100 total shards. However, since each shard has to fetch all 30 days of the lookback window in addition to the interval itself, this would result in 56 shards each fetching 31 days of data, for a total of 1,736 days. This is not optimal and would cause a heavy load on the backend storage layer.

Fortunately, we configured `max_fetched_data_duration_per_query` to be 365 days. This limits query sharding to the highest parallelism achievable without crossing the fetched-duration limit. In this case:

* Number of splits by time: 5 (3-day interval)
* Vertical shard size: 2
* Total shards: 10

The total duration of data fetched for query evaluation is calculated using `(interval + lookback window) x total shards`. In this case `(3 + 30) x 10 = 330` days are fetched, which is below our limit of 365 days.
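
The same enumeration, now with the fetched-duration limit included, reproduces the combination chosen for this query (again, an illustrative model rather than Cortex's actual code):

```go
package main

import "fmt"

func main() {
	const rangeDays, lookbackDays, maxVertical, maxShards, maxFetchedDays = 14, 30, 4, 100, 365

	bestTotal, bestInterval, bestVertical := 0, 0, 0
	for v := 1; v <= maxVertical; v++ {
		for interval := 1; interval <= rangeDays; interval++ {
			splits := (rangeDays + interval - 1) / interval
			total := splits * v
			fetched := (interval + lookbackDays) * total
			if total <= maxShards && fetched <= maxFetchedDays && total > bestTotal {
				bestTotal, bestInterval, bestVertical = total, interval, v
			}
		}
	}
	fmt.Printf("interval=%dd, vertical shards=%d, total shards=%d, fetched=%dd\n",
		bestInterval, bestVertical, bestTotal, (bestInterval+lookbackDays)*bestTotal)
	// Prints: interval=3d, vertical shards=2, total shards=10, fetched=330d
}
```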

## Conclusion

Dynamic query splitting and vertical sharding make Cortex smarter about how it executes PromQL queries. By adapting to each query's semantics and backend constraints, Cortex avoids the limitations of static configurations—enabling efficient parallelism across diverse query patterns and consistently delivering high performance at scale.
Binary files not shown: three blog images (257 KB, 1.65 MB, 571 KB).
