Commit 1cfc30f

Evenly spread queriers across available nodes (grafana#6415)
* Evenly spread queriers across available nodes
* Fix lint issue
* Add entry to the CHANGELOG and the Upgrade Guide
* Make topology spread configurable
* Apply CR feedback
1 parent b4e6c59 commit 1cfc30f

4 files changed: +26 -2 lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,6 @@
 ## Main

+* [6415](https://github.com/grafana/loki/pull/6415) **salvacorts** Evenly spread queriers across Kubernetes nodes.
 * [6410](https://github.com/grafana/loki/pull/6410) **MichelHollands**: Add support for per tenant delete API access enabling.
 * [6372](https://github.com/grafana/loki/pull/6372) **splitice**: Add support for numbers in JSON fields.
 * [6105](https://github.com/grafana/loki/pull/6105) **rutgerke** Export metrics for the Promtail journal target.

docs/sources/upgrading/_index.md

Lines changed: 5 additions & 0 deletions
@@ -33,6 +33,11 @@ The output is incredibly verbose as it shows the entire internal config struct u

 ### Loki

+#### Evenly spread queriers across Kubernetes nodes
+
+We now evenly spread queriers across the available Kubernetes nodes, while still allowing more than one querier to be scheduled on the same node.
+If you want to run at most a single querier per node, set `$._config.querier.use_topology_spread` to false.
+
 #### Implementation of unwrapped `rate` aggregation changed

 The implementation of the `rate()` aggregation function changed back to the previous implementation prior to [#5013](https://github.com/grafana/loki/pulls/5013).

production/ksonnet/loki/config.libsonnet

Lines changed: 8 additions & 0 deletions
@@ -47,6 +47,14 @@
       // A higher value will lead to a querier trying to process more requests than there are available
       // cores and will result in scheduling delays.
       concurrency: 4,
+
+      // If use_topology_spread is true, queriers can run on nodes already running queriers but will be
+      // spread through the available nodes using a TopologySpreadConstraints with a max skew
+      // of topology_spread_max_skew.
+      // See: https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/
+      // If use_topology_spread is false, queriers will not be scheduled on nodes already running queriers.
+      use_topology_spread: true,
+      topology_spread_max_skew: 1,
     },

     queryFrontend: {
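
The two new settings can be overridden like any other `_config` field. The following is a minimal sketch, not the library's own code: `base` is a hypothetical stand-in that copies only the querier defaults shown in this diff, so the snippet evaluates without the Loki ksonnet library being vendored. In a real environment the override would be mixed into the imported library object instead.

```jsonnet
// Hypothetical stand-in for the library defaults introduced by this commit.
local base = {
  _config+:: {
    querier+: {
      concurrency: 4,
      use_topology_spread: true,
      topology_spread_max_skew: 1,
    },
  },
};

// Override: go back to at most one querier per node (anti-affinity).
local overridden = base {
  _config+:: {
    querier+: {
      use_topology_spread: false,
    },
  },
};

// Expose the merged querier settings so the result can be inspected.
{ querier: overridden._config.querier }
```

Because the override uses `querier+:`, only `use_topology_spread` changes and the other querier defaults are preserved. To keep topology spread enabled but tolerate a less even distribution, the same pattern could set `topology_spread_max_skew` to a larger value instead.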

production/ksonnet/loki/querier.libsonnet

Lines changed: 12 additions & 2 deletions
@@ -26,6 +26,7 @@ local k = import 'ksonnet-util/kausal.libsonnet';
   ]) else {},

   local deployment = k.apps.v1.deployment,
+  local topologySpreadConstraints = k.core.v1.topologySpreadConstraint,

   querier_deployment: if !$._config.stateful_queriers then
     deployment.new('querier', 3, [$.querier_container]) +
@@ -35,9 +36,18 @@ local k = import 'ksonnet-util/kausal.libsonnet';
       $._config.overrides_configmap_mount_name,
       $._config.overrides_configmap_mount_path,
     ) +
-    k.util.antiAffinity +
     deployment.mixin.spec.strategy.rollingUpdate.withMaxSurge(5) +
-    deployment.mixin.spec.strategy.rollingUpdate.withMaxUnavailable(1)
+    deployment.mixin.spec.strategy.rollingUpdate.withMaxUnavailable(1) +
+    if $._config.querier.use_topology_spread then
+      deployment.spec.template.spec.withTopologySpreadConstraints(
+        // Evenly spread queriers among available nodes.
+        topologySpreadConstraints.labelSelector.withMatchLabels({ name: 'querier' }) +
+        topologySpreadConstraints.withTopologyKey('kubernetes.io/hostname') +
+        topologySpreadConstraints.withWhenUnsatisfiable('ScheduleAnyway') +
+        topologySpreadConstraints.withMaxSkew($._config.querier.topology_spread_max_skew),
+      )
+    else
+      k.util.antiAffinity
   else {},

   // PVC for queriers when running as statefulsets
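
To make the effect of the builder calls above easier to see, here is a rough plain-object sketch of the pod spec fragment they are expected to produce. It is an illustration under assumptions, not output captured from the library: the field names are the standard Kubernetes `topologySpreadConstraints` fields, the local `maxSkew` stands in for `$._config.querier.topology_spread_max_skew`, and wrapping the single constraint in a list is assumed to match what `withTopologySpreadConstraints` does.

```jsonnet
// Assumption: stands in for $._config.querier.topology_spread_max_skew (default 1).
local maxSkew = 1;

{
  spec+: {
    template+: {
      spec+: {
        topologySpreadConstraints: [
          {
            // Count only querier pods when computing the skew.
            labelSelector: { matchLabels: { name: 'querier' } },
            // Each node (hostname) is its own topology domain.
            topologyKey: 'kubernetes.io/hostname',
            // Soft constraint: still schedule the pod if the skew cannot be met.
            whenUnsatisfiable: 'ScheduleAnyway',
            maxSkew: maxSkew,
          },
        ],
      },
    },
  },
}
```

With `ScheduleAnyway` the constraint is a preference rather than a hard rule, which is why several queriers can still land on the same node. The previous `k.util.antiAffinity` behaviour, kept in the `else` branch, is the stricter one-querier-per-node option.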
