You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Limit number of allocation explanations in shards_availability health indicator
We currently compute the shard allocation explanation for every
unassigned shard (primaries and replicas) in the health report API when
`verbose` is `true`, which includes the periodic health logs. Computing
the shard allocation explanation of a shard is quite expensive in large
clusters. Therefore, when there are lots of unassigned shards,
`ShardsAvailabilityHealthIndicatorService` can take a long time to
complete - we've seen cases of 2 minutes with 40k unassigned shards.
To avoid the runtime of `ShardsAvailabilityHealthIndicatorService`
scaling linearly with the number of unassigned shards (times the size of
the cluster), we limit the number of allocation explanations we compute
to `maxAffectedResourcesCount`, which comes from the `size` parameter of
the `_health_report` API and currently defaults to `1000` - a follow-up
PR will address the high default size. This significantly reduces the
runtime of this health indicator and avoids the periodic health logs
from overlapping.
A downside of this change is that the returned list of diagnoses may be
incomplete. For example, if the `size` parameter is set to `10`, and the
first 10 shards are unassigned due to reason `X` and the remaining
unassigned shards due to reason `Y`, only reason `X` will be returned in
the health API. We accept this downside as we expect that there are
generally not many different diagnoses relevant - if more than `size`
shards are unassigned, they're likely all unassigned due to the same
reason. Users can always increase `size` and/or manually call the
allocation explain API to get more detailed information.
Copy file name to clipboardExpand all lines: server/src/main/java/org/elasticsearch/cluster/routing/allocation/shards/ShardsAvailabilityHealthIndicatorService.java
+46-18Lines changed: 46 additions & 18 deletions
Original file line number
Diff line number
Diff line change
@@ -166,17 +166,18 @@ public String name() {
166
166
* primary and replica availability, providing the color, diagnosis, and
167
167
* messages about the available or unavailable shards in the cluster.
168
168
* @param metadata Metadata for the cluster
169
+
* @param maxAffectedResourcesCount Max number of affect resources to return
169
170
* @return A new ShardAllocationStatus that has not yet been filled.
Copy file name to clipboardExpand all lines: server/src/test/java/org/elasticsearch/cluster/routing/allocation/shards/ShardsAvailabilityHealthIndicatorServiceTests.java
+32-36Lines changed: 32 additions & 36 deletions
Original file line number
Diff line number
Diff line change
@@ -361,7 +361,7 @@ public void testAllReplicasUnassigned() {
0 commit comments