Merged
Changes from 8 commits
11 changes: 7 additions & 4 deletions docs/reference/cluster/allocation-explain.asciidoc
@@ -159,6 +159,7 @@ node.
<5> The decider which led to the `no` decision for the node.
<6> An explanation as to why the decider returned a `no` decision, with a helpful hint pointing to the setting that led to the decision. In this example, a newly created index has <<indices-get-settings,an index setting>> that requires that it only be allocated to a node named `nonexistent_node`, which does not exist, so the index is unable to allocate.

[[maximum-number-of-retries-exceeded]]
====== Maximum number of retries exceeded

The following response contains an allocation explanation for an unassigned
@@ -195,17 +196,19 @@ primary shard that has reached the maximum number of allocation retry attempts.
{
"decider": "max_retry",
"decision" : "NO",
"explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2024-07-30T21:04:12.166Z], failed_attempts[5], failed_nodes[[mEKjwwzLT1yJVb8UxT6anw]], delayed=false, details[failed shard on node [mEKjwwzLT1yJVb8UxT6anw]: failed recovery, failure RecoveryFailedException], allocation_status[deciders_no]]]"
"explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [POST /_cluster/reroute?retry_failed&metric=none] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2024-07-30T21:04:12.166Z], failed_attempts[5], failed_nodes[[mEKjwwzLT1yJVb8UxT6anw]], delayed=false, details[failed shard on node [mEKjwwzLT1yJVb8UxT6anw]: failed recovery, failure RecoveryFailedException], allocation_status[deciders_no]]]"
}
]
}
]
}
----
// NOTCONSOLE
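A response like the one above can also be scanned programmatically for the deciders that blocked allocation. This is an illustrative sketch, not part of the Elasticsearch client libraries; the field names (`node_allocation_decisions`, `deciders`, `decider`, `decision`, `explanation`) follow the allocation explain output shown above, and the trimmed sample response is hypothetical:

```python
def blocking_deciders(explain_response: dict) -> list[tuple[str, str]]:
    """Collect (decider, explanation) pairs for every NO decision
    in a cluster allocation explain response."""
    results = []
    for node in explain_response.get("node_allocation_decisions", []):
        for d in node.get("deciders", []):
            if d.get("decision") == "NO":
                results.append((d["decider"], d["explanation"]))
    return results

# Trimmed sample mirroring the response above (explanation shortened).
response = {
    "node_allocation_decisions": [
        {"deciders": [{
            "decider": "max_retry",
            "decision": "NO",
            "explanation": "shard has exceeded the maximum number of retries [5]"
        }]}
    ]
}
print(blocking_deciders(response))
```

Listing every `NO` decision at once is useful when several deciders block the same shard, since fixing only the first reported one may not be enough.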

If the decider message indicates a transient allocation issue, use
the <<cluster-reroute,cluster reroute>> API to retry allocation.
Elasticsearch queues shard allocation retries in batches. If there are long-running shard
recoveries or a high quantity of shard recoveries occurring within the cluster, this
process may time out for some shards, resulting in `max_retry`. This surfaces infrequently
but is expected to prevent infinite retries which may impact cluster performance. When
encountered, run the <<cluster-reroute,cluster reroute>> API to retry allocation.

[Review comment — Contributor]
This isn't true, there's no timeout in play here. You need to get 5 genuine failures in a row before you see this.

[Reply — Contributor Author]
I was thinking of changing this to

    When Elasticsearch is unable to allocate a shard, it will attempt to retry allocation up to the
    maximum number of retries allowed. After this, Elasticsearch will stop attempting to allocate
    the shard in order to prevent infinite retries which may impact cluster performance. Run the
    <<cluster-reroute,cluster reroute>> API to retry allocation, which will allocate the shard if the
    issue preventing allocation has been resolved.

Are there any tweaks you'd like to make? Does that seem reasonable?
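The retry itself is a `POST` to the cluster reroute endpoint. As a minimal sketch, the helper below builds the URL used in the updated message above (`?retry_failed&metric=none`) or the older form without `metric=none`; the helper function itself is hypothetical, and the request would be issued with any HTTP client:

```python
def reroute_retry_url(base: str, metric_none: bool = True) -> str:
    """Build the cluster reroute URL used to retry failed allocations.

    metric_none=True appends metric=none (as in the updated docs text)
    so the response omits the full cluster state.
    """
    params = ["retry_failed"]
    if metric_none:
        params.append("metric=none")
    return f"{base}/_cluster/reroute?" + "&".join(params)

print(reroute_retry_url("http://localhost:9200"))
# → http://localhost:9200/_cluster/reroute?retry_failed&metric=none
```

Note that `retry_failed` is a flag parameter: its presence alone (with or without `=true`) enables the retry behavior.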

[[no-valid-shard-copy]]
====== No valid shard copy
@@ -14,6 +14,7 @@
import org.elasticsearch.cluster.routing.ShardRouting;
import org.elasticsearch.cluster.routing.UnassignedInfo;
import org.elasticsearch.cluster.routing.allocation.RoutingAllocation;
import org.elasticsearch.common.ReferenceDocs;
import org.elasticsearch.common.settings.Setting;

/**
@@ -72,9 +73,11 @@ private static Decision debugDecision(Decision decision, UnassignedInfo info, in
return Decision.single(
Decision.Type.NO,
NAME,
"shard has exceeded the maximum number of retries [%d] on failed allocation attempts - manually call [%s] to retry, [%s]",
"shard has exceeded the maximum number of retries [%d] on failed allocation attempts - "
+ "manually call [%s] to retry, and for more information, see [%s] [%s]",
maxRetries,
RETRY_FAILED_API,
ReferenceDocs.ALLOCATION_EXPLAIN_MAX_RETRY,
info.toString()
);
} else {
@@ -82,6 +82,7 @@ public enum ReferenceDocs {
FORMING_SINGLE_NODE_CLUSTERS,
CIRCUIT_BREAKER_ERRORS,
ALLOCATION_EXPLAIN_NO_COPIES,
ALLOCATION_EXPLAIN_MAX_RETRY,
// this comment keeps the ';' on the next line so every entry above has a trailing ',' which makes the diff for adding new links cleaner
;

@@ -44,3 +44,4 @@ X_OPAQUE_ID api-conventions.
FORMING_SINGLE_NODE_CLUSTERS modules-discovery-bootstrap-cluster.html#modules-discovery-bootstrap-cluster-joining
CIRCUIT_BREAKER_ERRORS circuit-breaker-errors.html
ALLOCATION_EXPLAIN_NO_COPIES cluster-allocation-explain.html#no-valid-shard-copy
ALLOCATION_EXPLAIN_MAX_RETRY cluster-allocation-explain.html#maximum-number-of-retries-exceeded
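The links file above pairs each `ReferenceDocs` enum constant with a relative documentation URL, one entry per line. A minimal sketch of reading such a two-column file into a lookup table (this parser is illustrative only, not the actual Elasticsearch loader):

```python
def parse_links(text: str) -> dict[str, str]:
    """Parse 'NAME relative-url' lines into a lookup table,
    skipping blank lines and tolerating extra whitespace."""
    links = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            name, url = parts
            links[name] = url
    return links

# Sample lines taken from the links file shown above.
sample = """\
CIRCUIT_BREAKER_ERRORS circuit-breaker-errors.html
ALLOCATION_EXPLAIN_MAX_RETRY cluster-allocation-explain.html#maximum-number-of-retries-exceeded
"""
print(parse_links(sample)["ALLOCATION_EXPLAIN_MAX_RETRY"])
```

Keeping the enum name and the URL fragment in one flat file is what lets error messages like the `max_retry` explanation embed a stable docs link without hard-coding URLs in Java source.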
