Merged
5 changes: 5 additions & 0 deletions docs/changelog/137306.yaml
@@ -0,0 +1,5 @@
pr: 137306
summary: Shard started reroute high priority
area: Allocation
type: enhancement
issues: []
@@ -829,7 +829,7 @@ private static boolean assertStartedIndicesHaveCompleteTimestampRanges(ClusterSt
public void clusterStatePublished(ClusterState newClusterState) {
rerouteService.reroute(
"reroute after starting shards",
- Priority.NORMAL,
+ Priority.HIGH,
Member

Do we want to check whether there are unassigned shards before promoting it to High priority? I'd be OK with making it Urgent if there are unassigned shards. But we can take one step at a time.

Contributor Author

I decided not to do so, for a couple of reasons:

  1. Simplicity.
  2. Seems safe enough - shard started is not that frequent and is batched - and so are the reroutes.
  3. Relocations off a shutting down node could also run into this and thus delay vacating a node.
  4. Every shard initialization spends some time on the data node, during which the cluster can attend to other things before shard started comes back (of course, assuming things otherwise work well).

But happy to change this if you find it important.

I prefer to only go to HIGH to avoid bumping the priority too much until we have evidence we need it.

Member

I am suggesting that mostly to see whether we can bump it higher, e.g. to Urgent, so that it does not get blocked by put-mapping requests. That is something we have observed a few times in production clusters. But if we are sticking with High, the conditional priority is probably not entirely necessary.

Member

For the record, I still think conditional urgent is useful. But we can iterate on this.
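
For readers following the thread, the conditional-priority idea raised above could look roughly like the sketch below. It is only an illustration and not part of this PR: the `Priority` enum is a local stand-in, and `reroutePriority` and its `unassignedShardCount` parameter are hypothetical names.

```java
public class ConditionalPrioritySketch {
    // Local stand-in for the priority levels named in the thread (not the real class).
    enum Priority { URGENT, HIGH, NORMAL }

    // Hypothetical helper: escalate to URGENT only while shards are still
    // unassigned, otherwise stay at HIGH as the PR does unconditionally.
    static Priority reroutePriority(int unassignedShardCount) {
        return unassignedShardCount > 0 ? Priority.URGENT : Priority.HIGH;
    }

    public static void main(String[] args) {
        System.out.println(reroutePriority(0)); // prints HIGH
        System.out.println(reroutePriority(3)); // prints URGENT
    }
}
```

The PR as merged keeps the simpler unconditional HIGH; this only shows what the follow-up iteration might decide on.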

Contributor

HIGH is sufficient to avoid getting blocked by a stream of other HIGH priority requests such as put-mapping ones, because all the HIGH tasks run in submission order.
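
The ordering argument above can be illustrated with a small standalone queue that sorts by priority first and by submission order second. This is a simplified model under stated assumptions, not the actual master-service implementation: `PriorityOrderSketch`, its `Priority` enum, `Task`, and `demo` are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class PriorityOrderSketch {
    // Local stand-in for the priority levels; declared most urgent first.
    enum Priority { URGENT, HIGH, NORMAL }

    // A queued task, ordered by priority and then by submission sequence number.
    static final class Task implements Comparable<Task> {
        final Priority priority;
        final long seq;
        final String name;

        Task(Priority priority, long seq, String name) {
            this.priority = priority;
            this.seq = seq;
            this.name = name;
        }

        @Override
        public int compareTo(Task other) {
            int byPriority = priority.compareTo(other.priority);
            // Equal priority: the earlier submission runs first.
            return byPriority != 0 ? byPriority : Long.compare(seq, other.seq);
        }
    }

    private final PriorityBlockingQueue<Task> queue = new PriorityBlockingQueue<>();
    private final AtomicLong nextSeq = new AtomicLong();

    void submit(Priority priority, String name) {
        queue.add(new Task(priority, nextSeq.getAndIncrement(), name));
    }

    // Removes all tasks and returns their names in execution order.
    List<String> drain() {
        List<String> order = new ArrayList<>();
        Task task;
        while ((task = queue.poll()) != null) {
            order.add(task.name);
        }
        return order;
    }

    // Submits a reroute between several HIGH put-mapping tasks and reports the run order.
    static List<String> demo(Priority reroutePriority) {
        PriorityOrderSketch q = new PriorityOrderSketch();
        q.submit(Priority.HIGH, "put-mapping-1");
        q.submit(reroutePriority, "reroute");
        q.submit(Priority.HIGH, "put-mapping-2");
        q.submit(Priority.HIGH, "put-mapping-3");
        return q.drain();
    }

    public static void main(String[] args) {
        System.out.println(demo(Priority.NORMAL)); // [put-mapping-1, put-mapping-2, put-mapping-3, reroute]
        System.out.println(demo(Priority.HIGH));   // [put-mapping-1, reroute, put-mapping-2, put-mapping-3]
    }
}
```

In this model a NORMAL reroute waits behind every HIGH task, including ones submitted after it, while a HIGH reroute only waits for HIGH tasks that were already queued ahead of it.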

Member

Err, I misremembered put-mapping as being URGENT. Thanks for explaining.

ActionListener.wrap(
r -> logger.trace("reroute after starting shards succeeded"),
e -> logger.debug("reroute after starting shards failed", e)