
Conversation

ywangd
Member

@ywangd ywangd commented Oct 6, 2025

In stateless, index and search shards are distinct and must be allocated to nodes of corresponding types. Therefore the shard limit validation should be performed for them separately to avoid one shard type taking more quota than expected, similar to the separation between regular and frozen shards.

Resolves: ES-12884

Comment on lines +255 to +259
public enum ResultGroup {
NORMAL(NORMAL_GROUP),
FROZEN(FROZEN_GROUP),
INDEX("index"),
SEARCH("search");
Member Author

The PR is bigger and more involved than I initially expected because the current shard limit validation has a hard-coded two-member grouping for "normal" and "frozen" indices. The group is also decided by an index-level setting. Neither of these makes sense in Stateless.

The PR introduces ResultGroup so that the actual groups can be picked based on the setup. It also helps detach the grouping from the index-level setting. Overall it promotes the Group concept (previously a String) to a proper type, which in turn helps reuse the existing logic (much of which is based on the group in use).

Please let me know if it makes sense. Happy to provide more clarification.
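For illustration, the idea could be sketched roughly like this (the names follow the discussion, but the bodies are hypothetical, not the actual Elasticsearch code):

```java
import java.util.List;

// Hypothetical sketch of picking shard-limit groups from the cluster setup
// rather than hard-coding a normal/frozen pair. Illustrative only.
public class ShardLimitGroupsSketch {
    public enum ResultGroup { NORMAL, FROZEN, INDEX, SEARCH }

    // Stateless clusters validate index and search shards separately;
    // stateful clusters keep the existing normal/frozen split.
    static List<ResultGroup> applicableResultGroups(boolean isStateless) {
        return isStateless
            ? List.of(ResultGroup.INDEX, ResultGroup.SEARCH)
            : List.of(ResultGroup.NORMAL, ResultGroup.FROZEN);
    }

    public static void main(String[] args) {
        System.out.println(applicableResultGroups(true));   // [INDEX, SEARCH]
        System.out.println(applicableResultGroups(false));  // [NORMAL, FROZEN]
    }
}
```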

Contributor

Is it worth using SPI to provide the ResultGroups so serverless can override it and we avoid putting knowledge of those things in the core product?

It may be pedantic and not worth the effort, but it looks generalised enough that it could be done. And we do do it for some other things.

I guess ResultGroup would have to be an interface then
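A minimal sketch of what that SPI shape might look like, assuming ResultGroup becomes an interface and serverless registers its own provider (all names here are hypothetical, not actual Elasticsearch APIs):

```java
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical SPI sketch: core ships a provider lookup and falls back to
// the stateful groups when no provider is registered; serverless would
// register a provider supplying the index/search groups.
public class LimitGroupSpiSketch {
    public interface LimitGroup {
        String groupName();
    }

    public interface LimitGroupProvider {
        List<LimitGroup> limitGroups();
    }

    static List<LimitGroup> loadLimitGroups(List<LimitGroup> statefulDefaults) {
        // ServiceLoader picks up providers registered via META-INF/services
        // (or the module system); with none registered, the defaults win.
        for (LimitGroupProvider provider : ServiceLoader.load(LimitGroupProvider.class)) {
            return provider.limitGroups();
        }
        return statefulDefaults;
    }
}
```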

@ywangd ywangd added >non-issue :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) labels Oct 7, 2025
@ywangd ywangd marked this pull request as ready for review October 7, 2025 01:37
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Oct 7, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Contributor

@nicktindall nicktindall left a comment

LGTM, just some nits and questions about whether it's worth putting the serverless stuff in the serverless codebase

final var resultGroups = applicableResultGroups(isStateless);
final Map<ResultGroup, Integer> shardsToCreatePerGroup = new HashMap<>();

// TODO: we can short circuit when indindicesToOpenices is empty
Contributor

Nit: typo indindicesToOpenices; also, did you mean to act on this TODO before merging?

* - otherwise -> returns the Result of checking the limits for _frozen_ nodes
* - Check limits for _normal_ nodes
* - If there's no room -> return the Result for _normal_ nodes (fail-fast)
* - otherwise -> returns the Result of checking the limits for _frozen_ nodes
Contributor

Nit: this javadoc probably needs to be generalised


case FROZEN -> nodeCount(discoveryNodes, ShardLimitValidator::hasFrozen);
case INDEX -> nodeCount(discoveryNodes, node -> node.hasRole(DiscoveryNodeRole.INDEX_ROLE.roleName()));
case SEARCH -> nodeCount(discoveryNodes, node -> node.hasRole(DiscoveryNodeRole.SEARCH_ROLE.roleName()));
};
Contributor

Nit: could make these abstract in the enum class and put the implementations on the individual declarations? that would avoid the need for the switch. Same as below.
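A sketch of that suggestion, assuming node roles are modelled as plain strings for illustration (the real code would test DiscoveryNode roles):

```java
// Hypothetical sketch: declare the predicate as an abstract enum method and
// implement it per constant, removing the need for the switch expression.
public class AbstractEnumSketch {
    public enum ResultGroup {
        NORMAL {
            @Override
            boolean matchesRole(String role) {
                return "data".equals(role); // stand-in for the non-frozen data tiers
            }
        },
        FROZEN {
            @Override
            boolean matchesRole(String role) {
                return "data_frozen".equals(role);
            }
        },
        INDEX {
            @Override
            boolean matchesRole(String role) {
                return "index".equals(role);
            }
        },
        SEARCH {
            @Override
            boolean matchesRole(String role) {
                return "search".equals(role);
            }
        };

        abstract boolean matchesRole(String role);
    }
}
```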

* @return The total number of new shards to be created for this group.
*/
public int newShardsTotal(Settings indexSettings) {
final boolean frozen = FROZEN_GROUP.equals(INDEX_SETTING_SHARD_LIMIT_GROUP.get(indexSettings));
Contributor

Nit: this is a bit tricky to read, perhaps inFrozenLimitGroup instead of frozen or something?
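For illustration, the rename might read like this (the setting lookup is a simplified stand-in for INDEX_SETTING_SHARD_LIMIT_GROUP, and the setting key here is hypothetical):

```java
import java.util.Map;

// Sketch of the suggested rename: inFrozenLimitGroup says which limit group
// the index falls into, rather than describing the index itself as frozen.
public class RenameSketch {
    static final String FROZEN_GROUP = "frozen";

    static boolean inFrozenLimitGroup(Map<String, String> indexSettings) {
        return FROZEN_GROUP.equals(
            indexSettings.getOrDefault("index.shard_limit.group", "normal"));
    }
}
```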

+ ReferenceDocs.MAX_SHARDS_PER_NODE;
}

public enum ResultGroup {
Contributor

Nit: perhaps LimitGroup? I'm not entirely clear why "result" is in the name; perhaps I'm missing something.

Labels
:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >non-issue Team:Distributed Coordination Meta label for Distributed Coordination team v9.3.0