
Conversation

ywangd (Member) commented Oct 6, 2025

In stateless, index and search shards are distinct and must be allocated to nodes of the corresponding types. Shard limit validation should therefore be performed separately for each shard type, so that one type cannot take more quota than expected, similar to the existing separation between regular and frozen shards.

Resolves: ES-12884
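For context, a minimal sketch of the per-group check this change moves toward. Every name below (Group, checkShardLimits, maxShardsPerNode, newShards) is an illustrative stand-in, not the actual ShardLimitValidator API:

```java
import java.util.List;
import java.util.Optional;
import java.util.function.ToIntFunction;

// Illustrative only: each group is validated against its own quota, so one
// shard type (e.g. search) cannot consume another type's (e.g. index) budget.
final class PerGroupLimitSketch {
    record Group(String name, int nodeCount, int currentShards) {}

    static Optional<String> checkShardLimits(List<Group> groups, int maxShardsPerNode, ToIntFunction<Group> newShards) {
        for (Group group : groups) {
            int limit = maxShardsPerNode * group.nodeCount();
            if (group.currentShards() + newShards.applyAsInt(group) > limit) {
                // Fail for the group that is out of room, independently of the others.
                return Optional.of("this action would add [" + newShards.applyAsInt(group)
                    + "] shards, but the [" + group.name() + "] group can hold at most [" + limit + "]");
            }
        }
        return Optional.empty(); // all groups have room
    }
}
```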

Comment on lines 255 to 259
public enum ResultGroup {
NORMAL(NORMAL_GROUP),
FROZEN(FROZEN_GROUP),
INDEX("index"),
SEARCH("search");
ywangd (Member Author):
The PR is bigger and more involved than I initially expected because the current shard limit validation has a hard-coded two-member group for "normal" and "frozen" indices, and the group is decided by an index-level setting. Neither of these makes sense in Stateless.

The PR introduces ResultGroup so that the actual groups can be picked based on the setup. It also helps detach the grouping from the index-level setting. Overall it promotes the Group concept (previously a String), which in turn helps reuse the existing logic (much of which is based on the group in use).

Please let me know if this makes sense. Happy to provide more clarification.
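Roughly, the setup-based selection described here could look like the following sketch. The PR does add an applicableResultGroups(isStateless) helper (it appears in a later snippet), but this body is an illustrative guess, not the actual implementation:

```java
import java.util.List;

// Sketch of the setup-driven grouping described above.
final class GroupSelectionSketch {
    enum ResultGroup { NORMAL, FROZEN, INDEX, SEARCH }

    static List<ResultGroup> applicableResultGroups(boolean isStateless) {
        return isStateless
            ? List.of(ResultGroup.INDEX, ResultGroup.SEARCH)   // stateless: index vs search shards
            : List.of(ResultGroup.NORMAL, ResultGroup.FROZEN); // stateful: normal vs frozen indices
    }
}
```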

Contributor:

Is it worth using SPI to provide the ResultGroups, so serverless can override it and we avoid putting knowledge of those things in the core product?

It may be pedantic and not worth the effort, but it looks generalised enough that it could be done, and we do do it for some other things.

I guess ResultGroup would have to be an interface then.
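For illustration, the SPI shape being suggested might look something like this. This is entirely hypothetical; LimitGroupsProvider and its loading mechanism are not part of the PR:

```java
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical sketch of the SPI idea: core ships a stateful default and the
// serverless plugin registers its own provider. None of these types exist in the PR.
interface LimitGroupsProvider {
    List<String> limitGroups();

    static LimitGroupsProvider load() {
        return ServiceLoader.load(LimitGroupsProvider.class)
            .findFirst()                                  // serverless override, if registered
            .orElse(() -> List.of("normal", "frozen"));   // stateful default
    }
}
```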

ywangd (Member Author):

This is a good question and I did briefly consider it initially. I didn't go with it because (1) it's a relatively larger effort, (2) there is a bit of untangling needed for ShardsCapacityHealthIndicatorService, and (3) it's a bit easier to test everything together with the existing testing code.

Another reason is that the focus of this PR is a bug fix on the stateless side, and SPI feels more like an optimization. So I think pursuing SPI separately might be a better option. I can raise a separate ticket for it. Let me know if this makes sense or if you'd prefer we pursue it as part of this PR.

Contributor:

Yeah, I wouldn't block this on account of the SPI; I'll leave it up to you whether you think it's worth a follow-up.

ywangd (Member Author):

I'll leave this PR open until tomorrow. If there is no objection by then, I'll log a ticket for future optimization with SPI and merge this PR so that we can get the bug-fix part shipped.

ywangd added the >non-issue and :Distributed Coordination/Allocation labels on Oct 7, 2025
ywangd marked this pull request as ready for review on Oct 7, 2025 01:37
elasticsearchmachine added the Team:Distributed Coordination label on Oct 7, 2025
elasticsearchmachine (Collaborator): Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

nicktindall (Contributor) left a comment:

LGTM, just some nits and questions about whether it's worth putting the serverless stuff in the serverless codebase.

final var resultGroups = applicableResultGroups(isStateless);
final Map<ResultGroup, Integer> shardsToCreatePerGroup = new HashMap<>();

// TODO: we can short circuit when indindicesToOpenices is empty
Contributor:

Nit: typo in indindicesToOpenices; also, did you mean to act on this TODO before merging?

ywangd (Member Author):

The TODO is for the future, since I intend to keep the current behaviour as is. Fixed the typo in 241bf1b.

* - otherwise -> returns the Result of checking the limits for _frozen_ nodes
* - Check limits for _normal_ nodes
* - If there's no room -> return the Result for _normal_ nodes (fail-fast)
* - otherwise -> returns the Result of checking the limits for _frozen_ nodes
Contributor:

Nit: this javadoc probably needs to be generalised.

ywangd (Member Author):

Yep, generalized in 241bf1b.
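One possible generalised shape of that javadoc (hypothetical; the actual wording is in 241bf1b):

```java
/*
 * For each applicable limit group:
 * - Check the limits for nodes in that group
 * - If there's no room -> return the Result for that group (fail-fast)
 * - otherwise -> continue with the next group
 */
```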

case FROZEN -> nodeCount(discoveryNodes, ShardLimitValidator::hasFrozen);
case INDEX -> nodeCount(discoveryNodes, node -> node.hasRole(DiscoveryNodeRole.INDEX_ROLE.roleName()));
case SEARCH -> nodeCount(discoveryNodes, node -> node.hasRole(DiscoveryNodeRole.SEARCH_ROLE.roleName()));
};
Contributor:

Nit: could we make these abstract in the enum class and put the implementations on the individual declarations? That would avoid the need for the switch. Same as below.
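For reference, the per-constant pattern being suggested looks roughly like this standalone sketch (Node is a stand-in type, not the real DiscoveryNode, and these bodies are illustrative):

```java
import java.util.function.Predicate;

// Each constant carries its own node matcher, so callers dispatch without a switch.
final class EnumDispatchSketch {
    record Node(boolean frozen, boolean index, boolean search) {}

    enum LimitGroup {
        NORMAL { Predicate<Node> matcher() { return n -> !n.frozen(); } },
        FROZEN { Predicate<Node> matcher() { return Node::frozen; } },
        INDEX  { Predicate<Node> matcher() { return Node::index; } },
        SEARCH { Predicate<Node> matcher() { return Node::search; } };

        abstract Predicate<Node> matcher();
    }
}
```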

ywangd (Member Author):

Good point. Done in 035f2ef

* @return The total number of new shards to be created for this group.
*/
public int newShardsTotal(Settings indexSettings) {
final boolean frozen = FROZEN_GROUP.equals(INDEX_SETTING_SHARD_LIMIT_GROUP.get(indexSettings));
Contributor:

Nit: this is a bit tricky to read, perhaps inFrozenLimitGroup instead of frozen or something?

ywangd (Member Author):

I renamed it to isFrozenIndex. I didn't include "group" in the name since the enum (this) is itself the group in this context.

+ ReferenceDocs.MAX_SHARDS_PER_NODE;
}

public enum ResultGroup {
Contributor:

Nit: perhaps LimitGroup? I'm not entirely clear why "result" is in the name; perhaps I'm missing something.

ywangd (Member Author):

Changed it to LimitGroup, which seems overall better; see 0dcd553. The Result part was taken from the name of the validation Result inner class, which has a group field of the enum type (previously a String).

ywangd (Member Author) commented Oct 8, 2025

@elasticmachine update branch

Labels: >non-issue, :Distributed Coordination/Allocation, Team:Distributed Coordination, v9.3.0