-
Notifications
You must be signed in to change notification settings - Fork 25.5k
Separate shard limit validation for index and search tiers #136063
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Separate shard limit validation for index and search tiers #136063
Conversation
…ex-search-for-shard-limit-validator
public enum ResultGroup { | ||
NORMAL(NORMAL_GROUP), | ||
FROZEN(FROZEN_GROUP), | ||
INDEX("index"), | ||
SEARCH("search"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR is bigger and more involved than I initially expected because the current shard limit validation has a hard-coded 2-member group for "normal" and "frozen" indices. The group is also decided by an index level setting. None of these makes sense in Stateless.
The PR introduces ResultGroup
so that the actual groups can be picked based on the setup. It also helps detaching the grouping from the index level setting. Overall it promotes the Group concept (was a String
previously) which in turns helps reusing the existing logics (many of them are based on the group in use).
Please let me know if it makes sense. Happy to provide more clarification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth using SPI to provide the ResultGroup
s so serverless can override it and we avoid putting knowledge of those things in the core product?
It may be pedantic and not worth the effort, but it looks generalised enough that it could be done. And we do do it for some other things.
I guess ResultGroup
would have to be an interface then
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just some nits and questions about whether it's worth putting the serverless stuff in the serverless codebase
final var resultGroups = applicableResultGroups(isStateless); | ||
final Map<ResultGroup, Integer> shardsToCreatePerGroup = new HashMap<>(); | ||
|
||
// TODO: we can short circuit when indindicesToOpenices is empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: typo indindicesToOpenices
, also did you mean to act on this TODO before merging?
* - otherwise -> returns the Result of checking the limits for _frozen_ nodes | ||
* - Check limits for _normal_ nodes | ||
* - If there's no room -> return the Result for _normal_ nodes (fail-fast) | ||
* - otherwise -> returns the Result of checking the limits for _frozen_ nodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this javadoc probably needs to be generalised
public enum ResultGroup { | ||
NORMAL(NORMAL_GROUP), | ||
FROZEN(FROZEN_GROUP), | ||
INDEX("index"), | ||
SEARCH("search"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth using SPI to provide the ResultGroup
s so serverless can override it and we avoid putting knowledge of those things in the core product?
It may be pedantic and not worth the effort, but it looks generalised enough that it could be done. And we do do it for some other things.
I guess ResultGroup
would have to be an interface then
case FROZEN -> nodeCount(discoveryNodes, ShardLimitValidator::hasFrozen); | ||
case INDEX -> nodeCount(discoveryNodes, node -> node.hasRole(DiscoveryNodeRole.INDEX_ROLE.roleName())); | ||
case SEARCH -> nodeCount(discoveryNodes, node -> node.hasRole(DiscoveryNodeRole.SEARCH_ROLE.roleName())); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: could make these abstract in the enum class and put the implementations on the individual declarations? that would avoid the need for the switch. Same as below.
* @return The total number of new shards to be created for this group. | ||
*/ | ||
public int newShardsTotal(Settings indexSettings) { | ||
final boolean frozen = FROZEN_GROUP.equals(INDEX_SETTING_SHARD_LIMIT_GROUP.get(indexSettings)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this is a bit tricky to read, perhaps inFrozenLimitGroup
instead of frozen
or something?
+ ReferenceDocs.MAX_SHARDS_PER_NODE; | ||
} | ||
|
||
public enum ResultGroup { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: perhaps LimitGroup
? I'm not entirely clear why "result" is in the name? perhaps I'm missing something
In stateless, index and search shards are distinct and must be allocated to nodes of corresponding types. Therefore the shard limit validation should be performed for them separately to avoid one shard type taking more quota than expected, similar to the separation between regular and frozen shards.
Resolves: ES-12884