Make the health node available even if all nodes are marked for shutdown (#92193) #112825
Conversation
This fixes the behavior of the health node not being available as soon as all nodes are marked for shutdown but still running. A cluster without an operating health node returns "No disk usage data" through its health API.

This introduces a cluster setting to enable the new behavior; the default preserves the previous behavior.

I also had to modify the part of the code that immediately evicts any running task as soon as its node is marked for shutdown, which led to an endless loop of eviction -> reassignment -> eviction -> ...

Fixes elastic#92193
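A rough sketch of the guard this adds to the eviction path (simplified; the variable names here are illustrative, not the exact diff):

```java
// Simplified sketch, not the exact diff. Previously a task on a node marked for
// shutdown was unassigned unconditionally, which for the health node caused the
// endless eviction -> reassignment -> eviction loop described above.
boolean nodeShuttingDown = clusterState.metadata().nodeShutdowns().contains(assignedNodeId);
boolean exemptFromEviction = taskIsLightweight && allowLightweightAssignmentsToNodesShuttingDown;
if (nodeShuttingDown && exemptFromEviction == false) {
    // unassign the persistent task so the normal reassignment logic can move it
}
```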
Documentation preview:

Pinging @elastic/es-core-infra (Team:Core/Infra)

Pinging @elastic/es-data-management (Team:Data Management)

buildkite test this

@elasticmachine update branch

@dakrone @andreidan Could you have a look at this? Our client is asking for an update on this feature. I am relatively new to open source contributions, so if there is anything I still need to do for this to go through, feel free to let me know; I am happy to keep working on it.

Hi @nmesot, thank you for picking up this issue. I will start the review.
```
For certain lightweight, persistent tasks (like the health node), it makes sense to not
have this limitation.

NOTE: For the health node for example, this provides the benefit of having accurate
```
I don't think this note is needed here; it would be better to add this information to the changelog.
```java
private final AtomicReference<HealthNode> currentTask = new AtomicReference<>();
private final ClusterStateListener taskStarter;
private final ClusterStateListener shutdownListener;
private Boolean allowLightweightAssignmentsToNodesShuttingDown;
```
Why a `Boolean` and not a `boolean`?
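The usual pattern for a dynamic setting is a primitive `volatile boolean` field kept in sync by a settings update consumer, roughly (a sketch, assuming access to the node's `ClusterSettings` at construction time):

```java
// Read the initial value from the node settings, then keep the field updated
// dynamically. A primitive avoids an accidental null state and boxing on every read.
this.allowLightweightAssignmentsToNodesShuttingDown =
    CLUSTER_TASKS_ALLOCATION_ALLOW_LIGHTWEIGHT_ASSIGNMENTS_TO_NODES_SHUTTING_DOWN_SETTING.get(settings);
clusterSettings.addSettingsUpdateConsumer(
    CLUSTER_TASKS_ALLOCATION_ALLOW_LIGHTWEIGHT_ASSIGNMENTS_TO_NODES_SHUTTING_DOWN_SETTING,
    value -> this.allowLightweightAssignmentsToNodesShuttingDown = value
);
```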
```java
}

private static boolean isEveryNodeShuttingDown(ClusterState clusterState, ClusterChangedEvent event) {
    return clusterState.nodes().getAllNodes().stream().allMatch(dn -> event.state().metadata().nodeShutdowns().contains(dn.getId()));
```
Why use `clusterState` from `clusterService.state()` when you already have `event.state()`? Am I missing something here?
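One way the simplification could look (an untested sketch based on the quoted code):

```java
private static boolean isEveryNodeShuttingDown(ClusterChangedEvent event) {
    // Use the event's state for both the node list and the shutdown metadata,
    // so the method works off a single, consistent cluster state.
    return event.state()
        .nodes()
        .getAllNodes()
        .stream()
        .allMatch(dn -> event.state().metadata().nodeShutdowns().contains(dn.getId()));
}
```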
Looks like a great contribution overall, thanks!
Hi @nmesot, thank you again for putting in this effort and opening a very good PR. I have left some comments that will hopefully reduce the complexity a bit, but it looks really good.
```java
public static final Setting<Boolean> CLUSTER_TASKS_ALLOCATION_ALLOW_LIGHTWEIGHT_ASSIGNMENTS_TO_NODES_SHUTTING_DOWN_SETTING = Setting
    .boolSetting(
        "cluster.persistent_tasks.allocation.allow_lightweight_assignments_to_nodes_shutting_down",
        false,
        Setting.Property.Dynamic,
        Setting.Property.NodeScope
    );
```
I think this change is safe enough that we do not need a switch. Each task that defines itself as lightweight should ensure that it is always safe to run that way, or else that it needs the flag. What do you think? A sketch of what that could look like follows below.
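Without the switch, the assignment check could rely on the task's own declaration alone, something like (a hypothetical sketch; the surrounding loop and the `executor` name are assumptions, not the PR's code):

```java
// Hypothetical sketch: the task's own declaration is the single source of truth,
// so no cluster setting is consulted here.
boolean nodeShuttingDown = currentState.metadata().nodeShutdowns().contains(node.getId());
if (nodeShuttingDown && executor.isLightweight() == false) {
    continue; // ordinary tasks still skip nodes that are shutting down
}
```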
```java
// nodes are available. An example of a lightweight task is the HealthNodeTask.
// This behavior needs to be enabled with the CLUSTER_TASKS_ALLOCATION_ALLOW_LIGHTWEIGHT_ASSIGNMENTS_TO_NODES_SHUTTING_DOWN_SETTING
// setting.
```
I think the comment is clear enough without the example. Someone can always check which tasks implement `isLightweight`.
```java
final List<DiscoveryNode> allNodes = currentState.nodes().stream().toList();
final List<DiscoveryNode> candidateNodes = allNodes.stream()
    .filter(dn -> currentState.metadata().nodeShutdowns().contains(dn.getId()) == false)
    .collect(Collectors.toCollection(ArrayList::new));
```
You could use `.toList()` here too.
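That is, roughly:

```java
final List<DiscoveryNode> candidateNodes = allNodes.stream()
    .filter(dn -> currentState.metadata().nodeShutdowns().contains(dn.getId()) == false)
    .toList();
```

Note that `.toList()` returns an unmodifiable list, which is fine as long as `candidateNodes` is not mutated afterwards.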
```java
    .filter(dn -> currentState.metadata().nodeShutdowns().contains(dn.getId()) == false)
    .collect(Collectors.toCollection(ArrayList::new));

if (candidateNodes.isEmpty() && allNodes.isEmpty() == false) { // all nodes are shutting down
```
I think creating a new collection `allNodes` is not necessary. I think `currentState.nodes().getNodes().isEmpty()` will not create new objects and therefore be more efficient.
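Putting both suggestions together, the snippet could reduce to something like this (a sketch, not a tested change):

```java
// Stream the nodes directly instead of materializing an intermediate allNodes list.
final List<DiscoveryNode> candidateNodes = currentState.nodes().stream()
    .filter(dn -> currentState.metadata().nodeShutdowns().contains(dn.getId()) == false)
    .toList();

if (candidateNodes.isEmpty() && currentState.nodes().getNodes().isEmpty() == false) { // all nodes are shutting down
```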
```java
 * {@link PersistentTasksClusterService#CLUSTER_TASKS_ALLOCATION_ALLOW_LIGHTWEIGHT_ASSIGNMENTS_TO_NODES_SHUTTING_DOWN_SETTING}
 * needs to be set for this behavior to apply.
 */
default boolean isLightweight() {
```
I am guessing this term was inspired by the original issue. However, this term is not established, and it was mostly used to describe the health node task. Since we do not have any other usage for this method, I suggest we give it a more specific name, something along the lines of `keepRunningDuringShutdown` or `canRunOnShuttingDownNodes`. What do you think?
++ for `canRunOnShuttingDownNodes`
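A sketch of the renamed hook, assuming the default stays `false` to match the previous behavior (the Javadoc body is adapted from the quoted excerpt, not the final wording):

```java
/**
 * Returns true if this task may stay assigned to, or be assigned to, a node that is
 * marked for shutdown.
 * {@link PersistentTasksClusterService#CLUSTER_TASKS_ALLOCATION_ALLOW_LIGHTWEIGHT_ASSIGNMENTS_TO_NODES_SHUTTING_DOWN_SETTING}
 * needs to be set for this behavior to apply.
 */
default boolean canRunOnShuttingDownNodes() {
    return false; // assumed default, preserving the previous behavior
}
```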