Description
There is a good chance that the reboot resolves the issues on the node and the node becomes healthy again. In that case, NHC will delete the SNR CR.
Once SNR assumes the node has rebooted (after waiting for some time), it nevertheless continues fencing by deleting resources or adding the out-of-service taint. This isn't a big issue, because there shouldn't be any workloads running after the reboot (because of the "normal" NoExecute taint).
However, it probably makes sense to skip this step, because on a healthy node there is no longer any need to delete the remaining pods that tolerate the NoExecute taint. We could probably switch directly to the "FencingCompleted" code branch, which does the usual cleanup, such as removing that NoExecute taint.
This was triggered by the discussion here: medik8s/fence-agents-remediation#92 (comment)
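
A rough sketch of the proposed shortcut, only to illustrate the idea. All names here (`Remediation`, `DeletionRequested`, `Step`, `nextStepAfterReboot`) are hypothetical and are not taken from the SNR code base. It assumes that "NHC deleted the SNR CR" can be observed as a pending deletion on the CR once the reboot wait has elapsed.

```go
package main

// Remediation is a hypothetical stand-in for the SNR CR.
// DeletionRequested models "NHC has deleted the CR because the node
// became healthy again"; the real controller would presumably check
// the CR's deletion timestamp instead (assumption).
type Remediation struct {
	DeletionRequested bool
}

// Step names the action the reconciler takes next. The values are
// illustrative labels, not actual SNR phase names.
type Step string

const (
	// StepFenceWorkloads is the current behaviour after the reboot wait:
	// delete resources or add the out-of-service taint.
	StepFenceWorkloads Step = "delete-resources-or-add-out-of-service-taint"
	// StepFencingCompleted is the usual cleanup branch, for example
	// removing the NoExecute taint.
	StepFencingCompleted Step = "fencing-completed-cleanup"
)

// nextStepAfterReboot decides what to do once SNR assumes the node has
// rebooted. If the CR is already being deleted, the node is considered
// healthy and the fencing step is skipped in favour of the cleanup.
func nextStepAfterReboot(r Remediation) Step {
	if r.DeletionRequested {
		return StepFencingCompleted
	}
	return StepFenceWorkloads
}
```

With this shape, a CR that is still present after the reboot wait goes through fencing exactly as today, and only the "node recovered, CR deleted" case takes the shorter path.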