Stop fencing actions when node gets healthy

There is a good chance that a reboot resolves the issues on the node and the node becomes healthy again. In that case, NHC will delete the SNR CR.

However, once SNR has waited long enough to assume that the node rebooted, it simply continues fencing by deleting resources or adding the out-of-service taint. This isn't a big issue, because no workloads should be running after the reboot (because of the "normal" NoExecute taint).

Still, it probably makes sense to skip this step, because on a healthy node there is no longer any need to delete the remaining pods which tolerate the NoExecute taint. We can probably switch directly to the "FencingCompleted" code branch, which does the usual cleanup, like removing that NoExecute taint.
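To make the proposal concrete, here is a rough sketch of the decision in Go. It is not SNR's actual code: `Phase`, the phase constants, and `nextPhase` are hypothetical names made up for illustration, and how "node is healthy" gets detected (for example, the SNR CR being deleted by NHC) is an assumption left to the caller.

```go
package main

// Phase is a hypothetical name for the step a remediation is in.
// These are illustrative values, not SNR's real identifiers.
type Phase string

const (
	// Still waiting for the time after which the reboot is assumed.
	PhaseRebootWait Phase = "RebootWait"
	// Delete resources or add the out-of-service taint.
	PhaseFenceResources Phase = "FenceResources"
	// Usual cleanup, like removing the NoExecute taint.
	PhaseFencingCompleted Phase = "FencingCompleted"
)

// nextPhase picks the next step once the reboot wait has been evaluated.
// nodeHealthy is an assumed input: the caller decides how to detect it.
func nextPhase(rebootAssumed, nodeHealthy bool) Phase {
	if !rebootAssumed {
		// The reboot is not assumed yet, keep waiting.
		return PhaseRebootWait
	}
	if nodeHealthy {
		// Proposed shortcut: skip resource deletion and the
		// out-of-service taint, go straight to cleanup.
		return PhaseFencingCompleted
	}
	// Current behavior: continue fencing after the assumed reboot.
	return PhaseFenceResources
}
```

With this shape, a healthy node after the assumed reboot goes straight to cleanup, while an unhealthy node follows the same path as today.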

@k-keiichi-rh @mshitrit 

This was triggered by the discussion here: https://github.com/medik8s/fence-agents-remediation/pull/92#issuecomment-1783774452

Stop fencing actions when node gets healthy #159
