Stop fencing actions when node gets healthy #159

@slintes

There is a good chance that a reboot resolves the issues on the node and the node becomes healthy again. NHC will delete the SNR CR in that case.

When SNR assumes the node has rebooted (after waiting for some time), it nevertheless continues fencing by deleting resources or adding the out-of-service taint. This isn't a big issue, because no workloads should be running after the reboot (thanks to the "normal" NoExecute taint).

However, it probably makes sense to skip this step: on a healthy node there is no longer any need to delete the remaining pods that tolerate the NoExecute taint. We can probably switch directly to the "FencingCompleted" code branch, which does the usual cleanup, such as removing that NoExecute taint.
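To make the proposal concrete, here is a rough sketch of the decision in Go. It is not taken from the SNR code base: the `Phase` constants (other than the "FencingCompleted" name mentioned above), the `NodeState` struct and the `nextPhase` function are hypothetical names used only for illustration, and how "healthy" is detected (for example, NHC having deleted the CR) is left open.

```go
package main

// Phase is a hypothetical remediation phase; the names are illustrative
// and not copied from the SNR source.
type Phase string

const (
	PhaseRebootWait       Phase = "RebootWait"       // still waiting until the node is assumed rebooted
	PhaseFencing          Phase = "Fencing"          // delete resources or add the out-of-service taint
	PhaseFencingCompleted Phase = "FencingCompleted" // usual cleanup, e.g. remove the NoExecute taint
)

// NodeState is a hypothetical summary of what the reconciler knows.
type NodeState struct {
	RebootAssumed bool // the wait time has elapsed, so the node is assumed rebooted
	Healthy       bool // the node is considered healthy again (assumption: detected elsewhere)
}

// nextPhase decides what to do once the reboot wait is over.
func nextPhase(s NodeState) Phase {
	if !s.RebootAssumed {
		// Nothing changes before the reboot is assumed.
		return PhaseRebootWait
	}
	if s.Healthy {
		// Proposed shortcut: skip resource deletion and the
		// out-of-service taint, and go straight to cleanup.
		return PhaseFencingCompleted
	}
	// Current behaviour: continue fencing.
	return PhaseFencing
}
```

With this shape, a node that is healthy after the assumed reboot goes straight to `PhaseFencingCompleted`, while an unhealthy node keeps the existing fencing path.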

@k-keiichi-rh @mshitrit

This was triggered by the discussion here: medik8s/fence-agents-remediation#92 (comment)
