Turn automatic-recovery off by default #94

@himanshu-kun

What would you like to be added:
MCM should turn off automatic recovery (https://aws.amazon.com/about-aws/whats-new/2022/03/amazon-ec2-default-automatic-recovery/) for instances by default. Automatic recovery is an AWS feature that, in case of a host failure, recovers the instance on a new host with the same instance ID and the same volumes attached.
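A minimal sketch of what "off by default" could look like, assuming a provider spec with an optional field. The `MachineSpec` type, its `AutoRecovery` field, and `EffectiveAutoRecovery` are hypothetical names used only for illustration and are not part of MCM. The two string values mirror the `default` and `disabled` settings that EC2 exposes for auto recovery in its instance maintenance options.

```go
package main

import "fmt"

// Values that EC2 accepts for the auto recovery maintenance option.
const (
	AutoRecoveryDefault  = "default"  // AWS automatic recovery stays enabled
	AutoRecoveryDisabled = "disabled" // AWS automatic recovery is turned off
)

// MachineSpec is a hypothetical stand-in for a provider spec.
// It is not the actual MCM type.
type MachineSpec struct {
	// AutoRecovery is nil when the user did not set the option.
	AutoRecovery *string
}

// EffectiveAutoRecovery returns the value to send to AWS.
// An unset option resolves to "disabled", so only MCM's own
// health check acts on an unhealthy machine unless the user opts in.
func EffectiveAutoRecovery(spec MachineSpec) string {
	if spec.AutoRecovery == nil {
		return AutoRecoveryDisabled
	}
	return *spec.AutoRecovery
}

func main() {
	optIn := AutoRecoveryDefault
	fmt.Println(EffectiveAutoRecovery(MachineSpec{}))
	fmt.Println(EffectiveAutoRecovery(MachineSpec{AutoRecovery: &optIn}))
}
```

Using a pointer for the field keeps "not set" distinguishable from an explicit opt-in, so existing specs would pick up the new default without changes.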

Why is this needed:
Currently MCM has its own health check mechanism: it terminates a machine that has been unhealthy (kubelet not responding, or some other condition) for longer than healthTimeout (10 minutes by default). This means two recovery mechanisms can race against each other.
If AWS automatic recovery completes before the MCM health check times out, that is fine.
But if it takes longer (the instance is still being moved from one host to another and its volumes are detaching), MCM recovery kicks in and deletes the instance on the new host in order to start a new instance altogether. This detaches the volumes again and makes the recovery longer, which is undesirable.

Metadata

Assignees

No one assigned

    Labels

    area/performance: Performance (across all domains, such as control plane, networking, storage, etc.) related
    kind/discussion: Discussion (engaging others in deciding about multiple options)
    kind/enhancement: Enhancement, improvement, extension
    kind/test: Test
    lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
    priority/5: Priority (lower number equals higher priority)
