What would you like to be added:
MCM should turn off EC2 automatic recovery (https://aws.amazon.com/about-aws/whats-new/2022/03/amazon-ec2-default-automatic-recovery/) for instances by default. Automatic recovery is an AWS feature that, when the underlying host fails, recovers the instance onto a new host with the same instance ID and the same attached volumes.
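EC2 exposes this setting as the `AutoRecovery` maintenance option (set through `MaintenanceOptions` when calling `RunInstances`, or through `ModifyInstanceMaintenanceOptions` for a running instance), whose accepted values are `default` and `disabled`. The sketch below shows one way the proposed default could be resolved before launching an instance. It assumes a hypothetical optional user setting in the machine class; `resolveAutoRecovery` is an illustrative helper, not an existing MCM function, and the real provider code would pass the result into the EC2 launch request.

```go
package main

import "fmt"

// Values accepted by the EC2 AutoRecovery maintenance option.
const (
	autoRecoveryDisabled = "disabled"
	autoRecoveryDefault  = "default"
)

// resolveAutoRecovery is a hypothetical helper: it returns the AutoRecovery
// value MCM would send when launching an instance. A nil or empty setting
// means the user did not opt in, so auto-recovery is turned off.
func resolveAutoRecovery(userSetting *string) (string, error) {
	if userSetting == nil || *userSetting == "" {
		return autoRecoveryDisabled, nil
	}
	switch *userSetting {
	case autoRecoveryDisabled, autoRecoveryDefault:
		// Respect an explicit choice, including opting back in with "default".
		return *userSetting, nil
	default:
		return "", fmt.Errorf("invalid AutoRecovery value %q", *userSetting)
	}
}

func main() {
	v, _ := resolveAutoRecovery(nil)
	fmt.Println(v) // disabled

	optIn := autoRecoveryDefault
	v, _ = resolveAutoRecovery(&optIn)
	fmt.Println(v) // default
}
```

Defaulting to `disabled` while still accepting an explicit `default` keeps MCM's health check as the single recovery path unless a user deliberately chooses AWS recovery.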
Why is this needed:
Currently, MCM has its own health-check mechanism: it terminates a machine if it stays unhealthy (kubelet not responding, or other failing conditions) for longer than healthTimeout (10 minutes by default). This means two recovery mechanisms act on the same failure and can race against each other.
If AWS auto-recovery completes before the MCM health check times out, everything is fine.
But if auto-recovery takes longer (the instance is still being moved from the failed host to a new one and its volumes are still detaching), MCM's own recovery kicks in: it deletes the instance on the new host and creates an entirely new one. This detaches the volumes a second time and makes recovery even longer, which is undesirable.