Replies: 2 comments 6 replies
-
|
Thanks for opening your first issue here! Be sure to follow the issue template! |
Beta Was this translation helpful? Give feedback.
0 replies
-
|
@jpt1624 , I think you are confusing host-ha (which reboots a host if needed/possible) and VM-ha which restarts a VM when it can reliably detect that it is down. The crux is that if a host is “down" it cannot be reliably determined that the VM is down. It might be that the host is disconnected and still allows for the VM to change its disk. This why a VM will not be restarted “just” because the host is not addressable. #splitbrain-scenario |
Beta Was this translation helpful? Give feedback.
6 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
problem
Hello, I am having issues with getting HA to function with my two KVM hosts. The cluster, the two hosts, and a test virtual machine each have HA enabled.
The KVM hosts have OOB management configured using ipmitool.
For testing, I have a virtual machine with an HA supported policy running on KVM-02. I power off KVM-02 abruptly to see if the virtual machine will automatically migrate over to KVM-01.
What occurs is the following:
KVM-02 is determined to be Disconnected by cloudstack management:
Host {"id":46,"name":"kvm-02","type":"Routing","uuid":"d3b323d6-e3bb-4d06-917a-75fba36a5adf"} has the status [Disconnected].
KVM-02 is then set to the DOWN status (supposedly):
{"id":46,"name":"kvm-02","type":"Routing","uuid":"d3b323d6-e3bb-4d06-917a-75fba36a5adf"} has the status [Down].
KVM-01 then checks connectivity with KVM-02, which also returns a status of DOWN:
Neighbouring Host {"id":43,"name":"kvm-01","type":"Routing","uuid":"025ccefd-1696-43c9-9a2c-e045968d2efa"} returned status [Down] for the investigated Host {"id":46,"name":"kvm-02","type":"Routing","uuid":"d3b323d6-e3bb-4d06-917a-75fba36a5adf"}.
The shared storage volume mounted onto KVM-02 is checked for any recent writes:
Checking VM activity for Host {"id":46,"name":"kvm-02","type":"Routing","uuid":"d3b323d6-e3bb-4d06-917a-75fba36a5adf"} on storage pool [StoragePool {"id":48,"name":"Cloud-KVM-SSD-01","poolType":"NetworkFilesystem","uuid":"f8e97832-44a9-3031-aa1d-0acfc9e32648"}].
Host {"id":46,"name":"kvm-02","type":"Routing","uuid":"d3b323d6-e3bb-4d06-917a-75fba36a5adf"} does not have activity on storage pool [StoragePool {"id":48,"name":"Cloud-KVM-SSD-01","poolType":"NetworkFilesystem","uuid":"f8e97832-44a9-3031-aa1d-0acfc9e32648"}]
Also while these are occurring, the API states that the status for KVM-02 is UP:
After about 10-15 minutes, we progress to the ALERT state for KVM-02. I am not sure why it takes this many attempts because we have set this condition in the settings for 5 checks:
At the ALERT state, now the HA task tries to power OFF KVM-02 (assuming to prevent split brain prior to moving the virtual machines over):
This command fails because KVM-02 is already OFF.
Cloudstack will continue to try to power KVM-02 off until I manually issue the OOB power up command. Cloudstack's power OFF command then will work. When this happens we progress to marking KVM-02 as DOWN:
Here are the investigators configured:
versions
Cloudstack: 4.22.0.0
KVM-01: 4.22.0.0
KVM-02: 4.22.0.0
The steps to reproduce the bug
What to do about it?
Not sure if my configuration is incorrect or underlying issue is present. The behavior is confusing. Please let me know if I can provide anything else.
Thanks!
Beta Was this translation helpful? Give feedback.
All reactions