KVM cluster with NFS primary storage – VM HA not working when host is powered down #11674
-
Thanks for opening your first issue here! Be sure to follow the issue template! |
-
Could you please try the steps mentioned in this link? cc @rajujith
-
VM-HA functions properly, but only when HOST-HA is disabled. When HOST-HA is also enabled on the hosts, the log contains the entries mentioned above, and the VMs fail to start on the healthy hosts even after several hours of waiting. |
-
This is right, @akoskuczi-bw: VM HA and host HA do not play nicely together. This is a known issue.
-
Problem
In a KVM cluster with NFS primary storage, VM HA does not work when a host is powered down.
Expected behavior
VMs from the failed host should be restarted on other available hosts in the cluster.
Actual behavior
The powered-down host goes to state Down with HA state Fenced, the VMs are not restarted on other hosts, and the management server repeatedly logs a NoTransitionException. Relevant log snippet:
WARN [o.a.c.h.HAManagerImpl] (BackgroundTaskPollManager-4:[ctx-c2bf501d]) (logid:96e12771) Unable to find next HA state for current HA state=[Fenced] for event=[Ineligible] for host Host {"id":4,"name":"csh-1-2.clab.run","type":"Routing","uuid":"f8f86177-f0e3-4994-8609-dd55e0e35a3e"} with id 4. com.cloud.utils.fsm.NoTransitionException: Unable to transition to a new state from Fenced via Ineligible
at com.cloud.utils.fsm.StateMachine2.getTransition(StateMachine2.java:108)
at com.cloud.utils.fsm.StateMachine2.getNextState(StateMachine2.java:94)
at org.apache.cloudstack.ha.HAManagerImpl.transitionHAState(HAManagerImpl.java:153)
at org.apache.cloudstack.ha.HAManagerImpl.validateAndFindHAProvider(HAManagerImpl.java:233)
at org.apache.cloudstack.ha.HAManagerImpl$HAManagerBgPollTask.runInContext(HAManagerImpl.java:665)
at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
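The warning comes from CloudStack's generic finite state machine (StateMachine2): the background poll task fires the Ineligible event against a host whose HA state is already Fenced, no transition is registered for that (state, event) pair, so getNextState() throws NoTransitionException and the host never leaves Fenced. The sketch below is not CloudStack's code; it is a minimal, hypothetical transition table in the same style, showing how an unregistered pair produces exactly this failure.

import java.util.HashMap;
import java.util.Map;

// Minimal illustration only; states, events and transitions are hypothetical,
// not CloudStack's actual HA state machine.
public class HaStateMachineSketch {

    enum HaState { Available, Suspect, Fencing, Fenced }
    enum HaEvent { Ineligible, HealthCheckPassed, FenceSuccess }

    static class NoTransitionException extends Exception {
        NoTransitionException(String message) { super(message); }
    }

    // transition table: (current state, event) -> next state
    private final Map<HaState, Map<HaEvent, HaState>> transitions = new HashMap<>();

    void addTransition(HaState from, HaEvent event, HaState to) {
        transitions.computeIfAbsent(from, k -> new HashMap<>()).put(event, to);
    }

    HaState getNextState(HaState current, HaEvent event) throws NoTransitionException {
        HaState next = transitions.getOrDefault(current, Map.of()).get(event);
        if (next == null) {
            // Same failure mode as the WARN above: nothing was registered for
            // (Fenced, Ineligible), so the state machine refuses to move.
            throw new NoTransitionException(
                    "Unable to transition to a new state from " + current + " via " + event);
        }
        return next;
    }

    public static void main(String[] args) throws Exception {
        HaStateMachineSketch fsm = new HaStateMachineSketch();
        fsm.addTransition(HaState.Available, HaEvent.Ineligible, HaState.Suspect);
        fsm.addTransition(HaState.Fencing, HaEvent.FenceSuccess, HaState.Fenced);

        System.out.println(fsm.getNextState(HaState.Available, HaEvent.Ineligible)); // Suspect
        fsm.getNextState(HaState.Fenced, HaEvent.Ineligible); // throws NoTransitionException
    }
}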
Versions
Environment
The steps to reproduce the bug
1. Enable Host HA and VM HA in a KVM cluster (NFS primary storage); a signed-API sketch of this step follows the list.
2. Power off a host that runs VMs.
3. Observe host and VM states in the management server.
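For step 1, a hedged sketch of enabling Host HA through the CloudStack API instead of the UI: it issues a signed enableHAForHost call (configureHAForHost with provider=kvmhaprovider follows the same pattern, and VM HA is assumed to come from a compute offering created with offerha=true). The management-server URL, API key and secret key are placeholders; the signing helper follows the usual sort / URL-encode / lowercase / HMAC-SHA1 / Base64 procedure.

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch only: endpoint and credentials are placeholders; the host id is taken
// from the log snippet above.
public class EnableHostHaSketch {

    // Sort parameters, URL-encode the values, lowercase the whole string,
    // HMAC-SHA1 it with the secret key and Base64-encode the digest.
    static String sign(Map<String, String> params, String secretKey) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(e.getKey()).append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
        byte[] digest = mac.doFinal(sb.toString().toLowerCase().getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(digest);
    }

    public static void main(String[] args) throws Exception {
        String endpoint = "http://management-server.example:8080/client/api"; // placeholder
        Map<String, String> params = new TreeMap<>(Map.of(
                "command", "enableHAForHost",
                "hostid", "f8f86177-f0e3-4994-8609-dd55e0e35a3e",
                "response", "json",
                "apiKey", "YOUR_API_KEY"));                                   // placeholder

        String signature = sign(params, "YOUR_SECRET_KEY");                   // placeholder
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.append(e.getKey()).append('=')
                 .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8)).append('&');
        }
        query.append("signature=").append(URLEncoder.encode(signature, StandardCharsets.UTF_8));

        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint + "?" + query)).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}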
What to do about it?
No response