KVM cluster with NFS primary storage – VM HA not working when host is powered down #11674
-
Thanks for opening your first issue here! Be sure to follow the issue template! |
-
Could you please try the steps mentioned in this link? cc @rajujith
-
VM-HA functions properly, but only when HOST-HA is disabled. When HOST-HA is also enabled on the hosts, the log contains the entries mentioned above, and the VMs fail to start on the healthy hosts even after several hours of waiting. |
-
This is right, @akoskuczi-bw: VM HA and host HA do not play nicely together. This is a known issue.
-
Problem
In a KVM cluster with NFS primary storage, VM HA does not work when a host is powered down.
Expected behavior
VMs from the failed host should be restarted on other available hosts in the cluster.
Actual behavior
The powered-down host goes to state Down with HA state Fenced, the VMs are not restarted on other hosts, and the management server repeatedly logs a NoTransitionException. Relevant log snippet:
WARN [o.a.c.h.HAManagerImpl] (BackgroundTaskPollManager-4:[ctx-c2bf501d]) (logid:96e12771) Unable to find next HA state for current HA state=[Fenced] for event=[Ineligible] for host Host {"id":4,"name":"csh-1-2.clab.run","type":"Routing","uuid":"f8f86177-f0e3-4994-8609-dd55e0e35a3e"} with id 4. com.cloud.utils.fsm.NoTransitionException: Unable to transition to a new state from Fenced via Ineligible
at com.cloud.utils.fsm.StateMachine2.getTransition(StateMachine2.java:108)
at com.cloud.utils.fsm.StateMachine2.getNextState(StateMachine2.java:94)
at org.apache.cloudstack.ha.HAManagerImpl.transitionHAState(HAManagerImpl.java:153)
at org.apache.cloudstack.ha.HAManagerImpl.validateAndFindHAProvider(HAManagerImpl.java:233)
at org.apache.cloudstack.ha.HAManagerImpl$HAManagerBgPollTask.runInContext(HAManagerImpl.java:665)
at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
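The warning comes from CloudStack's generic finite state machine (StateMachine2): the background poll task fires the Ineligible event against a host whose HA state is already Fenced, no transition is registered for that (state, event) pair, so getNextState() throws NoTransitionException and the host never leaves Fenced. The sketch below is not CloudStack's code; it is a minimal, hypothetical transition table in the same style, showing how an unregistered pair produces exactly this failure.

import java.util.HashMap;
import java.util.Map;

// Minimal illustration only; states, events and transitions are hypothetical,
// not CloudStack's actual HA state machine.
public class HaStateMachineSketch {

    enum HaState { Available, Suspect, Fencing, Fenced }
    enum HaEvent { Ineligible, HealthCheckPassed, FenceSuccess }

    static class NoTransitionException extends Exception {
        NoTransitionException(String message) { super(message); }
    }

    // transition table: (current state, event) -> next state
    private final Map<HaState, Map<HaEvent, HaState>> transitions = new HashMap<>();

    void addTransition(HaState from, HaEvent event, HaState to) {
        transitions.computeIfAbsent(from, k -> new HashMap<>()).put(event, to);
    }

    HaState getNextState(HaState current, HaEvent event) throws NoTransitionException {
        HaState next = transitions.getOrDefault(current, Map.of()).get(event);
        if (next == null) {
            // Same failure mode as the WARN above: nothing was registered for
            // (Fenced, Ineligible), so the state machine refuses to move.
            throw new NoTransitionException(
                    "Unable to transition to a new state from " + current + " via " + event);
        }
        return next;
    }

    public static void main(String[] args) throws Exception {
        HaStateMachineSketch fsm = new HaStateMachineSketch();
        fsm.addTransition(HaState.Available, HaEvent.Ineligible, HaState.Suspect);
        fsm.addTransition(HaState.Fencing, HaEvent.FenceSuccess, HaState.Fenced);

        System.out.println(fsm.getNextState(HaState.Available, HaEvent.Ineligible)); // Suspect
        fsm.getNextState(HaState.Fenced, HaEvent.Ineligible); // throws NoTransitionException
    }
}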
Versions
Environment
The steps to reproduce the bug
1. Enable Host HA and VM HA in a KVM cluster (NFS primary storage); a signed-API sketch of this step follows the list.
2. Power off a host that runs VMs.
3. Observe host and VM states in the management server.
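For step 1, a hedged sketch of enabling Host HA through the CloudStack API instead of the UI: it issues a signed enableHAForHost call (configureHAForHost with provider=kvmhaprovider follows the same pattern, and VM HA is assumed to come from a compute offering created with offerha=true). The management-server URL, API key and secret key are placeholders; the signing helper follows the usual sort / URL-encode / lowercase / HMAC-SHA1 / Base64 procedure.

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch only: endpoint and credentials are placeholders; the host id is taken
// from the log snippet above.
public class EnableHostHaSketch {

    // Sort parameters, URL-encode the values, lowercase the whole string,
    // HMAC-SHA1 it with the secret key and Base64-encode the digest.
    static String sign(Map<String, String> params, String secretKey) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(e.getKey()).append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
        byte[] digest = mac.doFinal(sb.toString().toLowerCase().getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(digest);
    }

    public static void main(String[] args) throws Exception {
        String endpoint = "http://management-server.example:8080/client/api"; // placeholder
        Map<String, String> params = new TreeMap<>(Map.of(
                "command", "enableHAForHost",
                "hostid", "f8f86177-f0e3-4994-8609-dd55e0e35a3e",
                "response", "json",
                "apiKey", "YOUR_API_KEY"));                                   // placeholder

        String signature = sign(params, "YOUR_SECRET_KEY");                   // placeholder
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.append(e.getKey()).append('=')
                 .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8)).append('&');
        }
        query.append("signature=").append(URLEncoder.encode(signature, StandardCharsets.UTF_8));

        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint + "?" + query)).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}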
What to do about it?
No response