CloudStack fails to start more VMs #10205

@akrasnov-drv

Description

Discussed in #10184

Originally posted by akrasnov-drv January 14, 2025
Hi,

I'm struggling to get CloudStack 4.20.0.0 to start KVM VMs properly on Ubuntu 22.

We use an isolated network over VLAN.
CloudStack manages to start a single VM and to add several more. But when I ask it to start more (e.g. 10-30), CloudStack starts behaving erratically.
New VMs fail with various errors; then CloudStack becomes slow, does not clean up resources, and ends up with a number of VMs stuck in the Starting state.

I have 5 KVM servers connected, each able to handle 30 VMs on its own (in plain KVM, without CloudStack). The VMs use local server storage. I do not see any resource shortage.
I tried to debug the issue, and it looks like the virtual router (VR) stops working properly. Its log shows that it restarts its management script at some point, yet some of the VMs still do not get a proper network configuration. Enabling static NAT also returns errors:
Error while enabling static nat. Ip Id: 14
Expunging the VMs then also hangs.
In addition, KVM hosts sometimes stop communicating with the management server and stop writing to their local logs.

To recover, I need to restart the management server, delete the virtual router, and clean up stuck resources, sometimes directly in the MySQL database. Restarting the agent is sometimes needed as well.
Any help understanding and fixing the problem would be highly appreciated.
I'll provide logs or other info on request.

Thanks,
Alex.

To summarize:

- under some load, part of the VMs stay in the Starting state and the UI becomes unresponsive; restarting libvirt revives the UI and lets VM expunging proceed
- some of the VMs that do start do not get IPs
- most fail to get static NAT configured (there are enough free public IPs)
- eventually the primary VR fails, but the backup VR is not promoted to primary for some 30-60 minutes
