Reconciliation stops working as soon as OpenStackServer is in ERROR state #2622

@matofeder

What steps did you take and what happened:

We experienced issues with the OpenStack infrastructure that led to frequent live migrations and caused several VMs to enter the ERROR state.

Once the OpenStack infrastructure was restored to a healthy state, the affected VMs returned to the ACTIVE state, and the corresponding Kubernetes nodes became Ready again.
However, the associated OpenStackServer resources remained in the ERROR state and did not recover.

Here is the event log from one affected VM:

openstack server event list test-md-0-vgh6x-52bdl-s4n9v --long  
+------------------------------------------+--------------------------------------+----------------+----------------------------+---------+----------------------------------+----------------------------------+
| Request ID                               | Server ID                            | Action         | Start Time                 | Message | Project ID                       | User ID                          |
+------------------------------------------+--------------------------------------+----------------+----------------------------+---------+----------------------------------+----------------------------------+
| req-242cf8b4-c47e-462e-9e7a-2f08a80f7f2d | b4b4f1e1-3583-4adc-88c6-b7b523cd9478 | live-migration | 2025-07-15T10:29:43.000000 | None    | 2e0ab246be0e407c88a45f4755abdd17 | 25774e95e5f34a5aa7d3fd4cff3a86ef |
| req-87634299-cc38-48cd-8796-2aefa4c18fa5 | b4b4f1e1-3583-4adc-88c6-b7b523cd9478 | live-migration | 2025-07-15T10:18:19.000000 | None    | 2e0ab246be0e407c88a45f4755abdd17 | 25774e95e5f34a5aa7d3fd4cff3a86ef |
| req-169f86ce-f115-4584-93a7-9edacd9a2555 | b4b4f1e1-3583-4adc-88c6-b7b523cd9478 | live-migration | 2025-07-12T14:38:00.000000 | None    | 2e0ab246be0e407c88a45f4755abdd17 | 25774e95e5f34a5aa7d3fd4cff3a86ef |
| req-3aca2c17-11a0-4309-852c-acd1735413b8 | b4b4f1e1-3583-4adc-88c6-b7b523cd9478 | reboot         | 2025-07-12T14:37:50.000000 | None    | 2e0ab246be0e407c88a45f4755abdd17 | 25774e95e5f34a5aa7d3fd4cff3a86ef |
| req-8d6ebe23-368b-4fc0-b0a4-c73c6ecebd2b | b4b4f1e1-3583-4adc-88c6-b7b523cd9478 | live-migration | 2025-07-12T14:00:50.000000 | Error   | 2e0ab246be0e407c88a45f4755abdd17 | 25774e95e5f34a5aa7d3fd4cff3a86ef |
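| req-03c7d1e6-9e88-4f2d-9410-d944b230fb6f | b4b4f1e1-3583-4adc-88c6-b7b523cd9478 | live-migration | 2025-07-12T12:52:58.000000 | Error   | 2e0ab246be0e407c88a45f4755abdd17 | 25774e95e5f34a5aa7d3fd4cff3a86ef |
+------------------------------------------+--------------------------------------+----------------+----------------------------+---------+----------------------------------+----------------------------------+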

The CAPO controller logs the following message and does not attempt to reconcile the resource from that state:

Not reconciling server in error state. See openStackServer.status or previously logged error for details

Why does CAPO not attempt to reconcile an OpenStackServer that is in the ERROR state, even after the underlying VM has recovered?
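
To make the question concrete, here is a minimal Python model of the behavior described above. It is not CAPO's code (CAPO is written in Go), and every class, field, and function name in it is hypothetical. `reconcile_sticky` models what the log message suggests is happening: once a failure is recorded on the resource, reconciliation returns early and the VM status is never read again. `reconcile_recheck` models what we expected: the VM status is re-read, and the recorded failure is cleared once the VM is ACTIVE.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerResource:
    """Hypothetical stand-in for an OpenStackServer resource's status."""
    ready: bool = False
    failure_message: Optional[str] = None  # set once the VM is seen in ERROR


def _apply(res: ServerResource, vm_status: str) -> str:
    """Copy the observed VM status into the resource."""
    if vm_status == "ERROR":
        res.ready = False
        res.failure_message = "instance state ERROR"
        return "marked failed"
    res.ready = vm_status == "ACTIVE"
    return "ready" if res.ready else "waiting"


def reconcile_sticky(res: ServerResource, vm_status: str) -> str:
    """Assumed current behavior: a recorded failure short-circuits reconcile."""
    if res.failure_message is not None:
        # The VM status is never consulted again, so recovery goes unnoticed.
        return "skipped: server in error state"
    return _apply(res, vm_status)


def reconcile_recheck(res: ServerResource, vm_status: str) -> str:
    """Expected behavior: always re-read the VM status before deciding."""
    if res.failure_message is not None and vm_status == "ACTIVE":
        res.failure_message = None  # the VM recovered, so clear the failure
    return _apply(res, vm_status)


if __name__ == "__main__":
    for reconcile in (reconcile_sticky, reconcile_recheck):
        res = ServerResource()
        reconcile(res, "ERROR")             # live migration fails
        outcome = reconcile(res, "ACTIVE")  # infrastructure recovers
        print(f"{reconcile.__name__}: {outcome}, ready={res.ready}")
```

In this model the sticky variant stays stuck after the VM recovers, which matches what we observe, while the re-check variant becomes ready again. Whether an ERROR state should be treated as terminal at all is the design question this issue is asking.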

What did you expect to happen:

Once the VM returned to the ACTIVE state, reconciliation should have picked up the new state and updated the OpenStackServer resource accordingly.

Environment:

Cluster API Provider OpenStack version: v0.12.3
Cluster-API version: v1.10.2
OpenStack version:
Minikube/KIND version: 
Kubernetes version (use kubectl version): 1.32.5
OS (e.g. from /etc/os-release): Ubuntu 24.04
