Occasional flake with OSProvisioningClientError #496

@jsturtevant

We are seeing an occasional VM provisioning error:

"{\"status\":\"Failed\",\"error\":{\"code\":\"ResourceOperationFailure\",\"message\":\"The resource operation completed
 with terminal provisioning state 'Failed'.\",\"details\":[{\"code\":\"OSProvisioningClientError\",\"message\":\"OS 
Provisioning for VM 'capz-conf-6l2q7' did not finish in the allotted time. However, the VM guest agent was detected 
running. This suggests the guest OS has not been properly prepared to be used as a VM image (with 
CreateOption=FromImage). To resolve this issue, either use the VHD as is with CreateOption=Attach or prepare it 
properly for use as an image:\\r\\n * Instructions for Windows: https://learn.microsoft.com/azure/virtual-machines/windows/prepare-for-upload-vhd-image\\r\\n * Instructions for Linux: 
https://learn.microsoft.com/azure/virtual-machines/linux/create-upload-generic \"}]}}",
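
For triage, the detail code in a payload like the one above can be extracted programmatically. This is a minimal Python sketch, not CAPZ code: the function names and the shortened sample payload are hypothetical, and the second decode assumes the payload was logged as a quoted JSON string, as it appears here.

```python
import json


def provisioning_error_codes(payload: str) -> list[str]:
    """Return the top-level error code followed by any detail codes."""
    doc = json.loads(payload)
    # Assumption: the log shows the payload as a quoted string, so it may be
    # JSON-encoded twice. Decode again if the first pass yields a string.
    if isinstance(doc, str):
        doc = json.loads(doc)
    error = doc.get("error", {})
    codes = [error.get("code", "")]
    codes += [detail.get("code", "") for detail in error.get("details", [])]
    return [code for code in codes if code]


def is_os_provisioning_error(payload: str) -> bool:
    """True if any code in the payload is OSProvisioningClientError."""
    return "OSProvisioningClientError" in provisioning_error_codes(payload)


# Hypothetical, shortened sample with the same shape as the payload above.
sample = json.dumps(json.dumps({
    "status": "Failed",
    "error": {
        "code": "ResourceOperationFailure",
        "message": "The resource operation completed with terminal provisioning state 'Failed'.",
        "details": [{
            "code": "OSProvisioningClientError",
            "message": "OS Provisioning for VM 'capz-conf-6l2q7' did not finish in the allotted time.",
        }],
    },
}))

print(provisioning_error_codes(sample))
print(is_os_provisioning_error(sample))
```

A check like this would make it possible to count how often this specific failure occurs across CI runs, separately from other provisioning failures.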

Interestingly, the other node (which uses the same VHD image) came up just fine:

capz-conf-ry2zwl-control-plane-kpfmh   Ready    control-plane   3h48m   v1.33.0-beta.0.679+5c7491bf0874a8   10.0.0.4      <none>        Ubuntu 24.04.2 LTS               6.8.0-1021-azure   containerd://1.7.20
capz-conf-ss4bz                        Ready    <none>          3h45m   v1.33.0-beta.0.679+5c7491bf0874a8   10.1.0.5      <none>    

https://storage.googleapis.com/kubernetes-ci-logs/logs/ci-kubernetes-e2e-capz-master-windows/1905560033546997760/build-log.txt

We didn't get logs from the successful nodes since the logging code had a panic:

Done cleaning up after docker in docker.
Collecting logs for cluster capz-conf-ry2zwl in namespace default and dumping logs to /logs/artifacts
E0328 14:08:11.184919   42840 reflector.go:158] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:243: Failed to watch *v1.Pod: Get \"https://capz-conf--capz-conf-ry2zwl-46678f-2uf8mw3o.hcp.northeurope.azmk8s.io:443/api/v1/pods?resourceVersion=91160&timeoutSeconds=438&watch=true\": dial tcp: lookup capz-conf--capz-conf-ry2zwl-46678f-2uf8mw3o.hcp.northeurope.azmk8s.io on 172.20.0.10:53: no such host" logger="UnhandledError"

The cluster name in that hostname looks suspicious (capz-conf appears twice): capz-conf--capz-conf-ry2zwl-46678f-2uf8mw3o
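
A quick way to flag hostnames like this in other build logs is sketched below. It is only a heuristic under stated assumptions: the function name is hypothetical, the default prefix is taken from the log above, and a repeated prefix is not confirmed to be the cause of the DNS failure.

```python
from urllib.parse import urlsplit


def has_repeated_prefix(url: str, prefix: str = "capz-conf") -> bool:
    """Return True if the first DNS label of the URL's host contains the prefix more than once."""
    host = urlsplit(url).hostname or ""
    first_label = host.split(".", 1)[0]
    return first_label.count(prefix) > 1


# URL taken from the reflector error above, shortened to drop the query string.
url = "https://capz-conf--capz-conf-ry2zwl-46678f-2uf8mw3o.hcp.northeurope.azmk8s.io:443/api/v1/pods"
print(has_repeated_prefix(url))
```

Running this over the API server URLs in several runs would show whether the repeated prefix only appears in the failing ones.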

Metadata

Labels

lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.