Open
Labels: ci-instability (non-deterministic CI / build failure)
Description
https://github.com/NVIDIA/k8s-dra-driver-gpu/actions/runs/22759033700/job/66010763180#step:3:260
# 2026-03-06T10:20:50.480Z [ 11.5s] sleep, pre-injection jitter: 1.47484 s
# 2026-03-06T10:20:51.959Z [ 13.0s] inject fault type 1: force-delete worker pod 0
# + kubectl delete pod test-failover-job-worker-0 --grace-period=0 --force
# Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
# pod "test-failover-job-worker-0" force deleted from default namespace
# + set +x
# 2026-03-06T10:25:38.629Z [ 299.6s] global deadline reached (300 seconds), collect debug data -- and leave control loop
...
# nvidia-dra-driver-gpu computedomain-daemon-c41e373a-dce3-4e9f-b86a-eb4110b0abc7-5vm97 1/1 Running 0 4m59s 192.168.35.146 gb-nvl-156-compute17 <none> <none>
# nvidia-dra-driver-gpu computedomain-daemon-c41e373a-dce3-4e9f-b86a-eb4110b0abc7-tzbqn 1/1 Running 0 4m40s 192.168.34.120 gb-nvl-156-compute18 <none> <none>
Reading this log output, I think the CD (ComputeDomain) daemon log follower may be broken as of today, probably because of a rename that we recently performed.
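
To make the hypothesis concrete, here is a minimal sketch of how a log follower that selects pods by name prefix could silently stop matching after a rename. This is an illustration only and not the repository's actual follower: `select_pods` and the stale prefix `old-daemon-name-` are hypothetical, and only the two pod names are taken from the output above.

```python
def select_pods(pod_names, prefix):
    """Return the pod names a prefix-based log follower would attach to."""
    return [name for name in pod_names if name.startswith(prefix)]


# Pod names copied from the debug output above.
pods = [
    "computedomain-daemon-c41e373a-dce3-4e9f-b86a-eb4110b0abc7-5vm97",
    "computedomain-daemon-c41e373a-dce3-4e9f-b86a-eb4110b0abc7-tzbqn",
]

# Hypothetical stale prefix, standing in for whatever name was used
# before the rename (the real old name is not stated in this issue).
STALE_PREFIX = "old-daemon-name-"
CURRENT_PREFIX = "computedomain-daemon-"

stale_matches = select_pods(pods, STALE_PREFIX)
current_matches = select_pods(pods, CURRENT_PREFIX)

# A follower still using the stale prefix attaches to nothing and reports
# no error, which would look like missing daemon logs in the CI output.
print(len(stale_matches), len(current_matches))
```

If the follower does work this way, logging the number of matched pods (and failing when it is zero) would make this kind of breakage visible immediately.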
Status: Backlog