Hi, due to a host capacity failure, we are facing a long queue on the GPU E2E Test, which uses the A10.2 runner. This is the tracking issue for that.
Context:
We recently migrated from A10.1 to A10.2 for our GPU E2E Test, but there is a limit of 20 GPUs on the CNCF side, and due to some issue we are currently hitting out-of-capacity failures.
Related Slack threads:
Kubeflow Slack - https://cloud-native.slack.com/archives/C0742LDFZ4K/p1773344737522609
#cncf-ci-infra - https://cloud-native.slack.com/archives/C08P4HUFQ6M/p1773430783406709
cc @andreyvelich @XploY04 @Goku2099
/priority p2