feat: skip cloud-init ready report and add standalone report_ready script #8056 (Draft)

Commit: chore: update ubuntu scriptless e2e flow (bb6f742)

Azure Pipelines / Agentbaker GPU E2E failed Mar 11, 2026 in 19m 13s

Build #20260311.94 had test failures

Tests

  • Failed: 2 (8.33%)
  • Passed: 22 (91.67%)
  • Other: 0 (0.00%)
  • Total: 24

Annotations

Check failure on line 956 in Build log

azure-pipelines / Agentbaker GPU E2E

Build log #L956

Script failed with exit code: 1

Check failure on line 1 in Test_Ubuntu2404_NvidiaDevicePluginRunning_MIG

azure-pipelines / Agentbaker GPU E2E

Test_Ubuntu2404_NvidiaDevicePluginRunning_MIG

Failed
Raw output
=== RUN   Test_Ubuntu2404_NvidiaDevicePluginRunning_MIG
=== PAUSE Test_Ubuntu2404_NvidiaDevicePluginRunning_MIG
=== CONT  Test_Ubuntu2404_NvidiaDevicePluginRunning_MIG
    test_helpers.go:362: [19.298s] TAGS {Name:Test_Ubuntu2404_NvidiaDevicePluginRunning_MIG ImageName:2404gen2containerd OS:ubuntu Arch:amd64 NetworkIsolated:false NonAnonymousACR:false GPU:true WASM:false BootstrapTokenFallback:false KubeletCustomConfig:false Scriptless:false VHDCaching:false MockAzureChinaCloud:false VMSeriesCoverageTest:false}
    test_helpers.go:199: [19.302s] → running scenario...
    test_helpers.go:231: [19.839s] → preparing AKS node...
    vmss.go:298: [25.931s] → creating VMSS hhhk-2026-03-11-ubuntu2404nvidiadevicepluginrunningmig...
    vmss.go:219: [28.012s] VMSS portal link: https://ms.portal.azure.com/#@microsoft.onmicrosoft.com/resource/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-v4-e1f58_westus3/providers/Microsoft.Compute/virtualMachineScaleSets/hhhk-2026-03-11-ubuntu2404nvidiadevicepluginrunningmig/overview
    vmss.go:225: [28.012s] Managed cluster portal link: https://ms.portal.azure.com/#@microsoft.onmicrosoft.com/resource/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-v4-e1f58_westus3/providers/Microsoft.ContainerService/managedClusters/abe2e-kubenet-v4-e1f58/overview
    vmss.go:331: [36.558s] VM will be automatically deleted after the test finishes, to preserve it for debugging purposes set KEEP_VMSS=true or pause the test with a breakpoint before the test finishes or failed
    vmss.go:335: [36.558s] SSH Instructions: (may take a few minutes for the VM to be ready for SSH)
        ========================
        az network bastion ssh --target-resource-id "/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-v4-e1f58_westus3/providers/Microsoft.Compute/virtualMachineScaleSets/hhhk-2026-03-11-ubuntu2404nvidiadevicepluginrunningmig/virtualMachines/0" --name "abe2e-kubenet-v4-e1f58-bastion" --resource-group MC_abe2e-westus3_abe2e-kubenet-v4-e1f58_westus3 --auth-type ssh-key --username azureuser --ssh-key /tmp/private-key-3080271420
        
    bastionssh.go:304: [380.050s] Attempt 1/5 establishing SSH over bastion to 10.224.0.106
    bastionssh.go:323: [410.957s] Attempt 1/5 SSH handshake failed: ssh: handshake failed: failed to get reader: context deadline exceeded
    bastionssh.go:304: [421.012s] Attempt 2/5 establishing SSH over bastion to 10.224.0.106
    bastionssh.go:323: [451.818s] Attempt 2/5 SSH handshake failed: ssh: handshake failed: failed to get reader: context deadline exceeded
    bastionssh.go:304: [461.875s] Attempt 3/5 establishing SSH over bastion to 10.224.0.106
    vmss.go:385: [463.050s] VM reached running state
    vmss.go:355: [463.050s] ✓ creating VMSS hhhk-2026-03-11-ubuntu2404nvidiadevicepluginrunningmig done (437.1s)
    kube.go:149: [463.050s] → waiting for node hhhk-2026-03-11-ubuntu2404nvidiadevicepluginrunningmig to be ready...
    kube.go:181: [463.117s] node hhhk-2026-03-11-ubuntu2404nvidiadevicepluginrunningmig000000 is ready. Taints: null Conditions: [{"type":"NetworkUnavailable","status":"False","lastHeartbeatTime":"2026-03-11T04:31:53Z","lastTransitionTime":"2026-03-11T04:31:53Z","reason":"RouteCreated","message":"RouteController created a route"},{"type":"FrequentUnregisterNetDevice","status":"False","lastHeartbeatTime":"2026-03-11T04:32:00Z","lastTransitionTime":"2026-03-11T04:31:59Z","reason":"NoFrequentUnregisterNetDevice","message":"node is functioning properly"},{"type":"FrequentKubeletRestart","status":"False","lastHeartbeatTime":"2026-03-11T04:32:00Z","lastTransitionTime":"2026-03-11T04:31:59Z","reason":"NoFrequentKubeletRestart","message":"kubelet is functioning properly"},{"type":"FrequentDockerRestart","status":"False","lastHeartbeatTime":"2026-03-11T04:32:00Z","lastTransitionTime":"2026-03-11T04:31:59Z","reason":"NoFrequentDock
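
The log above already points at the debugging path: the VMSS is deleted when the test exits unless KEEP_VMSS=true is set. A minimal sketch of re-running just this failing scenario locally while keeping the VM around for the bastion SSH command printed above; the ./e2e package path and the timeout are assumptions about the repo layout, not confirmed by this log:

    # Re-run only the MIG scenario and preserve the VMSS for SSH debugging.
    # KEEP_VMSS comes from the test's own log message; the package path and
    # timeout below are assumed values, adjust to your checkout.
    KEEP_VMSS=true go test -timeout 90m \
      -run '^Test_Ubuntu2404_NvidiaDevicePluginRunning_MIG$' ./e2e/...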

Check failure on line 1 in Test_Ubuntu2204_NvidiaDevicePluginRunning_WithoutVMSSTag

azure-pipelines / Agentbaker GPU E2E

Test_Ubuntu2204_NvidiaDevicePluginRunning_WithoutVMSSTag

Failed
Raw output
=== RUN   Test_Ubuntu2204_NvidiaDevicePluginRunning_WithoutVMSSTag
=== PAUSE Test_Ubuntu2204_NvidiaDevicePluginRunning_WithoutVMSSTag
=== CONT  Test_Ubuntu2204_NvidiaDevicePluginRunning_WithoutVMSSTag
    test_helpers.go:362: [9.137s] TAGS {Name:Test_Ubuntu2204_NvidiaDevicePluginRunning_WithoutVMSSTag ImageName:2204gen2containerd OS:ubuntu Arch:amd64 NetworkIsolated:false NonAnonymousACR:false GPU:true WASM:false BootstrapTokenFallback:false KubeletCustomConfig:false Scriptless:false VHDCaching:false MockAzureChinaCloud:false VMSeriesCoverageTest:false}
    test_helpers.go:199: [15.156s] → running scenario...
    test_helpers.go:231: [19.839s] → preparing AKS node...
    vmss.go:298: [25.953s] → creating VMSS rka0-2026-03-11-ubuntu2204nvidiadevicepluginrunningwithou...
    vmss.go:219: [28.262s] VMSS portal link: https://ms.portal.azure.com/#@microsoft.onmicrosoft.com/resource/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-v4-e1f58_westus3/providers/Microsoft.Compute/virtualMachineScaleSets/rka0-2026-03-11-ubuntu2204nvidiadevicepluginrunningwithou/overview
    vmss.go:225: [28.262s] Managed cluster portal link: https://ms.portal.azure.com/#@microsoft.onmicrosoft.com/resource/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-v4-e1f58_westus3/providers/Microsoft.ContainerService/managedClusters/abe2e-kubenet-v4-e1f58/overview
    vmss.go:331: [37.286s] VM will be automatically deleted after the test finishes, to preserve it for debugging purposes set KEEP_VMSS=true or pause the test with a breakpoint before the test finishes or failed
    vmss.go:335: [37.286s] SSH Instructions: (may take a few minutes for the VM to be ready for SSH)
        ========================
        az network bastion ssh --target-resource-id "/subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-v4-e1f58_westus3/providers/Microsoft.Compute/virtualMachineScaleSets/rka0-2026-03-11-ubuntu2204nvidiadevicepluginrunningwithou/virtualMachines/0" --name "abe2e-kubenet-v4-e1f58-bastion" --resource-group MC_abe2e-westus3_abe2e-kubenet-v4-e1f58_westus3 --auth-type ssh-key --username azureuser --ssh-key /tmp/private-key-3080271420
        
    bastionssh.go:304: [289.900s] Attempt 1/5 establishing SSH over bastion to 10.224.0.113
    vmss.go:385: [292.835s] VM reached running state
    vmss.go:355: [292.835s] ✓ creating VMSS rka0-2026-03-11-ubuntu2204nvidiadevicepluginrunningwithou done (266.9s)
    kube.go:149: [292.835s] → waiting for node rka0-2026-03-11-ubuntu2204nvidiadevicepluginrunningwithou to be ready...
    kube.go:181: [292.866s] node rka0-2026-03-11-ubuntu2204nvidiadevicepluginrunningwithou000000 is ready. Taints: [{"key":"node.cloudprovider.kubernetes.io/uninitialized","value":"true","effect":"NoSchedule"}] Conditions: [{"type":"GPUClockThrottling","status":"True","lastHeartbeatTime":"2026-03-11T04:29:24Z","lastTransitionTime":"2026-03-11T04:28:53Z","reason":"GPUClockThrottlingIsPresent"},{"type":"NVIDIAGRIDStatusInvalid","status":"False","lastHeartbeatTime":"2026-03-11T04:29:24Z","lastTransitionTime":"2026-03-11T04:28:53Z","reason":"NVIDIAGRIDStatusValid","message":"NVIDIA Grid Status Valid"},{"type":"ContainerRuntimeProblem","status":"False","lastHeartbeatTime":"2026-03-11T04:29:24Z","lastTransitionTime":"2026-03-11T04:29:23Z","reason":"ContainerRuntimeIsUp","message":"container runtime service is up"},{"type":"FrequentKubeletRestart","status":"False","lastHeartbeatTime":"2026-03-11T04:29:24Z","lastTransitionTime":"2026-03-11T04:28:53Z","reason":"NoFrequentKubeletRestart","message":"kubelet is functioning properly"},{"type":"KernelDeadlock","status":"False","lastHeartbeatTime":"2026-03-11T04:29:24Z","lastTransitionTime":"2026-03-11T04:28:53Z","reason":"KernelHasNoDeadlock","message":"kernel has no deadlock"},{"type":"FrequentDockerRestart","status":"False","lastHeartbeatTime":"2026-03-11T04:29:24Z","lastTransiti
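
Note that this node reports Ready but still carries the node.cloudprovider.kubernetes.io/uninitialized taint, and GPUClockThrottling is True; the lingering taint in particular may explain why the device-plugin pod never scheduled. A quick hedged sketch for confirming both on a live repro, assuming your kubeconfig points at the abe2e-kubenet-v4-e1f58 cluster (these are standard kubectl invocations, not part of the test suite):

    # Show any taints still on the node (expect the uninitialized taint above).
    kubectl get node rka0-2026-03-11-ubuntu2204nvidiadevicepluginrunningwithou000000 \
      -o jsonpath='{.spec.taints}'

    # List every condition type/status to spot GPUClockThrottling=True.
    kubectl get node rka0-2026-03-11-ubuntu2204nvidiadevicepluginrunningwithou000000 \
      -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'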