This repository was archived by the owner on Oct 22, 2024. It is now read-only.

test flake: spontaneous node reboot #1055

@pohly

Description

A worker node spontaneously rebooted, causing container restarts and thus test failures.

Seen in https://cloudnative-k8sci.southcentralus.cloudapp.azure.com/view/pmem-csi/job/pmem-csi/view/change-requests/job/PR-1054/4/

https://cloudnative-k8sci.southcentralus.cloudapp.azure.com/view/pmem-csi/job/pmem-csi/view/change-requests/job/PR-1054/4/artifact/joblog-jenkins-pmem-csi-PR-1054-4-test-1.19.log:

Dec  7 00:33:17.367: INFO: Waiting up to 3m0s for all (but 0) nodes to be ready
[AfterEach] direct-production
  /mnt/workspace/pmem-csi_PR-1054/test/e2e/deploy/deploy.go:1112
STEP: checking for test "direct-production Deployment Kata Containers [Testpattern: CSI Ephemeral-volume (ext4)] dax should support MAP_SYNC" in namespace default, test success
pmem-csi-intel-com-controller-d875b774-r6shd/[email protected]: ==== end of pod log ====
WARNING: pod log: pmem-csi-intel-com-controller-d875b774-r6shd/pmem-driver: Get "https://172.17.0.5:10250/containerLogs/default/pmem-csi-intel-com-controller-d875b774-r6shd/pmem-driver?follow=true": dial tcp 172.17.0.5:10250: connect: connection refused
...
Dec  7 00:34:37.493: INFO: Done with waiting, PMEM-CSI driver v1.0.0-48-g858d2ca0 is ready.
Dec  7 00:34:37.514: FAIL: container "pmem-driver" in pod "pmem-csi-intel-com-controller-d875b774-r6shd" restarted 1 times, last state: {Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:255,Signal:0,Reason:Unknown,Message:,StartedAt:2021-12-06 23:23:32 +0000 UTC,FinishedAt:2021-12-07 00:33:58 +0000 UTC,ContainerID:containerd://c103cca52585e83c30b0afa64ce57b8048fb90998e36bf7a96bafeafaec4ecb3,}}

https://cloudnative-k8sci.southcentralus.cloudapp.azure.com/view/pmem-csi/job/pmem-csi/view/change-requests/job/PR-1054/4/artifact/joblog-jenkins-pmem-csi-PR-1054-4-kubeletlogs-1.19.log:

Dec 07 00:30:52 pmem-csi-govm-worker1 kubelet[855]: E1207 00:30:52.749878     855 upgradeaware.go:387] Error proxying data from backend to client: tls: use of closed connection
-- Boot 2a48549ebe844612bb074c64784b43f9 --
Dec 07 00:33:59 pmem-csi-govm-worker1 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Dec 07 00:34:01 pmem-csi-govm-worker1 kubelet[636]: I1207 00:34:01.312285     636 server.go:411] Version: v1.19.11
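
The `-- Boot ... --` line in the kubelet journal above is what identifies this failure as a node reboot rather than a crash of the driver itself. As a rough sketch only (not part of the PMEM-CSI test suite), such markers could be located in a saved journal log as follows. The function name `find_reboots` and the assumption that the marker always carries a 32-character hexadecimal boot ID are illustrative choices, not something the logs above guarantee for every journalctl version.

```python
import re

# Assumption: a reboot shows up in the saved journal as a line of the form
# "-- Boot <32 hex chars> --", as seen in the kubelet log excerpt above.
BOOT_MARKER = re.compile(r"^-- Boot ([0-9a-f]{32}) --$")


def find_reboots(journal_lines):
    """Return (last_line_before, boot_id, first_line_after) per boot marker.

    find_reboots is a hypothetical helper written for this illustration.
    """
    lines = [line.rstrip("\n") for line in journal_lines]
    reboots = []
    for i, line in enumerate(lines):
        match = BOOT_MARKER.match(line.strip())
        if match:
            before = lines[i - 1] if i > 0 else None
            after = lines[i + 1] if i + 1 < len(lines) else None
            reboots.append((before, match.group(1), after))
    return reboots


# Lines copied from the kubelet log excerpt above.
SAMPLE = [
    "Dec 07 00:30:52 pmem-csi-govm-worker1 kubelet[855]: E1207 00:30:52.749878     855 upgradeaware.go:387] Error proxying data from backend to client: tls: use of closed connection",
    "-- Boot 2a48549ebe844612bb074c64784b43f9 --",
    "Dec 07 00:33:59 pmem-csi-govm-worker1 systemd[1]: Started kubelet: The Kubernetes Node Agent.",
]

if __name__ == "__main__":
    for before, boot_id, after in find_reboots(SAMPLE):
        print("reboot detected, new boot ID:", boot_id)
        print("  last entry before:", before)
        print("  first entry after:", after)
```

In this excerpt the first entry after the marker is stamped 00:33:59, one second after the `FinishedAt` time of 00:33:58 reported for the terminated `pmem-driver` container, which is consistent with the reboot being the cause of the container restart.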
