-
Description

After the automatic cluster update, there seems to be an issue with mounting volumes again. Seemingly at random, some pods can't mount their volumes. In the Hetzner Cloud Console I can see that the respective volumes are attached to the nodes.

kubectl -n my-namespace get events
LAST SEEN TYPE REASON OBJECT MESSAGE
3m25s Warning FailedMount pod/db-0 MountVolume.SetUp failed for volume "pvc-438cfa7b-d698-46e9-8ce3-828c2915d07b" : rpc error: code = InvalidArgument desc = missing device path
23m Warning FailedMount pod/db-0 Unable to attach or mount volumes: unmounted volumes=[db-claim0], unattached volumes=[kube-api-access-dp87p db-claim0]: timed out waiting for the condition
11m Warning FailedMount pod/db-0 Unable to attach or mount volumes: unmounted volumes=[db-claim0], unattached volumes=[db-claim0 kube-api-access-dp87p]: timed out waiting for the condition

So I found this issue: hetznercloud/csi-driver#278, and the solution was apparently to delete the dangling VolumeAttachment (a small consolidated sketch of this cleanup is included further down, after the version check).

kubectl get volumeattachments | grep pvc-438cfa7b-d698-46e9-8ce3-828c2915d07b

outputs something like:

csi-b6c4d43c15054265d620d98d8c3757c213703cf8833a3fed4deaed06e50f163e   csi.hetzner.cloud   pvc-438cfa7b-d698-46e9-8ce3-828c2915d07b   my-cluster-agent-large-fsn1-our   true   14h

The first column of this output is the VolumeAttachment name, so I deleted it:

kubectl delete volumeattachments.storage.k8s.io csi-b6c4d43c15054265d620d98d8c3757c213703cf8833a3fed4deaed06e50f163e

After that I deleted the pod that used this PVC. It got recreated and everything worked fine again. According to the aforementioned GitHub issue, the root cause seems to be a bug in the Hetzner CSI driver.

EDIT: Checking the CSI driver version:

kubectl describe -n kube-system pod hcloud-csi-controller-0
...
hcloud-csi-driver:
Container ID: containerd://4b9f44607d50017bd841745638cba6357f0ac75c30f9ef4813a12f83fa9e5105
Image: hetznercloud/hcloud-csi-driver:1.6.0
Image ID: docker.io/hetznercloud/hcloud-csi-driver@sha256:1475d525f9a4039ae8f1d81666a0fc912d92f34415f6c53723656dff0ee16bd1
...

So it seems to be 1.6.0. According to the mentioned issue, v2.1.1 should fix the issue.
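For reference, here is the manual cleanup above condensed into a small script. It is only a sketch: namespace, pod, and PVC names are placeholders taken from the events above, and deleting a VolumeAttachment for a volume that is genuinely still in use is risky, so double-check first.

#!/usr/bin/env bash
# Sketch of the manual workaround: find the dangling VolumeAttachment for a
# stuck PVC, delete it, then delete the pod so it gets recreated and remounts.
set -euo pipefail

NAMESPACE="my-namespace"
POD="db-0"
PV_NAME="pvc-438cfa7b-d698-46e9-8ce3-828c2915d07b"

# VolumeAttachments are cluster-scoped; pick the one(s) referencing our PV.
VA=$(kubectl get volumeattachments.storage.k8s.io \
  -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.source.persistentVolumeName}{"\n"}{end}' \
  | awk -v pv="$PV_NAME" '$2 == pv {print $1}')

for va in $VA; do
  kubectl delete volumeattachments.storage.k8s.io "$va"
done

# Recreate the pod so the volume gets attached and mounted cleanly again.
kubectl -n "$NAMESPACE" delete pod "$POD"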
Kube.tf file

locals {
hcloud_token = "xxxxxxxxxxx"
ssh_port = 22
}
module "kube-hetzner" {
providers = {
hcloud = hcloud
}
hcloud_token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token
source = "kube-hetzner/kube-hetzner/hcloud"
ssh_public_key = file("~/.ssh/id_ed25519.pub")
ssh_private_key = file("~/.ssh/id_ed25519")
network_region = "eu-central"
control_plane_nodepools = [
{
name = "control-plane-fsn1",
server_type = "cpx21",
location = "fsn1",
labels = [],
taints = [],
count = 1
},
{
name = "control-plane-nbg1",
server_type = "cpx21",
location = "nbg1",
labels = [],
taints = [],
count = 1
},
{
name = "control-plane-hel1",
server_type = "cpx21",
location = "hel1",
labels = [],
taints = [],
count = 1
}
]
agent_nodepools = [
{
name = "agent-large-fsn1",
server_type = "cpx21",
location = "fsn1",
labels = [],
taints = [],
count = 1
},
{
name = "agent-large-nbg1",
server_type = "cpx21",
location = "nbg1",
labels = [],
taints = [],
count = 1
},
{
name = "agent-large-hel1",
server_type = "cpx21",
location = "hel1",
labels = [],
taints = [],
count = 1
},
]
enable_wireguard = true
load_balancer_type = "lb11"
load_balancer_location = "nbg1"
ingress_controller = "nginx"
kured_options = {
"reboot-days": "su"
"start-time": "3am"
"end-time": "8am"
"time-zone": "Local"
}
initial_k3s_channel = "v1.25"
cluster_name = "my-cluster"
extra_firewall_rules = [
{
description = "Allow inbound traffic to the Kube API server from our office ip"
direction = "in"
protocol = "tcp"
port = "6443"
source_ips = ["x.x.x.x/32"]
destination_ips = [] # Won't be used for this rule
},
{
description = "Allow inbound SSH traffic from our office ip"
direction = "in"
protocol = "tcp"
port = local.ssh_port
source_ips = ["x.x.x.x/32"]
destination_ips = [] # Won't be used for this rule
},
{
description = "Allow outbound SSH traffic"
direction = "out"
protocol = "tcp"
port = "22"
source_ips = [] # Won't be used for this rule
destination_ips = ["0.0.0.0/0", "::/0"]
},
]
additional_tls_sans = ["my.cluster.com"]
lb_hostname = "lb.cluster.com"
enable_rancher = true
create_kubeconfig = false
}
provider "hcloud" {
token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token
}
terraform {
required_version = ">= 1.3.3"
required_providers {
hcloud = {
source = "hetznercloud/hcloud"
version = ">= 1.38.2"
}
}
}
output "kubeconfig" {
value = module.kube-hetzner.kubeconfig
sensitive = true
}
variable "hcloud_token" {
sensitive = true
default = ""
} ScreenshotsNo response PlatformClient: Arch Linux |
-
The question arises: can I just set the CSI driver version in kube.tf to update it?
-
@thobens Yes, that should work!
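Roughly like this (a sketch only; whichever variable your kube.tf exposes for the CSI version, set it, re-apply, and then check which image is actually running on the cluster):

terraform init -upgrade    # only needed if you also bump the module version
terraform apply

# Verify the hcloud-csi-driver image that is actually running; this works
# regardless of how the controller pod is named after the upgrade.
kubectl -n kube-system get pods \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}' \
  | grep hcloud-csi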
-
I've tried to update the CSI driver, but after running terraform apply it still doesn't work.

The only change I made to kube.tf was uncommenting the CSI driver version setting. After that I upgraded the kube-hetzner module from 2.2.0 to 2.2.4 and tried to apply again, but still no luck. Thank you for your time and let me know if you need more info.
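For what it's worth, this is how I am double-checking locally that the module upgrade was actually picked up (as far as I understand, Terraform records the resolved module versions in .terraform/modules/modules.json):

terraform init -upgrade

# Resolved module versions as recorded by Terraform after init.
grep -o '"Version": *"[^"]*"' .terraform/modules/modules.json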
cat'ing the file at /var/post-install/hcloud-csi.yaml outputs 404: Not Found, which suggests that the path used to download the hcloud-csi.yaml file is not correct. Looking at the code in init.tf it becomes clear that the value 2.1.1 should be v2.1.1. Deploying with that value works fine. So thanks for your comment, it helped me to understand a bit more about the debugging process :)
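For anyone else landing here: the release tags in the csi-driver repo all carry a leading "v", so a bare 2.1.1 ends up building a download path that returns 404: Not Found. A quick (hedged) way to see the valid tag names is the GitHub releases API:

# List the most recent csi-driver release tags; note the leading "v".
curl -s 'https://api.github.com/repos/hetznercloud/csi-driver/releases?per_page=5' \
  | grep '"tag_name"'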