Describe the bug
We are running in GCP, and a workload spawned a 10GB PVC. The underlying storage controller had trouble provisioning it (due to GCP rate limits), which lasted for ~30 minutes. During that time the volume-autoscaler noticed the disk and treated it as 0 size; from our Slack:
@channel ERROR: <project> FAILED requesting to scale up <volume> by 20% from 0 to 2G, it was using more than 70% disk or inode space over the last 1380 seconds
From looking at the code, this appears to come from here, where any exception causes the volume to be treated as having 0 size.
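For context, the failure mode looks roughly like the pattern below. This is a paraphrased sketch of the behavior described above, not the project's exact code; `get_volume_size_bytes` and `prom_result` are illustrative names:

```python
# Paraphrased sketch of the current behavior; function and variable names
# are illustrative, not the project's actual identifiers.
def get_volume_size_bytes(prom_result):
    try:
        # Normal case: parse the volume size out of the Prometheus result.
        return int(prom_result["value"][1])
    except Exception:
        # Any failure -- including "no metrics because the disk was never
        # provisioned" -- collapses to a size of 0, which makes an
        # unprovisioned PVC look like a full 0-byte disk.
        return 0
```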
To Reproduce
Steps to reproduce the behavior
- create a PVC whose underlying disk won't be provisioned due to a failure (see the sketch below)
- wait for the volume-autoscaler to kick in
- observe it attempt to scale the volume up from 0
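For the first step, a PVC that references a storage class that can't provision will sit in Pending with no underlying disk, standing in for the stalled GCP provisioning in this report. A minimal sketch using the official kubernetes Python client; the names `stuck-pvc` and `nonexistent-class` are illustrative:

```python
from kubernetes import client, config

config.load_kube_config()

# A storage class that cannot provision leaves the PVC Pending with no
# underlying disk, mimicking the rate-limited GCP provisioning above.
pvc_manifest = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "stuck-pvc"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "nonexistent-class",
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc_manifest
)
```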
Expected behavior
In the event that the underlying disk doesn't exist, it seems more appropriate for the volume-autoscaler to skip that PVC; if there is no underlying disk, we can't resize it anyway. So the correct behavior here would be to skip the PVC instead of assuming its size is 0, roughly as sketched below.
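A minimal sketch of what that could look like; again, the names here are illustrative, not the project's actual identifiers:

```python
def get_volume_size_bytes(prom_result):
    """Return the volume size in bytes, or None if it can't be determined."""
    try:
        return int(prom_result["value"][1])
    except (KeyError, IndexError, ValueError, TypeError):
        return None  # "unknown", not "empty"

def scan_pvcs(pvcs, metrics):
    for pvc in pvcs:
        size = get_volume_size_bytes(metrics.get(pvc))
        if size is None:
            # No underlying disk yet (e.g. provisioning stalled by cloud
            # rate limits), so there is nothing to resize: skip this PVC.
            continue
        # Normal threshold checks and scale-up logic would continue here.
        ...
```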
Screenshots
n/a
Extra Information Requested
- Kubernetes Version: v1.33.5-gke.1162000
- Prometheus Version: GCP managed