Commit e8fdef1

Address more details about volume expansion
1 parent a7bd0e5 commit e8fdef1

File tree

1 file changed

+22
-15
lines changed
  • keps/sig-storage/284-enable-volume-expansion

keps/sig-storage/284-enable-volume-expansion/README.md

Lines changed: 22 additions & 15 deletions
@@ -388,8 +388,12 @@ different feature gates that control various aspects of expansion.
388388
- Describe the mechanism:
389389
- Will enabling / disabling the feature require downtime of the control
390390
plane?
391+
Enabling or disabling this feature does not require complete downtime of the control plane;
392+
feature gates can be enabled progressively on different control-plane nodes.
391393
- Will enabling / disabling the feature require downtime or reprovisioning
392394
of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).
395+
This feature can be enabled progressively on nodes; once expansion is enabled
396+
on a node, volume expansion will happen in the kubelet on that node.
393397

394398
###### Does enabling the feature change any default behavior?
395399

@@ -457,21 +461,13 @@ Having said that if file system requires expansion during mount then it is obvio
457461

458462
- [ ] Metrics
459463
- controller expansion operation duration:
460-
- Metric name: storage_operation_duration_seconds{operation_name=expand_volume}
464+
- Metric name: storage_operation_duration_seconds{operation_name=expand_volume, status=success|fail-unknown}
461465
- [Optional] Aggregation method: percentile
462466
- Components exposing the metric: kube-controller-manager
463-
- controller expansion operation errors:
464-
- Metric name: storage_operation_errors_total{operation_name=expand_volume}
465-
- [Optional] Aggregation method: cumulative counter
466-
- Components exposing the metric: kube-controller-manager
467467
- node expansion operation duration:
468-
- Metric name: storage_operation_duration_seconds{operation_name=volume_fs_resize}
468+
- Metric name: storage_operation_duration_seconds{operation_name=volume_fs_resize, status=success|fail-unknown}
469469
- [Optional] Aggregation method: percentile
470470
- Components exposing the metric: kubelet
471-
- node expansion operation errors:
472-
- Metric name: storage_operation_errors_total{operation_name=volume_fs_resize}
473-
- [Optional] Aggregation method: cumulative counter
474-
- Components exposing the metric: kubelet
475471
- CSI operation metrics:
476472
- Metric name: csi_sidecar_operations_seconds
477473
- [Optional] Aggregation method: percentile
@@ -481,6 +477,8 @@ Having said that if file system requires expansion during mount then it is obvio
481477
- Details:
482478

483479
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
480+
We are going to add an equivalent of the in-tree storage_operation metrics for volume expansion when
481+
expansion is performed externally via external-resizer.
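
Because the `status` label on `storage_operation_duration_seconds` now carries what the removed `storage_operation_errors_total` series used to report, a failure ratio can be derived from the duration metric alone. The Go sketch below illustrates this under some assumptions: it reads the histogram's `_count` series from Prometheus text exposition, the sample values are invented, the label parser is deliberately simplified (no commas or escaped quotes inside label values), and `failureRatio` and `parseLabels` are hypothetical helper names, not part of any Kubernetes package.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// countSeries is the per-label-set sample count of the duration histogram.
const countSeries = "storage_operation_duration_seconds_count"

// parseLabels turns `a="x",b="y"` into a map.
// Simplified: assumes label values contain no commas or escaped quotes.
func parseLabels(s string) map[string]string {
	labels := map[string]string{}
	for _, pair := range strings.Split(s, ",") {
		kv := strings.SplitN(strings.TrimSpace(pair), "=", 2)
		if len(kv) == 2 {
			labels[kv[0]] = strings.Trim(kv[1], `"`)
		}
	}
	return labels
}

// failureRatio returns failed/total operations for one operation_name.
// The boolean is false when no samples for that operation were found.
func failureRatio(exposition, operation string) (float64, bool) {
	var failed, total float64
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, countSeries+"{") {
			continue
		}
		end := strings.LastIndex(line, "}")
		if end < 0 {
			continue
		}
		labels := parseLabels(line[len(countSeries)+1 : end])
		if labels["operation_name"] != operation {
			continue
		}
		fields := strings.Fields(line[end+1:])
		if len(fields) == 0 {
			continue
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			continue
		}
		total += v
		// Any status other than "success" (e.g. "fail-unknown") counts as a failure.
		if labels["status"] != "success" {
			failed += v
		}
	}
	if total == 0 {
		return 0, false
	}
	return failed / total, true
}

func main() {
	// Invented sample data, for illustration only.
	sample := `
storage_operation_duration_seconds_count{operation_name="expand_volume",status="success"} 3
storage_operation_duration_seconds_count{operation_name="expand_volume",status="fail-unknown"} 1
storage_operation_duration_seconds_count{operation_name="volume_fs_resize",status="success"} 5
`
	for _, op := range []string{"expand_volume", "volume_fs_resize"} {
		if ratio, ok := failureRatio(sample, op); ok {
			fmt.Printf("%s failure ratio: %.2f\n", op, ratio)
		}
	}
}
```

In a real cluster the same ratio would more likely be computed by the monitoring system from scraped data; the sketch only shows that the `status` label is sufficient to recover the error signal.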
484482

485483
### Dependencies
486484

@@ -504,14 +502,18 @@ Yes enabling this feature requires new API calls.
504502
- GET PV
505503
- List PVs
506504
- originating components: kubelet, kube-controller-manager, external-resizer
507-
- resync duration: 10mins
505+
- resync duration: 10mins (also user configurable)
508506
- Update to PVCs:
509507
- API operations
510508
- PATCH PVC
511509
- GET PVC
512510
- List PVC
513511
- originating components: kubelet, kube-controller-manager, external-resizer
514-
- resync duration: 10mins
512+
- resync duration: 10mins (also user configurable)
513+
514+
If the user enables protection against expanding PVCs that are in use, external-resizer will
515+
also watch *all* pods in the cluster. This is an optional flag in external-resizer and is generally
516+
only needed when some CSI drivers don't want to handle expansion calls for volumes that are potentially in use by a pod.
515517

516518
###### Will enabling / using this feature result in introducing new API types?
517519

@@ -525,9 +527,11 @@ Yes, we expect new calls to modify existing volume objects.
525527

526528
Describe them, providing:
527529
- API type(s): PVC
528-
- Estimated increase in size: A PVC with conditions could have its size increased by anywhere between 100 to 250B.
529-
- Estimated amount of new objects: (e.g., new Object X for every existing Pod)
530-
530+
- Estimated increase in size: A PVC with conditions could have its size increased by anywhere between 100 to 250B.
531+
- Estimated amount of new objects: (e.g., new Object X for every existing Pod)
532+
- API type(s): StorageClass
533+
- Estimated increase in size: A StorageClass with `AllowVolumeExpansion` has its size increased by approximately 26 bytes.
534+
- Estimated amount of new objects: (e.g., new Object X for every existing Pod)
531535

532536
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
533537

@@ -540,6 +544,9 @@ Enabling this feature should not result in resource usage by significant margin,
540544
### Troubleshooting
541545

542546
###### How does this feature react if the API server and/or etcd is unavailable?
547+
Since this feature is user driven, if the API server or etcd becomes unavailable, users won't be able to expand the PVC.
548+
If the API server becomes unavailable midway through the expansion process, the expansion controller may not be able
549+
to save the updated PVC in the API server, but the control flow is designed to retry and recover from such failures.
543550

544551
###### What are other known failure modes?
545552
