@@ -77,8 +77,10 @@ SIG Architecture for cross-cutting KEPs).
77
77
- [ Changes to Snapshot Controller] ( #changes-to-snapshot-controller )
78
78
- [ Changes to external-provisioner] ( #changes-to-external-provisioner )
79
79
- [ Test Plan] ( #test-plan )
80
- - [ Unit tests] ( #unit-tests )
81
- - [ E2E tests] ( #e2e-tests )
80
+ - [ Prerequisite testing updates] ( #prerequisite-testing-updates )
81
+ - [ Unit tests] ( #unit-tests )
82
+ - [ Integration tests] ( #integration-tests )
83
+ - [ e2e tests] ( #e2e-tests )
82
84
- [ Graduation Criteria] ( #graduation-criteria )
83
85
- [ Alpha] ( #alpha )
84
86
- [ Alpha -> Beta] ( #alpha---beta )
@@ -214,7 +216,7 @@ need to identify the `VolumeSnapshotContent` mapped to the `VolumeSnapshot`
214
216
from which the ` PVC ` is being created.
215
217
216
218
Either through software or via manual intervention, the annotation
217
- `snapshot.storage.kubernetes.io/allowVolumeModeChange: true` needs to be applied
219
+ `snapshot.storage.kubernetes.io/allow-volume-mode-change: true` needs to be applied
218
220
to the ` VolumeSnapshotContent ` . If the backup software is a privileged user,
219
221
it will have ` Update ` and ` Patch ` permissions on ` VolumeSnapshotContents ` .
220
222
@@ -289,7 +291,7 @@ like below after this change:
289
291
kind: VolumeSnapshotContent
290
292
metadata:
291
293
annotations:
292
- - snapshot.storage.kubernetes.io/allowVolumeModeChange: "true"
294
+ - snapshot.storage.kubernetes.io/allow-volume-mode-change: "true"
293
295
...
294
296
` ` `
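As a rough illustration of how the annotation shown above might be applied by software, the sketch below builds a JSON merge patch body that sets it on a `VolumeSnapshotContent`. The function name `build_annotation_patch` is hypothetical and not part of any Kubernetes client. The sketch assumes annotations are sent as a string-to-string map under `metadata.annotations`.

```python
import json

# Annotation key taken from this KEP.
ALLOW_ANNOTATION = "snapshot.storage.kubernetes.io/allow-volume-mode-change"


def build_annotation_patch(value: str = "true") -> dict:
    """Return a JSON merge patch body that sets the annotation.

    Hypothetical helper for illustration only. Annotation values are
    strings, so the boolean is expressed as the string "true".
    """
    return {"metadata": {"annotations": {ALLOW_ANNOTATION: value}}}


if __name__ == "__main__":
    # Print the body so it can be handed to a patch call.
    print(json.dumps(build_annotation_patch(), sort_keys=True))
```

The printed body could then be passed to something like `kubectl patch volumesnapshotcontent <name> --type merge -p '<body>'` by a user who has `Patch` permission on `VolumeSnapshotContents`.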
295
297
@@ -328,7 +330,7 @@ As part of the preprocessing steps, it will:
328
330
2. Get the ` Spec.VolumeMode ` of the ` PVC` being created.
329
331
If they do not match:
330
332
1. Get all annotations on the ` VolumeSnapshotContent` and verify if
331
- `snapshot.storage.kubernetes.io/allowVolumeModeChange: true` exists.
333
+ `snapshot.storage.kubernetes.io/allow-volume-mode-change: true` exists.
332
334
If it does not exist, block volume provisioning by returning an error.
333
335
4. In all other cases, let volume provisioning continue.
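The preprocessing steps above can be sketched as a small decision function. This is a minimal illustration in Python, not the external-provisioner's actual Go implementation. The function name `may_provision` and its parameters are hypothetical. It also assumes that a `VolumeSnapshotContent` whose `Spec.SourceVolumeMode` is unset (`None` here) is not checked, which matches the rollout note that snapshots created before the feature was enabled can still have their volume mode changed.

```python
from typing import Dict, Optional

# Annotation key taken from this KEP.
ALLOW_ANNOTATION = "snapshot.storage.kubernetes.io/allow-volume-mode-change"


def may_provision(
    source_volume_mode: Optional[str],
    pvc_volume_mode: str,
    content_annotations: Dict[str, str],
) -> bool:
    """Decide whether provisioning from a snapshot may continue.

    source_volume_mode: Spec.SourceVolumeMode of the VolumeSnapshotContent,
        or None when the field is unset (assumed to mean "no check").
    pvc_volume_mode: Spec.VolumeMode of the PVC being created.
    content_annotations: annotations on the VolumeSnapshotContent.
    """
    # Assumption: an unset source mode cannot be compared, so allow.
    if source_volume_mode is None:
        return True
    # Modes match: nothing is being converted.
    if source_volume_mode == pvc_volume_mode:
        return True
    # Modes differ: allow only if the annotation is present and "true".
    return content_annotations.get(ALLOW_ANNOTATION) == "true"
```

For example, `may_provision("Block", "Filesystem", {})` returns `False`, which corresponds to the case where provisioning is blocked with an error.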
334
336
@@ -338,30 +340,70 @@ decisions.
338
340
339
341
### Test Plan
340
342
341
- E2E tests will be added for this design, that attempt to restore a volume with
342
- and without requisite privileges.
343
+ [x] I/we understand the owners of the involved components may require updates to
344
+ existing tests to make this code solid enough prior to committing the changes necessary
345
+ to implement this enhancement.
343
346
344
- #### Unit tests
347
+ ##### Prerequisite testing updates
345
348
346
- - With feature flag disabled:
347
- - attempt to convert volume mode when creating a ` PVC`
348
- from a ` VolumeSnapshot` .
349
- - With feature flag enabled, attempt to convert volume mode when creating a ` PVC`
350
- from a ` VolumeSnapshot` :
351
- - With `Spec.SourceVolumeMode` populated and `snapshot.storage.kubernetes.io/allowVolumeModeChange: true`
352
- annotation present.
353
- - With `Spec.SourceVolumeMode` populated but no `snapshot.storage.kubernetes.io/allowVolumeModeChange: true`
354
- annotation.
355
- - With ` Spec.SourceVolumeMode ` set to ` nil ` .
349
+ <!--
350
+ Based on reviewers feedback describe what additional tests need to be added prior
351
+ implementing this enhancement to ensure the enhancements have also solid foundations.
352
+ -->
353
+
354
+ None. New E2E tests will be added for the transition to beta.
355
+
356
+ ##### Unit tests
357
+
358
+ <!--
359
+ In principle every added code should have complete unit test coverage, so providing
360
+ the exact set of tests will not bring additional value.
361
+ However, if complete unit test coverage is not possible, explain the reason of it
362
+ together with explanation why this is acceptable.
363
+ -->
364
+
365
+ <!--
366
+ Additionally, for Alpha try to enumerate the core package you will be touching
367
+ to implement this enhancement and provide the current unit coverage for those
368
+ in the form of:
369
+ - <package>: <date> - <current test coverage>
370
+ The data can be easily read from:
371
+ https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit
372
+ This can inform certain test coverage improvements that we want to do before
373
+ extending the production code to implement this enhancement.
374
+ -->
375
+
376
+ The unit tests were added to the CSI external-provisioner repo.
377
+
378
+ - https://github.com/kubernetes-csi/external-provisioner/pull/726/
379
+
380
+ ##### Integration tests
381
+
382
+ <!--
383
+ This question should be filled when targeting a release.
384
+ For Alpha, describe what tests will be added to ensure proper quality of the enhancement.
385
+ For Beta and GA, add links to added tests together with links to k8s-triage for those tests:
386
+ https://storage.googleapis.com/k8s-triage/index.html
387
+ -->
388
+
389
+ - No integration tests added.
390
+
391
+ ##### e2e tests
356
392
357
- #### E2E tests
393
+ <!--
394
+ This question should be filled when targeting a release.
395
+ For Alpha, describe what tests will be added to ensure proper quality of the enhancement.
396
+ For Beta and GA, add links to added tests together with links to k8s-triage for those tests:
397
+ https://storage.googleapis.com/k8s-triage/index.html
398
+ We expect no non-infra related flakes in the last month as a GA graduation criteria.
399
+ -->
358
400
359
401
The feature flag will be enabled for e2e tests. The tests will attempt to convert volume
360
402
mode when creating a ` PVC` from a ` VolumeSnapshot` :
361
- - With `Spec.SourceVolumeMode` populated and `snapshot.storage.kubernetes.io/allowVolumeModeChange: true`
403
+ - With `Spec.SourceVolumeMode` populated and `snapshot.storage.kubernetes.io/allow-volume-mode-change: true`
362
404
annotation present.
363
- - With `Spec.SourceVolumeMode` populated but no `snapshot.storage.kubernetes.io/allowVolumeModeChange: true`
364
- annotation.
405
+ - With `Spec.SourceVolumeMode` populated but no `snapshot.storage.kubernetes.io/allow-volume-mode-change: true`
406
+ annotation - https://github.com/kubernetes-csi/external-provisioner/pull/832: https://testgrid.k8s.io/sig-storage-csi-external-provisioner#canary
365
407
- With ` Spec.SourceVolumeMode ` set to ` nil ` .
366
408
367
409
### Graduation Criteria
@@ -468,13 +510,20 @@ rollout. Similarly, consider large clusters and how enablement/disablement
468
510
will rollout across nodes.
469
511
-->
470
512
513
+ Due to the feature gate on the external-provisioner, rolling out this feature
514
+ does not affect existing Pods that use PVCs. It also does not affect
515
+ VolumeSnapshots that are created prior to rolling out the feature, i.e., the
516
+ volume mode of an existing VolumeSnapshot can be modified while creating a PVC.
517
+
471
518
###### What specific metrics should inform a rollback?
472
519
473
520
<!--
474
521
What signals should users be paying attention to when the feature is young
475
522
that might indicate a serious problem?
476
523
-->
477
524
525
+ - persistentvolumeclaim_provision_failed_total
526
+
478
527
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
479
528
480
529
<!--
@@ -483,12 +532,16 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
483
532
are missing a bunch of machinery and tooling and can't do that now.
484
533
-->
485
534
535
+ Yes. The feature flag was enabled and disabled separately in the csi-provisioner and snapshot-controller.
536
+
486
537
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
487
538
488
539
<!--
489
540
Even if applying deprecation policies, they may still surprise some users.
490
541
-->
491
542
543
+ No.
544
+
492
545
### Monitoring Requirements
493
546
494
547
<!--
@@ -503,6 +556,9 @@ checking if there are objects with field X set) may be a last resort. Avoid
503
556
logs or events for this purpose.
504
557
-->
505
558
559
+ If the feature gate is enabled in the external-provisioner and snapshot-controller,
560
+ this feature will always be in use when creating a PVC from a VolumeSnapshot.
561
+
506
562
###### How can someone using this feature know that it is working for their instance?
507
563
508
564
<!--
@@ -514,13 +570,13 @@ and operation of this feature.
514
570
Recall that end users cannot usually observe component logs or access metrics.
515
571
-->
516
572
517
- - [ ] Events
518
- - Event Reason:
519
- - [ ] API .status
520
- - Condition name:
521
- - Other field:
522
- - [ ] Other (treat as last resort)
523
- - Details:
573
+ - [x] Events
574
+ - Event Reason: ProvisioningFailed
575
+ - Event Message: Failed to provision volume with StorageClass "csi-hostpath-sc": error getting handle for DataSource Type
576
+ VolumeSnapshot by Name new-snapshot-demo: requested volume default/hpvc-restore modifies the mode of the source volume
577
+ but does not have permission to do so. snapshot.storage.kubernetes.io/allow-volume-mode-change annotation is not present
578
+ on snapshotcontent snapcontent-8d709f2e-db04-444f-aae2-e17d6c5398dd
579
+
524
580
525
581
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
526
582
@@ -539,18 +595,22 @@ These goals will help you determine what you need to measure (SLIs) in the next
539
595
question.
540
596
-->
541
597
598
+ We will add new labels to the existing persistentvolumeclaim_provision_failed_total metric
599
+ for the volume data source and status code.
600
+ The per-day percentage of calls with an error status code should be <= 1%.
601
+ However, the failure will always happen as long as the feature is correctly enabled and the
602
+ annotations are not applied correctly to VolumeSnapshotContent objects.
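To make the SLO above concrete, the sketch below computes a per-day error percentage from two counts and compares it with a threshold. It is illustrative only. The function names are hypothetical, the threshold is assumed to be 1 percent, and the inputs are assumed to be the daily increase of `persistentvolumeclaim_provision_failed_total` and the daily number of provisioning attempts, however the operator obtains them.

```python
def error_percentage(failed: int, total: int) -> float:
    """Return the percentage of provisioning attempts that failed.

    failed: failed attempts during the day (assumed to come from the
        daily increase of persistentvolumeclaim_provision_failed_total).
    total: all provisioning attempts during the same day.
    """
    if total == 0:
        # No attempts means nothing failed; avoid dividing by zero.
        return 0.0
    return 100.0 * failed / total


def meets_slo(failed: int, total: int, threshold_percent: float = 1.0) -> bool:
    """Check the assumed SLO: error percentage <= threshold_percent."""
    return error_percentage(failed, total) <= threshold_percent
```

Failures caused by a missing annotation are expected behavior rather than a malfunction, so an operator may want to exclude them from `failed` before applying this check.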
603
+
542
604
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
543
605
544
606
<!--
545
607
Pick one more of these and delete the rest.
546
608
-->
547
609
548
- - [ ] Metrics
549
- - Metric name:
550
- - [Optional] Aggregation method:
551
- - Components exposing the metric:
552
- - [ ] Other (treat as last resort)
553
- - Details:
610
+ - [x] Metrics
611
+ - Metric name: persistentvolumeclaim_provision_failed_total
612
+ - [Optional] Aggregation method:
613
+ - Components exposing the metric: external-provisioner
554
614
555
615
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
556
616
@@ -559,6 +619,9 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., co
559
619
implementation difficulties, etc.).
560
620
-->
561
621
622
+ There are no metrics for persistentvolumeclaims created from volumesnapshots. This KEP aims to add those metrics to
623
+ the external-provisioner.
624
+
562
625
### Dependencies
563
626
564
627
<!--
@@ -582,12 +645,16 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
582
645
- Impact of its degraded performance or high-error rates on the feature:
583
646
-->
584
647
648
+ - [external-provisioner]
649
+ - Usage description: Failure events are emitted by the external-provisioner.
650
+ - Impact of its outage on the feature: Outage of this component will prevent error reporting to users.
651
+ - Impact of its degraded performance or high-error rates on the feature: Degraded performance of this component may delay or prevent error reporting to users.
652
+
585
653
### Scalability
586
654
587
655
###### Will enabling / using this feature result in any new API calls?
588
656
589
-
590
- This feature does not add any new API calls.
657
+ This feature adds an event write to the API server when PVC creation is blocked.
591
658
592
659
###### Will enabling / using this feature result in introducing new API types?
593
660
@@ -609,12 +676,16 @@ The latency of CSI's `CreateVolume` may increase due to this change, when the
609
676
` Spec.DataSource ` field points to a ` VolumeSnapshot` instance. This is because
610
677
there is an additional check to determine whether volume provisioning must
611
678
continue. However, this increase is expected to be minimal as there are no new
612
- API calls.
679
+ API calls and the volume spec has already been loaded into memory of the external-provisioner.
613
680
614
681
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
615
682
616
683
No.
617
684
685
+ ###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
686
+
687
+ No. This feature does not introduce any resource-exhausting operations.
688
+
618
689
### Troubleshooting
619
690
620
691
<!--
@@ -627,6 +698,10 @@ details). For now, we leave it here.
627
698
628
699
###### How does this feature react if the API server and/or etcd is unavailable?
629
700
701
+ In case PVC creation is blocked due to this feature, the failure event will not be emitted
702
+ due to the unavailability of the API server. Users will need to refer to the external-provisioner
703
+ logs to determine why PVC creation is failing.
704
+
630
705
###### What are other known failure modes?
631
706
632
707
<!--
@@ -644,6 +719,9 @@ For each of them, fill in the following information by copying the below templat
644
719
645
720
###### What steps should be taken if SLOs are not being met to determine the problem?
646
721
722
+ The user needs to read the logs of the external-provisioner to determine the reason
723
+ why PVC creation is failing.
724
+
647
725
## Implementation History
648
726
649
727
<!--