Commit 8dbf4ba

Add a third phase in support of rollbacks in phase 2.

Author: Andi Li
1 parent: a4984a6

1 file changed: +16 -4 lines changed

keps/sig-storage/177-volume-snapshot/tighten-validation-webhook-crd.md

Lines changed: 16 additions & 4 deletions
@@ -232,10 +232,11 @@ CRD validation is preferred over webhook validation due to their lower complexit

 Tighten the validation on Volume Snapshot objects. Please see the tables below for detailed information.

-Due to backwards compatibility concerns, the tightening will occur in two phases.
+Due to backwards compatibility concerns, the tightening will occur in three phases.

 1. The first phase is webhook-only, and will use [ratcheting validation](#backwards-compatibility). It will be the user's responsibility to clean up invalid objects which already existed before the webhook was enabled. Invalid objects are those which fail the new, stricter validation. The controller will not be able to automatically fix invalid objects, however it will apply a [label](#automatic-labelling-of-invalid-objects) to invalid objects so that users can easily locate them.
-2. The second phase can occur once all invalid objects are cleared from the cluster. It will be the cluster admin's responsibility to check and detect when it is safe to move to the second phase. The CRD schema validation will be tightened and the webhook will stick around to enforce immutability until immutable fields come to CRDs (Custom Resource Definition). This will be accompanied by a version change to make it clear the CRD is using different validation.
+2. The second phase can occur once all invalid objects are cleared from the cluster. It will be the cluster admin's responsibility to check and detect when it is safe to move to the second phase. The CRD schema validation will be tightened and the webhook will stick around to enforce immutability until immutable fields come to CRDs (Custom Resource Definition). This will be accompanied by a version change to make it clear the CRD is using different validation, however the storage version will be kept as `v1beta1` to ensure a [rollback](#rollback) is possible at phase 2.
+3. The storage version of the CRD will be changed from `v1beta1` to `v1`

 The phases come in separate releases to allow users / cluster admin the opportunity to clean their cluster of any invalid objects. More details are in the Risks and Mitigations section.

@@ -280,7 +281,7 @@ Authentication on incoming requests to the webhook server is configurable howeve

 Webhooks add latency to each API server call, thus setting up a reasonable timeout for each AdmissionReview request from the webhook server side is critical. The default timeout is 10 seconds if not specified. When an AdmissionReview request sent to the webhook server timed out, `failurePolicy`(default to `Fail` which is equivalent to disallow) will be triggered.

-In the ValidatingWebhookConfiguration yaml example, a default timeout of two seconds is provided, cluster admins who wish to change the timeout may change the value of `timeoutSeconds`.
+In the ValidatingWebhookConfiguration yaml [example](#kubernetes-api-server-configuration), a default timeout of two seconds is provided, cluster admins who wish to change the timeout may change the value of `timeoutSeconds`.

 To avoid migration pain it is recommended to start with a `failurePolicy` value of `Ignore`, changing it to `Fail` only after the webhook is confirmed to have been installed successfully. Choosing `Ignore` means that it would be possible invalid objects can get created/updated in the system.
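
As a rough illustration of the two settings discussed in the hunk above, the fragment below sketches where `timeoutSeconds` and `failurePolicy` sit in a `ValidatingWebhookConfiguration`. The `metadata.name` and webhook `name` are placeholders assumed for illustration and are not taken from the KEP; the service namespace, name, and path are the ones shown in the example changed later in this diff. Fields such as `rules` and `caBundle` are left out for brevity, so this is a fragment rather than a complete manifest.

```yaml
# Sketch only: values marked "hypothetical" are assumptions, not from the KEP.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: "snapshot-validation-example"          # hypothetical name
webhooks:
  - name: "snapshot-validation.example.com"    # hypothetical name
    clientConfig:
      service:
        namespace: "default"
        name: "snapshot-validation-service"
        path: "/volumesnapshots"
    admissionReviewVersions: ["v1", "v1beta1"]
    sideEffects: None
    # Two-second timeout as described in the KEP text; 10 seconds applies if omitted.
    timeoutSeconds: 2
    # Start permissive, then switch to Fail once the webhook is confirmed to be installed.
    failurePolicy: Ignore
```

With `failurePolicy: Ignore`, a timed-out or unreachable webhook lets the request through, which is why the text warns that invalid objects can still be created or updated until the policy is changed to `Fail`.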

@@ -389,6 +390,8 @@ For `UPDATE` operations, the webhook server will receive the existing object and

 Once we are sure no invalid data is persisted, we can switch to CRD schema-enforced validation with validating webhooks for immutability in a subsequent release.

+#### Rollback
+
 If users do not completely remove their invalid objects before upgrading their CRD definition, it should be possible to downgrade the CRD definition to allow invalid objects to get deleted.

 The rollback procedure would look like this:
@@ -398,6 +401,15 @@ The rollback procedure would look like this:
 4. User upgrades the control plane again.
 5. In an n+2 release, once all the invalid objects are purged, we can switch the storage version to v1.

+In phase 2, the storage version will be kept at v1beta1 in order to ensure the rollback is possible.
+
+In phase 3, the storage version will be changed to v1.
+
+```yaml
+v1 (served=true, storage=false)
+v1beta1 (served=false, storage=true)
+```
+
 #### Current Controller validation of OneOf semantic

 ##### Handling VolumeSnapshot.
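
The `served`/`storage` listing added in the Rollback hunk above corresponds to the `versions` list of the CRD. A minimal sketch of that stanza follows, using the flag values exactly as listed in the diff, which match the phase 2 description (storage version kept at `v1beta1`). The CRD name and group are assumed to be those of the VolumeSnapshot CRD, and the per-version `schema` blocks and other required `spec` fields are omitted, so this is a fragment rather than an installable manifest.

```yaml
# Sketch only: a fragment, not a complete CRD (names, scope, and schemas omitted).
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: volumesnapshots.snapshot.storage.k8s.io   # assumed CRD name
spec:
  group: snapshot.storage.k8s.io                  # assumed group
  versions:
    - name: v1
      served: true
      storage: false   # v1 is not yet the storage version
    - name: v1beta1
      served: false    # value as listed in the diff
      storage: true    # kept as the storage version so a rollback stays possible
```

Per the added text, phase 3 would then move `storage: true` from `v1beta1` to `v1`; exactly one version in the list may be marked as the storage version at a time.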
@@ -451,7 +463,7 @@ webhooks:
     service:
       namespace: "default"
       name: "snapshot-validation-service"
-      path: "/path/to/webhook"
+      path: "/volumesnapshots"
     caBundle: "LS0tLS...base64 encoded of public key...LS0K"
   admissionReviewVersions: ["v1", "v1beta1"]
   sideEffects: None
