|
1 |
| -<!-- |
2 |
| -**Note:** When your KEP is complete, all of these comment blocks should be removed. |
3 |
| -
|
4 |
| -To get started with this template: |
5 |
| -
|
6 |
| -- [ ] **Pick a hosting SIG.** |
7 |
| - Make sure that the problem space is something the SIG is interested in taking |
8 |
| - up. KEPs should not be checked in without a sponsoring SIG. |
9 |
| -- [ ] **Create an issue in kubernetes/enhancements** |
10 |
| - When filing an enhancement tracking issue, please ensure to complete all |
11 |
| - fields in that template. One of the fields asks for a link to the KEP. You |
12 |
| - can leave that blank until this KEP is filed, and then go back to the |
13 |
| - enhancement and add the link. |
14 |
| -- [ ] **Make a copy of this template directory.** |
15 |
| - Copy this template into the owning SIG's directory and name it |
16 |
| - `NNNN-short-descriptive-title`, where `NNNN` is the issue number (with no |
17 |
| - leading-zero padding) assigned to your enhancement above. |
18 |
| -- [ ] **Fill out as much of the kep.yaml file as you can.** |
19 |
| - At minimum, you should fill in the "title", "authors", "owning-sig", |
20 |
| - "status", and date-related fields. |
21 |
| -- [ ] **Fill out this file as best you can.** |
22 |
| - At minimum, you should fill in the "Summary", and "Motivation" sections. |
23 |
| - These should be easy if you've preflighted the idea of the KEP with the |
24 |
| - appropriate SIG(s). |
25 |
| -- [ ] **Create a PR for this KEP.** |
26 |
| - Assign it to people in the SIG that are sponsoring this process. |
27 |
| -- [ ] **Merge early and iterate.** |
28 |
| - Avoid getting hung up on specific details and instead aim to get the goals of |
29 |
| - the KEP clarified and merged quickly. The best way to do this is to just |
30 |
| - start with the high-level sections and fill out details incrementally in |
31 |
| - subsequent PRs. |
32 |
| -
|
33 |
| -Just because a KEP is merged does not mean it is complete or approved. Any KEP |
34 |
| -marked as a `provisional` is a working document and subject to change. You can |
35 |
| -denote sections that are under active debate as follows: |
36 |
| -
|
37 |
| -``` |
38 |
| -<<[UNRESOLVED optional short context or usernames ]>> |
39 |
| -Stuff that is being argued. |
40 |
| -<<[/UNRESOLVED]>> |
41 |
| -``` |
42 |
| -
|
43 |
| -When editing KEPS, aim for tightly-scoped, single-topic PRs to keep discussions |
44 |
| -focused. If you disagree with what is already in a document, open a new PR |
45 |
| -with suggested changes. |
46 |
| -
|
47 |
| -One KEP corresponds to one "feature" or "enhancement", for its whole lifecycle. |
48 |
| -You do not need a new KEP to move from beta to GA, for example. If there are |
49 |
| -new details that belong in the KEP, edit the KEP. Once a feature has become |
50 |
| -"implemented", major changes should get new KEPs. |
51 |
| -
|
52 |
| -The canonical place for the latest set of instructions (and the likely source |
53 |
| -of this file) is [here](/keps/NNNN-kep-template/README.md). |
54 |
| -
|
55 |
| -**Note:** Any PRs to move a KEP to `implementable` or significant changes once |
56 |
| -it is marked `implementable` must be approved by each of the KEP approvers. |
57 |
| -If any of those approvers is no longer appropriate than changes to that list |
58 |
| -should be approved by the remaining approvers and/or the owning SIG (or |
59 |
| -SIG Architecture for cross cutting KEPs). |
60 |
| ---> |
61 | 1 | # KEP-1698: generic ephemeral inline volumes
|
62 | 2 |
|
63 | 3 | <!-- toc -->
|
@@ -103,28 +43,14 @@ SIG Architecture for cross cutting KEPs).
|
103 | 43 |
|
104 | 44 | ## Release Signoff Checklist
|
105 | 45 |
|
106 |
| -<!-- |
107 |
| -**ACTION REQUIRED:** In order to merge code into a release, there must be an |
108 |
| -issue in [kubernetes/enhancements] referencing this KEP and targeting a release |
109 |
| -milestone **before the [Enhancement Freeze](https://git.k8s.io/sig-release/releases) |
110 |
| -of the targeted release**. |
111 |
| -
|
112 |
| -For enhancements that make changes to code or processes/procedures in core |
113 |
| -Kubernetes i.e., [kubernetes/kubernetes], we require the following Release |
114 |
| -Signoff checklist to be completed. |
115 |
| -
|
116 |
| -Check these off as they are completed for the Release Team to track. These |
117 |
| -checklist items _must_ be updated for the enhancement to be released. |
118 |
| ---> |
119 |
| - |
120 | 46 | Items marked with (R) are required *prior to targeting to a milestone / release*.
|
121 | 47 |
|
122 | 48 | - [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
|
123 |
| -- [ ] (R) KEP approvers have approved the KEP status as `implementable` |
124 |
| -- [ ] (R) Design details are appropriately documented |
125 |
| -- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input |
| 49 | +- [X] (R) KEP approvers have approved the KEP status as `implementable` |
| 50 | +- [X] (R) Design details are appropriately documented |
| 51 | +- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input |
126 | 52 | - [X] (R) Graduation criteria is in place
|
127 |
| -- [ ] (R) Production readiness review completed |
| 53 | +- [X] (R) Production readiness review completed |
128 | 54 | - [ ] Production readiness review approved
|
129 | 55 | - [ ] "Implementation History" section is up-to-date for milestone
|
130 | 56 | - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
|
@@ -306,11 +232,21 @@ directly. Cluster administrators must be made aware of this. If this
|
306 | 232 | does not fit their security model, they can disable the feature
|
307 | 233 | through the feature gate that will be added for the feature.
|
308 | 234 |
|
309 |
| -In addition, with a new |
| 235 | +In addition, with a new `ephemeral` value for |
310 | 236 | [`FSType`](https://github.com/kubernetes/kubernetes/blob/1fb0dd4ec5134014e466509163152112626d52c3/pkg/apis/policy/types.go#L278-L309)
|
311 | 237 | it will be possible to limit the usage of this volume source via the
|
312 | 238 | [PodSecurityPolicy
|
313 | 239 | (PSP)](https://kubernetes.io/docs/concepts/policy/pod-security-policy/#volumes-and-file-systems).
|
| 240 | +If a PSP exists, `FSType` has to include either `all` or `ephemeral` |
| 241 | +for this feature to be allowed. If no PSP exists, the feature is |
| 242 | +allowed. |
| 243 | + |
| 244 | +Adding that new value is an API change for PSP because it changes |
| 245 | +validation. When the feature is disabled, validation must tolerate |
| 246 | +this new value in updates of existing PSP objects that already contain |
| 247 | +the value, but must not allow it when creating a new PSP or updating a |
| 248 | +PSP that does not already contain the value. When the feature is |
| 249 | +enabled, validation must allow this value on any create or update. |
314 | 250 |
|
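The validation rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual apiserver code: `FSType` is redeclared locally, and `validateVolumes` and `contains` are hypothetical helpers. A nil `oldVolumes` stands for a create request.

```go
package main

import "fmt"

// FSType is a local stand-in for the PSP volume-kind string type.
type FSType string

// Ephemeral is the new value discussed in the text.
const Ephemeral FSType = "ephemeral"

// contains reports whether v occurs in list.
func contains(list []FSType, v FSType) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// validateVolumes is a hypothetical helper. oldVolumes is nil on create.
// With the feature enabled the new value is always accepted. With the
// feature disabled it is only tolerated if the old object already had it.
func validateVolumes(featureEnabled bool, oldVolumes, newVolumes []FSType) error {
	if !contains(newVolumes, Ephemeral) {
		return nil
	}
	if featureEnabled || contains(oldVolumes, Ephemeral) {
		return nil
	}
	return fmt.Errorf("volume kind %q is not allowed while the feature is disabled", Ephemeral)
}

func main() {
	withValue := []FSType{"secret", Ephemeral}
	withoutValue := []FSType{"secret"}

	fmt.Println("create, disabled:", validateVolumes(false, nil, withValue))
	fmt.Println("update adding value, disabled:", validateVolumes(false, withoutValue, withValue))
	fmt.Println("update keeping value, disabled:", validateVolumes(false, withValue, withValue))
	fmt.Println("create, enabled:", validateVolumes(true, nil, withValue))
}
```

Passing the old object into validation is what lets a disabled cluster keep accepting updates to PSPs that were written while the feature was enabled.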
315 | 251 | The normal namespace quota for PVCs in a namespace still applies, so
|
316 | 252 | even if users are allowed to use this new mechanism, they cannot use
|
@@ -445,10 +381,12 @@ automatically enable late binding for PVCs which are owned by a pod.
|
445 | 381 | - Gather feedback from developers and surveys
|
446 | 382 | - Errors emitted as pod events
|
447 | 383 | - Decide whether `CSIVolumeSource` (in beta at the moment) should be
|
448 |
| - merged with `EphemeralVolumeSource` |
| 384 | + merged with `EphemeralVolumeSource`: no, instead the goal is |
| 385 | + to [rename `CSIVolumeSource`](https://github.com/kubernetes/enhancements/issues/596#issuecomment-726185967) |
449 | 386 | - Decide whether in-tree ephemeral volume sources, like EmptyDir (GA
|
450 | 387 | already), should also be added to EphemeralVolumeSource for the sake of API
|
451 |
| - consistency |
| 388 | + consistency: [no](https://docs.google.com/document/d/1yAe3SPPosgC_QgmnY7oJTmZYWrqLrii1oA4de67DEcw/edit), |
| 389 | + this just causes API churn without tangible benefits |
452 | 390 | - Tests are in Testgrid and linked in KEP
|
453 | 391 |
|
454 | 392 | #### Beta -> GA Graduation
|
@@ -497,77 +435,173 @@ version will prevent pods from starting.
|
497 | 435 | Pods that got stuck will work again.
|
498 | 436 |
|
499 | 437 | * **Are there any tests for feature enablement/disablement?**
|
500 |
| - Yes, unit tests for the apiserver and kubelet. |
501 | 438 |
|
502 |
| -### Rollout, Upgrade and Rollback Planning |
| 439 | + Yes, unit tests for the apiserver, kube-controller-manager and kubelet cover scenarios |
| 440 | + where the feature is disabled or enabled. Tests for transitions |
| 441 | + between these states will be added before beta. |
503 | 442 |
|
504 |
| -Will be added before the transition to beta. |
| 443 | +### Rollout, Upgrade and Rollback Planning |
505 | 444 |
|
506 | 445 | * **How can a rollout fail? Can it impact already running workloads?**
|
507 | 446 |
|
| 447 | +A rollout could fail because the implementation turns out to be |
| 448 | +faulty. Such bugs may cause unexpected shutdowns of kube-scheduler, |
| 449 | +kube-apiserver, kube-controller-manager and kubelet. For the API |
| 450 | +server, broken support for the new volume type may also show up as 5xx |
| 451 | +error codes for any object that embeds a `VolumeSource` (Pod, |
| 452 | +StatefulSet, DaemonSet, etc.). |
| 453 | + |
| 454 | +Already running workloads should not be affected unless they depend on |
| 455 | +these components at runtime and bugs cause unexpected shutdowns. |
| 456 | + |
508 | 457 | * **What specific metrics should inform a rollback?**
|
509 | 458 |
|
| 459 | +One indicator is unexpected restarts of the cluster control plane |
| 460 | +components. Another is an increase in the number of pods that fail to |
| 461 | +start. In both cases further analysis of logs and pod events is needed |
| 462 | +to determine whether errors are related to this feature. |
| 463 | + |
510 | 464 | * **Were upgrade and rollback tested? Was upgrade->downgrade->upgrade path tested?**
|
511 | 465 |
|
| 466 | +Not yet, but this will be done manually before the transition to beta. |
| 467 | + |
512 | 468 | * **Is the rollout accompanied by any deprecations and/or removals of features,
|
513 | 469 | APIs, fields of API types, flags, etc.?**
|
514 | 470 |
|
515 |
| -### Monitoring requirements |
| 471 | +No. |
516 | 472 |
|
517 |
| -Will be added before the transition to beta. |
| 473 | +### Monitoring requirements |
518 | 474 |
|
519 | 475 | * **How can an operator determine if the feature is in use by workloads?**
|
520 | 476 |
|
| 477 | +There will be pods which have a non-nil |
| 478 | +`VolumeSource.Ephemeral.VolumeClaimTemplate`. |
| 479 | + |
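As a rough sketch of that check: the types below are simplified local stand-ins that only mirror the field path `VolumeSource.Ephemeral.VolumeClaimTemplate` named above, and `usesGenericEphemeralVolumes` is a hypothetical helper, not an existing Kubernetes function.

```go
package main

import "fmt"

// Simplified stand-ins for the API types; only the fields needed here.
type PersistentVolumeClaimTemplate struct{}

type EphemeralVolumeSource struct {
	VolumeClaimTemplate *PersistentVolumeClaimTemplate
}

type VolumeSource struct {
	Ephemeral *EphemeralVolumeSource
}

type Volume struct {
	Name string
	VolumeSource
}

type Pod struct {
	Name    string
	Volumes []Volume
}

// usesGenericEphemeralVolumes reports whether any volume of the pod has a
// non-nil VolumeSource.Ephemeral.VolumeClaimTemplate.
func usesGenericEphemeralVolumes(pod Pod) bool {
	for _, v := range pod.Volumes {
		if v.Ephemeral != nil && v.Ephemeral.VolumeClaimTemplate != nil {
			return true
		}
	}
	return false
}

func main() {
	plain := Pod{Name: "plain", Volumes: []Volume{{Name: "config"}}}
	scratch := Pod{Name: "scratch", Volumes: []Volume{{
		Name: "data",
		VolumeSource: VolumeSource{
			Ephemeral: &EphemeralVolumeSource{
				VolumeClaimTemplate: &PersistentVolumeClaimTemplate{},
			},
		},
	}}}

	for _, pod := range []Pod{plain, scratch} {
		fmt.Printf("%s uses the feature: %v\n", pod.Name, usesGenericEphemeralVolumes(pod))
	}
}
```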
| 480 | + |
521 | 481 | * **What are the SLIs (Service Level Indicators) an operator can use to
|
522 | 482 | determine the health of the service?**
|
523 | 483 |
|
| 484 | +The service here is the Kubernetes control plane. Overall health and |
| 485 | +performance can be observed by measuring the pod creation rate for |
| 486 | +pods using generic ephemeral inline volumes. Such [a |
| 487 | +SLI](https://github.com/kubernetes/community/blob/master/sig-scalability/slos/pod_startup_latency.md) |
| 488 | +is defined for pods without volumes and is work in progress for pods with |
| 489 | +volumes. |
| 490 | + |
| 491 | +For kube-controller-manager, a metric that exposes the usual work |
| 492 | +queue metrics data (like queue length) will be made available. |
| 493 | +Furthermore, a count of PVC creation attempts will be added, labeled |
| 494 | +with the result (successful vs. error code). A non-zero count of attempts |
| 495 | +with "already exists" will indicate that there were conflicts with |
| 496 | +manually created PVCs. |
| 497 | + |
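The labeled attempt count could look roughly like this. It is a sketch only: the KEP has not yet named the metrics, so the label values `success` and `already_exists` and the `attemptCounter` type are placeholders, and a plain map stands in for whatever metrics library the controller ends up using.

```go
package main

import (
	"fmt"
	"sync"
)

// attemptCounter counts PVC creation attempts per result label.
// It is a hypothetical stand-in for a real labeled counter metric.
type attemptCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newAttemptCounter() *attemptCounter {
	return &attemptCounter{counts: make(map[string]int)}
}

// record increments the count for the given result label.
func (c *attemptCounter) record(result string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[result]++
}

// get returns the current count for the given result label.
func (c *attemptCounter) get(result string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[result]
}

func main() {
	attempts := newAttemptCounter()

	// Placeholder label values; the real names are still to be defined.
	attempts.record("success")
	attempts.record("success")
	attempts.record("already_exists")

	if attempts.get("already_exists") > 0 {
		fmt.Println("conflicts with manually created PVCs were observed")
	}
	fmt.Println("successful creations:", attempts.get("success"))
}
```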
| 498 | +TODO: list metrics names here and in kep.yaml |
| 499 | + |
524 | 500 | * **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
|
525 | 501 |
|
| 502 | +The goal is to achieve the same pod creation rate for pods using |
| 503 | +generic ephemeral inline volumes as for pods that use PVCs which get |
| 504 | +created separately. To make this comparable, the storage class should |
| 505 | +use late binding. |
| 506 | + |
| 507 | +This will need further discussion before going to GA. |
| 508 | + |
526 | 509 | * **Are there any missing metrics that would be useful to have to improve
|
527 |
| - observability if this feature?** |
| 510 | + observability of this feature?** |
528 | 511 |
|
529 |
| -### Dependencies |
| 512 | +No. |
530 | 513 |
|
531 |
| -Will be added before the transition to beta. |
| 514 | +### Dependencies |
532 | 515 |
|
533 | 516 | * **Does this feature depend on any specific services running in the cluster?**
|
534 | 517 |
|
535 |
| -### Scalability |
| 518 | +A dynamic provisioner from some kind of storage system is needed: |
536 | 519 |
|
537 |
| -Will be added before the transition to beta. |
| 520 | + * Volume provisioner |
| 521 | + * Usage description: |
| 522 | + * Impact of its outage on the feature: pods that use generic inline volumes |
| 523 | + provided by the storage system will not be able to start |
| 524 | + * Impact of its degraded performance or high-error rates on the |
| 525 | + feature: slower pod startup |
| 526 | + |
| 527 | +### Scalability |
538 | 528 |
|
539 | 529 | * **Will enabling / using this feature result in any new API calls?**
|
540 | 530 |
|
| 531 | +Enabling will not change anything. |
| 532 | + |
| 533 | +Using the feature in a pod will lead to one PVC creation per inline |
| 534 | +volume, followed by garbage collection of those PVCs when the pod |
| 535 | +terminates. |
| 536 | + |
541 | 537 | * **Will enabling / using this feature result in introducing new API types?**
|
542 | 538 |
|
| 539 | +No. |
| 540 | + |
543 | 541 | * **Will enabling / using this feature result in any new calls to cloud
|
544 | 542 | provider?**
|
545 | 543 |
|
| 544 | +Enabling the feature doesn't. Using it will cause new calls to cloud |
| 545 | +providers, but the number is exactly the same as without this feature: |
| 546 | +for each per-pod volume, a PVC has to be created (either manually or |
| 547 | +using this feature) and a volume needs to be provisioned in a storage |
| 548 | +backend. When a pod terminates, that volume needs to be deleted again. |
| 549 | + |
546 | 550 | * **Will enabling / using this feature result in increasing size or count
|
547 | 551 | of the existing API objects?**
|
548 | 552 |
|
| 553 | +Enabling it will not change existing objects. Using it in a pod spec |
| 554 | +will increase the size by one `PersistentVolumeClaimTemplate` per |
| 555 | +inline volume and cause one PVC to be created for each inline volume. |
| 556 | + |
549 | 557 | * **Will enabling / using this feature result in increasing time taken by any
|
550 | 558 | operations covered by [existing SLIs/SLOs][]?**
|
551 | 559 |
|
| 560 | +There is a SLI for [scheduling of pods without |
| 561 | +volumes](https://github.com/kubernetes/community/blob/master/sig-scalability/slos/pod_startup_latency.md) |
| 562 | +with a corresponding SLO. Those are not expected to be affected. |
| 563 | + |
| 564 | +A SLI for scheduling of pods with volumes is work in progress. The SLO |
| 565 | +for it will depend on the specific storage driver. |
| 566 | + |
552 | 567 | * **Will enabling / using this feature result in non-negligible increase of
|
553 | 568 | resource usage (CPU, RAM, disk, IO, ...) in any components?**
|
554 | 569 |
|
555 |
| -### Troubleshooting |
| 570 | +Potentially in kube-scheduler and kube-controller-manager, but mostly only if |
| 571 | +the feature is actually used. Merely enabling it will cause the new controller |
| 572 | +in kube-controller-manager to check new pods for the new volume type, which |
| 573 | +should be fast. In kube-scheduler the feature adds an additional case to |
| 574 | +switch statements that check for persistent volume sources. |
556 | 575 |
|
557 |
| -Will be added before the transition to beta. |
| 576 | +### Troubleshooting |
558 | 577 |
|
559 | 578 | * **How does this feature react if the API server and/or etcd is unavailable?**
|
560 | 579 |
|
| 580 | +Pods will not start and volumes for them will not get provisioned. |
| 581 | + |
561 | 582 | * **What are other known failure modes?**
|
562 | 583 |
|
| 584 | +As [explained |
| 585 | +above](#preventing-accidental-collision-with-existing-pvcs), the PVC |
| 586 | +that needs to be created for a pod may conflict with an already |
| 587 | +existing PVC that was created independently of the pod. In such a |
| 588 | +case, the pod will not be able to start until that independent PVC is |
| 589 | +deleted. This scenario will be exposed as events for the pod by |
| 590 | +kube-controller-manager. |
| 591 | + |
| 592 | +If the storage system fails to provision volumes, then this will be |
| 593 | +exposed as events for the PVC and (depending on the storage system) |
| 594 | +may also show up in metrics data. |
| 595 | + |
563 | 596 | * **What steps should be taken if SLOs are not being met to determine the problem?**
|
564 | 597 |
|
565 |
| -[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md |
566 |
| -[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos |
| 598 | +SLOs only exist for pods which don't use the new feature. If those are |
| 599 | +somehow affected, then error messages in the kube-scheduler and kube-controller-manager |
| 600 | +output may provide additional information. |
567 | 601 |
|
568 | 602 | ## Implementation History
|
569 | 603 |
|
570 |
| -- Kubernetes 1.19: alpha (tentative) |
| 604 | +- Kubernetes 1.19: alpha |
571 | 605 |
|
572 | 606 | ## Drawbacks
|
573 | 607 |
|
|