Skip to content

Commit 8c78dc7

Browse files
committed
Addressing comments
1 parent 9730f51 commit 8c78dc7

File tree

2 files changed

+24
-23
lines changed

2 files changed

+24
-23
lines changed

keps/sig-node/1797-configure-fqdn-as-hostname-for-pods/README.md

Lines changed: 22 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -505,12 +505,12 @@ in back-to-back releases.
505505

506506
#### Alpha -> Beta Graduation
507507

508-
- Gather feedback from developers
509-
- Tests are in Testgrid and linked in KEP
508+
- Gather feedback from users
509+
- Ensure e2e tests are running in Testgrid and they are stable
510510

511511
#### Beta -> GA Graduation
512512

513-
- Allowing time for feedback
513+
- Allowing time for feedback from production users
514514

515515

516516
### Upgrade / Downgrade Strategy
@@ -527,7 +527,7 @@ enhancement:
527527
cluster required to make on upgrade in order to make use of the enhancement?
528528
-->
529529

530-
We will gate off this feature for one release (1.19), then we enable it by default as Beta in the next release (1.20), then GA in release 1.21
530+
We will gate off this feature for one release (1.19), then we enable it by default as Beta in the next release (1.20), then GA in release 1.22
531531

532532
### Version Skew Strategy
533533

@@ -579,7 +579,7 @@ _This section must be completed when targeting alpha to a release._
579579
* **How can this feature be enabled / disabled in a live cluster?**
580580
- [X] Feature gate (also fill in values in `kep.yaml`)
581581
- Feature gate name: setHostnameAsFQDN
582-
- Components depending on the feature gate: Kubelet
582+
- Components depending on the feature gate: kube-apiserver and kubelet
583583
- [ ] Other
584584
- Describe the mechanism:
585585
- Will enabling / disabling the feature require downtime of the control
@@ -601,23 +601,22 @@ _This section must be completed when targeting alpha to a release._
601601
hostname field of kernel.
602602

603603
* **Are there any tests for feature enablement/disablement?**
604-
We will have unit tests and integration tests. Not sure if we need conversion tests.
604+
No, only manual testing was performed.
605605

606606
### Rollout, Upgrade and Rollback Planning
607607

608608
_This section must be completed when targeting beta graduation to a release._
609609

610610
* **How can a rollout fail? Can it impact already running workloads?**
611-
It is not clear that the rollout can fail due to this feature. The scope of this feature
612-
is very limited and it is disabled by default.
611+
No known failure modes.
613612

614613
* **What specific metrics should inform a rollback?**
615-
Abnormal increase in `run_podsandbox_errors_total` count could be related to this feature. We should check if the feature gate is enabled and pods are using it.
614+
Abnormal increase in `run_podsandbox_errors_total` count could be related to this feature. We should filter those pods having issues to create sandbox and check whether they are stuck due to the length of their FQDN, as described in the proposal.
616615

617616
* **Were upgrade and rollback tested? Was upgrade->downgrade->upgrade path tested?**
618-
We tested introducing and removing this feature. Running pods are not affected by
619-
either introducing nor removing the feature. When disabling the feature, Pods
620-
using this feature that are "stuck" due to having long FQDNs will go into
617+
We tested enabling and disabling this feature. Running pods are not affected by
618+
either enabling nor disabling this feature. When disabling the feature, Pods
619+
using it that are "stuck" due to having long FQDNs will go into
621620
running.
622621

623622
* **Is the rollout accompanied by any deprecations and/or removals of features,
@@ -634,15 +633,16 @@ _This section must be completed when targeting beta graduation to a release._
634633
checking if there are objects with field X set) may be last resort. Avoid
635634
logs or events for this purpose.
636635

637-
For a pod to be using this feature it needs to define the PodSpec fields
638-
`subDomain` and `setHostnameAsFQDN`.
636+
Listing pods in the cluster and checking if any has both
637+
`subDomain` and `setHostnameAsFQDN` fields set.
639638

640639
* **What are the SLIs (Service Level Indicators) an operator can use to
641640
determine the health of the service?**
642641
- [X] Metrics
643-
- Metric name: Abnormal increase in `run_podsandbox_errors_total` might be related to this feature. We should check if the feature gate is enabled and pods are using it.
642+
- Metric name: `run_podsandbox_errors_total`
643+
- Comment: Abnormal increase in `run_podsandbox_errors_total` might be related to this feature. We should check if the feature gate is enabled and pods are using it.
644644
- [Optional] Aggregation method:
645-
- Components exposing the metric:
645+
- Components exposing the metric: Kubelet
646646
- [ ] Other (treat as last resort)
647647
- Details:
648648

@@ -660,7 +660,7 @@ N/A
660660
observability if this feature?**
661661
Describe the metrics themselves and the reason they weren't added (e.g. cost,
662662
implementation difficulties, etc.).
663-
N/A
663+
664664

665665
### Dependencies
666666

@@ -692,7 +692,7 @@ previous answers based on experience in the field._
692692

693693
* **Will enabling / using this feature result in increasing size or count
694694
of the existing API objects?**
695-
No
695+
Pods using this feature are required to set a new field, which increases the size of their objects by a couple of bytes.
696696

697697
* **Will enabling / using this feature result in increasing time taken by any
698698
operations covered by [existing SLIs/SLOs][]?**
@@ -723,14 +723,15 @@ _This section must be completed when targeting beta graduation to a release._
723723

724724
- Mitigations: What can be done to stop the bleeding, especially for already
725725
running user workloads?
726-
Pods should either set the PodSpec field `setHostnameAsFQDN`.
726+
Pods having problems to start should unset the PodSpec field `setHostnameAsFQDN`.
727727

728728
- Diagnostics: What are the useful log messages and their required logging
729729
levels that could help debugging the issue?
730-
This issue will be logged in Error level log messages. The error will something like`GeneratePodSandboxConfig for pod foo failed: Failed to construct FQDN from pod hostname and cluster domain, FQDN <long-fqdn> is too long (64 characters is the max, 70 characters requested)`
730+
This issue will be logged in Error level log messages and in the Events. The message will be something like `GeneratePodSandboxConfig for pod foo failed: Failed to construct FQDN from pod hostname and cluster domain, FQDN <long-fqdn> is too long (64 characters is the max, 70 characters requested)`
731731

732732
- Testing: Are there any tests for failure mode? If not describe why.
733-
Unittests cover this failure scenario. The e2e tests also check for this error, but only as protection for the CI. A test that purposely hit this issue will be flaky as the pod will remain in Pending status logging errors until the test timesout.
733+
Both unittests and e2e tests cover this failure scenario.
734+
734735

735736
* **What steps should be taken if SLOs are not being met to determine the problem?**
736737

keps/sig-node/1797-configure-fqdn-as-hostname-for-pods/kep.yaml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@ owning-sig: sig-node
88
participating-sigs:
99
- sig-network
1010
- sig-node
11-
status: implemented
11+
status: implementable
1212
creation-date: 2020-05-18
1313
reviewers:
1414
- "@thockin"
@@ -36,7 +36,7 @@ latest-milestone: "v1.19"
3636
milestone:
3737
alpha: "v1.19"
3838
beta: "v1.20"
39-
stable: "v1.21"
39+
stable: "v1.22"
4040

4141
# The following PRR answers are required at alpha release
4242
# List the feature gate name and the components for which it must be enabled

0 commit comments

Comments
 (0)