Commit 79c4222

Update the KEP boilerplate

Signed-off-by: Marko Mudrinić <[email protected]>

1 parent 854b993

2 files changed (+293, -37 lines)

keps/sig-release/1731-publishing-packages/README.md

Lines changed: 275 additions & 30 deletions
```diff
@@ -1,15 +1,15 @@
-# Publishing kubernetes packages <!-- omit in toc -->
+# KEP-1731: Publishing Kubernetes packages on community infrastructure <!-- omit in toc -->
 
 <!-- toc -->
-- [Release Signoff Checklist](#release-signoff-checklist)
 - [Summary](#summary)
 - [Motivation](#motivation)
 - [Goals](#goals)
 - [Non-Goals](#non-goals)
 - [Proposal](#proposal)
 - [User Stories](#user-stories)
 - [User Roles](#user-roles)
-- [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
+- [Risks and Mitigations](#risks-and-mitigations)
+- [Design Details](#design-details)
 - [Using OBS instead of manually building and hosting packages](#using-obs-instead-of-manually-building-and-hosting-packages)
 - [How Open Build Service works?](#how-open-build-service-works)
 - [Packages, Operating Systems, and Architectures in Scope](#packages-operating-systems-and-architectures-in-scope)
```

```diff
@@ -21,30 +21,51 @@
 - [Integrating OBS with our current release pipeline](#integrating-obs-with-our-current-release-pipeline)
 - [Authentication to OBS and User Management](#authentication-to-obs-and-user-management)
 - [How are packages used?](#how-are-packages-used)
-- [Risks and Mitigations](#risks-and-mitigations)
-- [Design Details](#design-details)
 - [Test Plan](#test-plan)
 - [Graduation Criteria](#graduation-criteria)
 - [Alpha](#alpha)
 - [Alpha -> Beta Graduation](#alpha---beta-graduation)
 - [Beta -> GA Graduation](#beta---ga-graduation)
 - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
 - [Version Skew Strategy](#version-skew-strategy)
+- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
+- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
+- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
+- [Monitoring Requirements](#monitoring-requirements)
+- [Dependencies](#dependencies)
+- [Scalability](#scalability)
+- [Troubleshooting](#troubleshooting)
 - [Implementation History](#implementation-history)
-- [Drawbacks [optional]](#drawbacks-optional)
-- [Alternatives [optional]](#alternatives-optional)
+- [Drawbacks](#drawbacks)
+- [Alternatives](#alternatives)
+- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
 <!-- /toc -->
 
-## Release Signoff Checklist
-
-- [ ] kubernetes/enhancements issue in release milestone, which links to KEP (this should be a link to the KEP location in kubernetes/enhancements, not the initial KEP PR)
-- [ ] KEP approvers have set the KEP status to `implementable`
-- [ ] Design details are appropriately documented
-- [ ] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
-- [ ] Graduation criteria is in place
+Items marked with (R) are required *prior to targeting to a milestone / release*.
+
+- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [ ] (R) KEP approvers have approved the KEP status as `implementable`
+- [ ] (R) Design details are appropriately documented
+- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+- [ ] e2e Tests for all Beta API Operations (endpoints)
+- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
+- [ ] (R) Graduation criteria is in place
+- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+- [ ] (R) Production readiness review completed
+- [ ] (R) Production readiness review approved
 - [ ] "Implementation History" section is up-to-date for milestone
 - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
-- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+
+<!--
+**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
+-->
+
+[kubernetes.io]: https://kubernetes.io/
+[kubernetes/enhancements]: https://git.k8s.io/enhancements
+[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
+[kubernetes/website]: https://git.k8s.io/website
 
 ## Summary
 
```

````diff
@@ -162,7 +183,14 @@ Scenario: [...]
 ```
 -->
 
-### Implementation Details/Notes/Constraints
+### Risks and Mitigations
+
+- _Risk_: The OBS installation provided by openSUSE is unable to serve the load generated by the Kubernetes project
+_Mitigation_: We can host our own mirrors and take some load from openSUSE (e.g. on Equinix Metal)
+- _Risk_: Building all the packages for all the distributions and their version takes too long to be done nightly or via cutting the release
+_Mitigation_: We do not deliver nightly packages or wait for packages to be published in the release pipeline.
+
+## Design Details
 
 Packages will be built and published using [Open Build Service (OBS)][obs]. openSUSE will sponsor the Kubernetes
 project by giving us access to the [OBS instance hosted by openSUSE][obs-build].
````

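Because packages are built and published through OBS, consuming them (or, as the rollback answer in the questionnaire puts it, switching back to the Google repository) comes down to package-manager configuration on each host. The Python sketch below only renders the two usual shapes of that configuration: a one-line APT source entry, assuming the flat `<uri> /` layout that OBS repositories commonly use, and a minimal dnf/yum `.repo` section. Every URL, key path, and repository ID in it is a hypothetical placeholder; the real repository locations are not given in this part of the KEP.

```python
def apt_source_line(base_url: str, keyring_path: str) -> str:
    """Render a one-line APT source entry for a flat repository.

    Assumption: the repository uses the flat layout ("<uri> /", no
    components). `signed-by` pins the repository to a specific keyring.
    """
    return f"deb [signed-by={keyring_path}] {base_url} /"


def yum_repo_stanza(repo_id: str, name: str, base_url: str, gpg_key_url: str) -> str:
    """Render a minimal .repo section for dnf/yum with GPG checking enabled."""
    lines = [
        f"[{repo_id}]",
        f"name={name}",
        f"baseurl={base_url}",
        "enabled=1",
        "gpgcheck=1",
        f"gpgkey={gpg_key_url}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical placeholders only; these are not the real repository locations.
    print(apt_source_line("https://example.invalid/deb/", "/etc/apt/keyrings/example.gpg"))
    print(
        yum_repo_stanza(
            "example",
            "Example packages",
            "https://example.invalid/rpm/",
            "https://example.invalid/rpm/key.gpg",
        ),
        end="",
    )
```

Under this sketch, rolling back is the same operation in reverse: the host's source entry is swapped for one pointing at the Google repository and the package index is refreshed.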
```diff
@@ -446,17 +474,12 @@ are other manual migration steps needed (e.g. changing the GPG key), we don't co
 
 Different architectures will be published into the same repos, it is up to the package managers to pull and install the correct package for the target platform.
 
-### Risks and Mitigations
-
-- _Risk_: The OBS installation provided by openSUSE is unable to serve the load generated by the Kubernetes project
-_Mitigation_: We can host our own mirrors and take some load from openSUSE (e.g. on Equinix Metal)
-- _Risk_: Building all the packages for all the distributions and their version takes too long to be done nightly or via cutting the release
-_Mitigation_: We do not deliver nightly packages or wait for packages to be published in the release pipeline.
-
-## Design Details
-
 ### Test Plan
 
+[x] We understand the owners of the involved components may require updates to
+existing tests to make this code solid enough prior to committing the changes necessary
+to implement this enhancement.
+
 There should be post-publish tests, which can be run as part or after the release process
 
 - pull packages from the official mirrors
```

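The post-publish tests start by pulling packages from the official mirrors, and the design publishes all architectures into the same repositories. One hypothetical check in that spirit is sketched below in Python: it parses a Debian-style `Packages` index (stanzas of `Field: value` lines separated by blank lines) and reports which expected (package, architecture) pairs have no entry at a given version. The package names, architectures, and version string are illustrative assumptions rather than the KEP's actual scope, and downloading the index from a mirror is left out.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Illustrative index content; names, arches, and version are assumptions.
SAMPLE_INDEX = """\
Package: kubectl
Version: 1.26.0-00
Architecture: amd64

Package: kubectl
Version: 1.26.0-00
Architecture: arm64

Package: kubelet
Version: 1.26.0-00
Architecture: amd64
"""


def parse_packages_index(text: str) -> List[Dict[str, str]]:
    """Split a Packages index into one dict per stanza.

    Continuation lines (leading whitespace, e.g. in Description) are
    skipped because this check only needs single-line fields.
    """
    stanzas: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                stanzas.append(current)
                current = {}
            continue
        if line[0] in " \t":
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def missing_artifacts(
    index_text: str,
    packages: Iterable[str],
    architectures: Iterable[str],
    version: str,
) -> Set[Tuple[str, str]]:
    """Return the (package, arch) pairs that have no entry at `version`."""
    published = {
        (s.get("Package"), s.get("Architecture"))
        for s in parse_packages_index(index_text)
        if s.get("Version") == version
    }
    expected = {(p, a) for p in packages for a in architectures}
    return expected - published


if __name__ == "__main__":
    gaps = missing_artifacts(SAMPLE_INDEX, ["kubectl", "kubelet"], ["amd64", "arm64"], "1.26.0-00")
    print(sorted(gaps))  # [('kubelet', 'arm64')]
```

Comparing against an explicit expected set, rather than just checking that the index is non-empty, is what would catch a single architecture failing to build while the others publish normally.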
```diff
@@ -523,23 +546,245 @@ N/A
 
 N/A
 
+## Production Readiness Review Questionnaire
+
+### Feature Enablement and Rollback
+
+It's up to the user what package repository (OBS or Google) they want to use.
+In case OBS doesn't work for them, they can reconfigure their systems to use
+the Google package repository.
+
+###### How can this feature be enabled / disabled in a live cluster?
+
+N/A. This is configured on the operating system (i.e. package manager) level.
+
+###### Does enabling the feature change any default behavior?
+
+Not anticipated. We're trying to match the existing spec files as best as we
+can.
+
+###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
+
+Yes. Users can rollback to the Google package repository.
+
+###### What happens if we reenable the feature if it was previously rolled back?
+
+There are no side effects anticipated.
+
+###### Are there any tests for feature enablement/disablement?
+
+N/A
+
+### Rollout, Upgrade and Rollback Planning
+
+<!--
+This section must be completed when targeting beta to a release.
+-->
+
+###### How can a rollout or rollback fail? Can it impact already running workloads?
+
+N/A
+
+###### What specific metrics should inform a rollback?
+
+Installation and upgrading issues. For example, if a package upgrade is not
+possible due to some error.
+
+###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+
+<!--
+Describe manual testing that was done and the outcomes.
+Longer term, we may want to require automated upgrade/rollback tests, but we
+are missing a bunch of machinery and tooling and can't do that now.
+-->
+
+###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
+
+No.
+
+### Monitoring Requirements
+
+<!--
+This section must be completed when targeting beta to a release.
+
+For GA, this section is required: approvers should be able to confirm the
+previous answers based on experience in the field.
+-->
+
+###### How can an operator determine if the feature is in use by workloads?
+
+We'll ask openSUSE to provide us with metrics on the repository usage. We don't
+have any metrics for the Google repository and there's no way that we can
+get those metrics.
+
+###### How can someone using this feature know that it is working for their instance?
+
+Kubernetes is installed successfully and the Node is coming up and is "Ready".
+
+###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
+
+<!--
+This is your opportunity to define what "normal" quality of service looks like
+for a feature.
+
+It's impossible to provide comprehensive guidance, but at the very
+high level (needs more precise definitions) those may be things like:
+- per-day percentage of API calls finishing with 5XX errors <= 1%
+- 99% percentile over day of absolute value from (job creation time minus expected
+job creation time) for cron job <= 10%
+- 99.9% of /health requests per day finish with 200 code
+
+These goals will help you determine what you need to measure (SLIs) in the next
+question.
+-->
+
+TBD
+
+###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+
+<!--
+Pick one more of these and delete the rest.
+-->
+
+- [ ] Metrics
+- Metric name:
+- [Optional] Aggregation method:
+- Components exposing the metric:
+- [ ] Other (treat as last resort)
+- Details:
+
+TBD
+
+###### Are there any missing metrics that would be useful to have to improve observability of this feature?
+
+<!--
+Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
+implementation difficulties, etc.).
+-->
+
+TBD
+
+### Dependencies
+
+<!--
+This section must be completed when targeting beta to a release.
+-->
+
+###### Does this feature depend on any specific services running in the cluster?
+
+N/A -- this is not a core Kubernetes feature.
+
+### Scalability
+
+###### Will enabling / using this feature result in any new API calls?
+
+No -- this is not a core Kubernetes feature.
+
+###### Will enabling / using this feature result in introducing new API types?
+
+No -- this is not a core Kubernetes feature.
+
+###### Will enabling / using this feature result in any new calls to the cloud provider?
+
+No -- this is not a core Kubernetes feature.
+
+###### Will enabling / using this feature result in increasing size or count of the existing API objects?
+
+No -- this is not a core Kubernetes feature.
+
+###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
+
+No -- this is not a core Kubernetes feature.
+
+###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
+
+No.
+
+###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
+
+No.
+
+### Troubleshooting
+
+<!--
+This section must be completed when targeting beta to a release.
+
+For GA, this section is required: approvers should be able to confirm the
+previous answers based on experience in the field.
+
+The Troubleshooting section currently serves the `Playbook` role. We may consider
+splitting it into a dedicated `Playbook` document (potentially with some monitoring
+details). For now, we leave it here.
+-->
+
+###### How does this feature react if the API server and/or etcd is unavailable?
+
+This isn't relevant -- this is not a core Kubernetes feature.
+
+###### What are other known failure modes?
+
+<!--
+For each of them, fill in the following information by copying the below template:
+- [Failure mode brief description]
+- Detection: How can it be detected via metrics? Stated another way:
+how can an operator troubleshoot without logging into a master or worker node?
+- Mitigations: What can be done to stop the bleeding, especially for already
+running user workloads?
+- Diagnostics: What are the useful log messages and their required logging
+levels that could help debug the issue?
+Not required until feature graduated to beta.
+- Testing: Are there any tests for failure mode? If not, describe why.
+-->
+
+- OpenBuildService is down or in a degraded mode
+- Detection: relevant tests are failing, we're getting alerts from users, or
+the OBS team alerted us of such an issue
+- Mitigations: Such an issue wouldn't affect already provisioned nodes. Users
+wouldn't be able to provision new nodes.
+- Diagnostics: APT and Yum error messages.
+- Testing: No, we can't know in what way OBS can fail in case that happens.
+
+###### What steps should be taken if SLOs are not being met to determine the problem?
+
 ## Implementation History
 
 <!--
-- the `Summary` and `Motivation` sections being merged signaling SIG acceptance
-- the `Proposal` section being merged signaling agreement on a proposed design
+Major milestones in the lifecycle of a KEP should be tracked in this section.
+Major milestones might include:
+- the `Summary` and `Motivation` sections being merged, signaling SIG acceptance
+- the `Proposal` section being merged, signaling agreement on a proposed design
 - the date implementation started
 - the first Kubernetes release where an initial version of the KEP was available
 - the version of Kubernetes where the KEP graduated to general availability
 - when the KEP was retired or superseded
 -->
 
-TBA
+N/A
+
+## Drawbacks
+
+<!--
+Why should this KEP _not_ be implemented?
+-->
+
+N/A
+
+## Alternatives
 
-## Drawbacks [optional]
+<!--
+What other approaches did you consider, and why did you rule them out? These do
+not need to be as detailed as the proposal, but should include enough
+information to express the idea and why it was not acceptable.
+-->
 
 N/A
 
-## Alternatives [optional]
+## Infrastructure Needed (Optional)
+
+<!--
+Use this section if you need things from the project/SIG. Examples include a
+new subproject, repos requested, or GitHub details. Listing these here allows a
+SIG to get the process for these resources started right away.
+-->
 
 N/A
```
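The monitoring answer treats a freshly installed Node reaching "Ready" as the signal that the packages work. A minimal Python sketch of checking that signal, assuming the operator already has the JSON form of a Node object (for example from `kubectl get node <name> -o json`), looks up the `Ready` entry in `status.conditions` and reports whether its status is `"True"`. The helper name and the trimmed sample object are illustrative and not part of the KEP.

```python
import json
from typing import Any, Dict


def node_is_ready(node: Dict[str, Any]) -> bool:
    """Return True only if the Node's "Ready" condition has status "True".

    `node` is a Node object decoded from JSON. A missing Ready condition,
    or a status of "False" or "Unknown", is treated as not ready.
    """
    conditions = node.get("status", {}).get("conditions", [])
    for condition in conditions:
        if condition.get("type") == "Ready":
            return condition.get("status") == "True"
    return False


if __name__ == "__main__":
    # Trimmed, illustrative Node object; a real one carries many more fields.
    sample = json.loads(
        """
        {
          "kind": "Node",
          "metadata": {"name": "worker-1"},
          "status": {
            "conditions": [
              {"type": "MemoryPressure", "status": "False"},
              {"type": "Ready", "status": "True"}
            ]
          }
        }
        """
    )
    print(node_is_ready(sample))  # True
```

Treating "Unknown" as not ready is deliberate: a node whose kubelet never reported in after a package install should count as a failed install, not a pass.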
