Commit 74a24a8

Create Reliability WG charter
1 parent c038f43 commit 74a24a8

3 files changed: +101 -0 lines changed

sigs.yaml

Lines changed: 1 addition & 0 deletions
@@ -2791,6 +2791,7 @@ workinggroups:
 Allow users to safely use Kubernetes for managing production workloads by ensuring
 Kubernetes is stable and reliable.
+charter_link: charter.md
 stakeholder_sigs:
 - Architecture
 - Cluster Lifecycle

wg-reliability/README.md

Lines changed: 2 additions & 0 deletions
@@ -10,6 +10,8 @@ To understand how this file is generated, see https://git.k8s.io/community/gener
 
 Allow users to safely use Kubernetes for managing production workloads by ensuring Kubernetes is stable and reliable.
 
+The [charter](charter.md) defines the scope and governance of the Reliability Working Group.
+
 ## Stakeholder SIGs
 * SIG Architecture
 * SIG Cluster Lifecycle

wg-reliability/charter.md

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
+# WG Reliability Charter
+
+This charter adheres to the conventions described in the [Kubernetes Charter README]
+and uses the Roles and Organization Management outlined in [sig-governance].
+
+[sig-governance]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance.md
+[Kubernetes Charter README]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/README.md
+
+## Scope
+
+The Reliability Working Group (WG Reliability) is organized with the goal of
+allowing users to safely use Kubernetes for managing production workloads by
+ensuring Kubernetes is stable and reliable.
+
+### In Scope
+
+- Defining what reliability means for Kubernetes and how to measure it
+- Measuring Kubernetes reliability in tests
+- Introducing criteria for blocking the release if reliability is
+  below the bar
+- Building a list of end-user outages and reliability issues
+  (with mitigations and/or workarounds, if applicable)
+- Creating and prioritizing a list of areas that require reliability
+  investments
+- Working with relevant SIGs on delivering necessary infrastructure
+  (e.g. test frameworks) to unblock further steps
+- Initiating and driving cross-SIG reliability improvements
+
+### Out of Scope
+
+- Designing and executing improvements clearly falling into individual SIG
+  responsibilities.
+
+## Special Powers
+
+The Reliability WG has the power to block feature-oriented contributions from
+any SIG if requested reliability-related improvements are not being addressed.
+Before this power can be exercised, SIG Architecture must approve the criteria
+suggested by this working group.
+
+Given that WGs are by definition temporary, on WG Reliability retirement we will
+pass this responsibility to the SIG Architecture Production Readiness subproject
+or to SIG Architecture generally for reassignment at the leads’ discretion.
+
+## Stakeholders
+
+Stakeholders in this working group span multiple SIGs.
+
+In the first phase (defining reliability for Kubernetes and building a list of
+reliability gaps and areas for investment), the following SIGs will be
+involved:
+
+- SIG Architecture
+  High-level input on requirements.
+- SIG Scalability
+  Input on scale test gaps and reliability issues at scale.
+- SIG Cluster Lifecycle
+  Input on cluster setup and upgrade mechanics.
+- SIG Release
+  Input on blocking and soak requirements.
+- SIG Testing
+  Input on testing mechanics, missing frameworks, etc.
+- SIG *
+  Input on reliability gaps in their areas.
+
+The group will also reach out to users and cluster operators
+(e.g. via surveys) to build the full picture.
+
+In the later phase of improving reliability, every SIG may potentially
+be involved, depending on the findings from the initial phase.
+
+## Deliverables
+
+The artifacts the group is expected to deliver include:
+- Document defining what reliability means for Kubernetes and how to measure it
+- List of known user outages and potential failure modes
+- List of specific investments that should happen to improve reliability
+- Set of processes to introduce in Kubernetes to avoid degradation of
+  reliability over time
+
+The actual investments will be owned by the corresponding SIGs.
+
+## Roles and Organization Management
+
+This working group adheres to the Roles and Organization Management outlined in
+[sig-governance] and opts in to updates and modifications to [sig-governance].
+
+[sig-governance]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance.md
+
+## Timelines and Disbanding
+
+The exact lifetime of this working group is hard to predict at
+this time.
+
+The group will start working on the deliverables mentioned above. Once the
+group is satisfied with their shape and no additional coordination on
+their execution is needed, we will retire the Working Group
+and pass oversight of reliability to the SIG Architecture PRR subproject.
