Create Reliability WG charter

wojtek-t · wojtek-t · commit 74a24a89740c · 2020-09-25T11:00:34.000+02:00
diff --git a/sigs.yaml b/sigs.yaml
@@ -2791,6 +2791,7 @@ workinggroups:
     Allow users to safely use Kubernetes for managing production workloads by ensuring
     Kubernetes is stable and reliable.
 
+  charter_link: charter.md
   stakeholder_sigs:
   - Architecture
   - Cluster Lifecycle
diff --git a/wg-reliability/README.md b/wg-reliability/README.md
@@ -10,6 +10,8 @@ To understand how this file is generated, see https://git.k8s.io/community/gener
 
 Allow users to safely use Kubernetes for managing production workloads by ensuring Kubernetes is stable and reliable.
 
+The [charter](charter.md) defines the scope and governance of the Reliability Working Group.
+
 ## Stakeholder SIGs
 * SIG Architecture
 * SIG Cluster Lifecycle
diff --git a/wg-reliability/charter.md b/wg-reliability/charter.md
@@ -0,0 +1,98 @@
+# WG Reliability Charter
+
+This charter adheres to the conventions described in the [Kubernetes Charter README]
+and uses the Roles and Organization Management outlined in [sig-governance].
+
+[sig-governance]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance.md
+[Kubernetes Charter README]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/README.md
+
+## Scope
+
+The Reliability Working Group (WG Reliability) is organized with the goal of 
+allowing users to safely use Kubernetes for managing production workloads by
+ensuring Kubernetes is stable and reliable.
+
+### In Scope
+
+- What reliability means for Kubernetes and how to measure it?
+- Measuring Kubernetes reliability in tests
+- Introducing criteria for blocking the release if the reliability is
+  below the bar
+- Building a list of end-user outages and reliability issues
+  (if applicable with mitigations and/or workarounds)
+- Creating and prioritizing a list of areas that require reliability
+  investments
+- Work with relevant SIGs on delivering necessary infrastructure
+  (e.g. test frameworks) to unblock further steps
+- Initiate and drive cross-SIG reliability improvements
+
+### Out of scope
+
+- Designing and executing improvements clearly falling into individual SIG
+  responsibilities.
+
+## Special Powers
+
+The Reliability WG has a power to block feature-oriented contributions from
+any SIG if requested reliability-related improvements are not being addressed.
+Before it can be exercised, sig-arch must approve the criteria suggested by
+this working group.
+
+Given WGs are by-definition temporary, on WG Reliability retirement we will
+pass this responsibility to  to SIG Architecture Production Readiness subproject
+or to SIG Architecture generally for reassignment at the leads’ discretion.
+
+## Stakeholders
+
+Stakeholders in this working group span multiple SIGs.
+
+In the first phase of defining reliability for Kubernetes building list of
+reliability gaps and areas for investments the following SIGs will be
+involved:
+
+- SIG Architecture
+  High-level input on requirements.
+- SIG Scalability
+  Input on scale test gaps and reliability issues at scale.
+- SIG Cluster Lifecycle
+  Input on cluster setup and upgrade mechanics.
+- SIG Release
+  Input on blocking and soak requirements.
+- SIG Testing
+  Input on testing mechanics, missing frameworks, etc.
+- SIG *
+  Input on reliability gaps in their areas.
+
+The group will be also reaching out to users and cluster operator
+(e.g. via surveys), to build the full picture.
+
+In the later phase improving reliability, every single SIG may potentially
+be involved depending on the findings from the initial phase.
+
+## Deliverables
+
+The artifacts the group is supposed to deliver include:
+- Document defining what reliability means for Kubernetes and how to measure it.
+- List of known user outages and potential failure modes
+- List of specific investmenets that should happen to improve reliability
+- Set of processes to introduce in Kubernetes to avoid over time degradation
+  of reliability
+
+The actual investments will be owned by corresponding SIGs.
+
+## Roles and Organization Management
+
+This sig follows adheres to the Roles and Organization Management outlined in
+[sig-governance] and opts-in to updates and modifications to [sig-governance].
+
+[sig-governance]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance.md
+
+## Timelines and Disbanding
+
+The exact timeline for existing of this working group is hard to predict at
+this time.
+
+The group will start working on the deliverables mentioned above. Once the
+group we will be satisfied with the current shape of them and no additional
+coordination on their execution will be needed, we will retire Working Group
+and pass oversight of reliability to SIG Architecture PRR subproject.