# SIG etcd Vision

The long-term success of the etcd project depends on the following:
- Etcd is a reliable key-value store
- Etcd is simple to operate
- Etcd is a standalone solution for managing infrastructure configuration
- Etcd scales beyond Kubernetes dimensions

The goals and milestones listed here are for future releases.
The scope of release v3.6 has already been defined and is unlikely to change.

## Etcd is a reliable key-value store

Reliability remains the most important property of etcd.
The project cannot afford another [data inconsistency incident].
If we could pick only one goal from the list above, this would be it.
No matter what features we add in the future,
they must not diminish etcd's reliability.
We must establish processes and safeguards to prevent future incidents.

How?
- Etcd API guarantees are well understood, documented, and tested.
- Etcd adopts a production readiness review process for new features, similar to the Kubernetes one.
- Robustness tests should cover most of the API and the most common failures.
- New features must have accompanying e2e tests and be covered by robustness tests.
- Etcd must be able to immediately detect data corruption.
- Etcd must be able to automatically recover from data corruption.

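Detecting divergence early can be as simple as comparing per-member hashes of the key-value store taken at the same revision (etcd exposes such a hash through its maintenance API). A minimal sketch of only the comparison step, with illustrative endpoint names and hash values:

```go
package main

import "fmt"

// memberHash pairs a member's endpoint with a hash of its key-value
// store at a given revision. In etcd such a hash can be obtained per
// member via the maintenance API; the values here are illustrative.
type memberHash struct {
	Endpoint string
	Hash     uint32
}

// findMismatched returns the endpoints whose hash differs from the
// first member's, i.e. candidates for a data inconsistency.
func findMismatched(hashes []memberHash) []string {
	if len(hashes) == 0 {
		return nil
	}
	var bad []string
	for _, h := range hashes[1:] {
		if h.Hash != hashes[0].Hash {
			bad = append(bad, h.Endpoint)
		}
	}
	return bad
}

func main() {
	cluster := []memberHash{
		{"http://m1:2379", 0xdeadbeef},
		{"http://m2:2379", 0xdeadbeef},
		{"http://m3:2379", 0xfeedface}, // diverged member
	}
	fmt.Println("mismatched members:", findMismatched(cluster))
}
```

A real check would additionally pin all members to the same revision before hashing, since comparing hashes across revisions produces false alarms.
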
[data inconsistency incident]: https://github.com/etcd-io/etcd/blob/main/Documentation/postmortems/v3.5-data-inconsistency.md

## Etcd is simple to operate

Etcd should be easy to operate.
Currently, operating etcd involves many steps,
and some of these steps require external tools.
For example, Kubernetes provides tools to [downgrade/upgrade etcd].
These tools are not part of etcd itself,
but they are available as part of the Kubernetes distribution of etcd.

How?
- Etcd should not require users to run periodic defrag
- Etcd officially supports live upgrades and downgrades
- Disaster recovery for etcd & Kubernetes
- Reliable cluster membership changes via learners with automated promotion
- Two-node etcd clusters

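As an illustration of lifting the periodic-defrag burden off users, the decision of when to defragment could be derived from the total and in-use database sizes that etcd's maintenance status already reports. A minimal sketch of just that decision; the 50% free-space threshold is an assumption for illustration, not an etcd default:

```go
package main

import "fmt"

// needsDefrag decides whether a member's backend file is worth
// defragmenting, given the total file size and the portion actually
// in use (figures etcd's maintenance status reports). The 0.5
// threshold is an illustrative assumption.
func needsDefrag(dbSize, dbSizeInUse int64) bool {
	if dbSize <= 0 {
		return false
	}
	wasted := float64(dbSize-dbSizeInUse) / float64(dbSize)
	return wasted > 0.5
}

func main() {
	fmt.Println(needsDefrag(8<<30, 1<<30)) // mostly free pages: true
	fmt.Println(needsDefrag(8<<30, 7<<30)) // mostly live data: false
}
```

Automating this inside etcd, rather than in external cron jobs, is what the first bullet above would amount to.
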
## Etcd is a standalone solution for managing infrastructure configuration

Kubernetes is not the only way to manage infrastructure.
It was the first to introduce many concepts that have since become standard,
but they are not unique to Kubernetes.
Even its most important design principle,
the reconciliation protocol, is not unique to it.

Reconciliation can be implemented directly on top of etcd,
as has been shown by projects like Cilium and
Calico Typha that support etcd-based control planes.
The reason this idea has not propagated further is
the amount of work that went into making
the reconciliation protocol scale in Kubernetes.
The watch cache is a key part of this scaling,
and it is not part of the etcd project.

If etcd provided a Kubernetes-like storage interface
and primitives for the reconciliation protocol,
it would be a more viable solution for managing infrastructure.
This would allow users to build etcd-based control planes that
could scale to meet the needs of large and complex deployments.

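The core of such a reconciliation primitive is small: compare the desired state against the actual state mirrored from etcd and derive the writes and deletes needed to converge them. A minimal sketch of that step; the wiring to etcd watches and writes is deliberately left out:

```go
package main

import "fmt"

// diff computes the writes and deletes needed to move the actual
// state (as mirrored from etcd, e.g. via a watch) toward the desired
// state. This is the core step of a reconciliation loop.
func diff(desired, actual map[string]string) (puts map[string]string, deletes []string) {
	puts = map[string]string{}
	for k, v := range desired {
		if actual[k] != v {
			puts[k] = v // missing or stale key: write it
		}
	}
	for k := range actual {
		if _, ok := desired[k]; !ok {
			deletes = append(deletes, k) // unwanted key: remove it
		}
	}
	return puts, deletes
}

func main() {
	desired := map[string]string{"/nodes/a": "ready", "/nodes/b": "ready"}
	actual := map[string]string{"/nodes/a": "ready", "/nodes/c": "ready"}
	puts, dels := diff(desired, actual)
	fmt.Println(puts, dels) // map[/nodes/b:ready] [/nodes/c]
}
```

What Kubernetes layers on top of this loop, and what etcd would need to provide natively, is mostly caching and fan-out so that many such controllers can run without overwhelming the store.
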
How?
- Introduce a Kubernetes-like storage interface into the etcd client
- Provide etcd primitives for the reconciliation protocol
- Strip out the Kubernetes watch cache and make it part of the etcd client
- Use the watch cache in the client to build an eventually consistent etcd proxy

[downgrade/upgrade etcd]: https://github.com/kubernetes/kubernetes/tree/master/cluster/images/etcd

## Etcd scales beyond Kubernetes dimensions

Etcd has proven its scalability by enabling Kubernetes clusters of up to 5,000 nodes.
However, as the cloud native ecosystem has evolved, new projects have been built on top of Kubernetes.
These projects, such as [KCP] (a multi-cluster control plane) and [Kueue] (a batch job queuing system),
have different scalability requirements than pure Kubernetes.
For example, they need support for larger storage sizes and higher throughput.

Etcd's strong points are its reliable raft and efficient watch implementations.
However, its storage capabilities are not as strong.
To address this, we should look into growing our storage capabilities and making them more flexible depending on the use case.

How?
- Well-defined and tested scalability dimensions
- Increased raft throughput (async and batch proposal handling)
- Increased bbolt supported storage size
- Pluggable storage layer
- Hybrid clusters with write- and read-optimized members

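To make the batch-proposal idea concrete, the sketch below groups pending proposals into fixed-size batches so they could be committed through raft together rather than one round trip apiece. It is a toy model of the idea only; a real implementation would also flush partially filled batches on a timer, which is omitted here:

```go
package main

import "fmt"

// batch groups pending proposals into chunks of at most maxBatch,
// modeling how a batching layer could amortize raft round trips
// across several client proposals.
func batch(proposals []string, maxBatch int) [][]string {
	var out [][]string
	for len(proposals) > 0 {
		n := maxBatch
		if len(proposals) < n {
			n = len(proposals)
		}
		out = append(out, proposals[:n])
		proposals = proposals[n:]
	}
	return out
}

func main() {
	pending := []string{"put a", "put b", "del c", "put d", "put e"}
	fmt.Println(batch(pending, 2)) // [[put a put b] [del c put d] [put e]]
}
```

The throughput win comes from each batch costing one log append and one fsync instead of one per proposal.
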
[KCP]: https://cloud.redhat.com/blog/an-introduction-to-kcp
[Kueue]: https://github.com/kubernetes-sigs/kueue