# SIG etcd Vision

The long-term success of the etcd project depends on the following:
- Etcd is a reliable key-value store
- Etcd is simple to operate
- Etcd is a standalone solution for managing infrastructure configuration
- Etcd scales beyond Kubernetes dimensions

The goals and milestones listed here are for future releases.
The scope of release v3.6 has already been defined and is unlikely to change.

## Etcd is a reliable key-value store

Reliability remains the most important property of etcd.
The project cannot afford another [data inconsistency incident].
If we could pick only one goal from the list above, this would be it.
No matter what features we add in the future,
they must not diminish etcd's reliability.
We must establish processes and safeguards to prevent future incidents.

How?
- Etcd API guarantees are well understood, documented, and tested.
- Etcd adopts a production readiness review process for new features, similar to the Kubernetes one.
- Robustness tests should cover most of the API and the most common failures.
- New features must have accompanying e2e tests and be covered by robustness tests.
- Etcd must be able to immediately detect data corruption.
- Etcd must be able to automatically recover from data corruption.

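Detecting divergence early can be as simple as comparing per-member hashes of the key-value store taken at the same revision (etcd exposes such a hash through its maintenance API). A minimal sketch of only the comparison step, with illustrative endpoint names and hash values:

```go
package main

import "fmt"

// memberHash pairs a member's endpoint with a hash of its key-value
// store at a given revision. In etcd such a hash can be obtained per
// member via the maintenance API; the values here are illustrative.
type memberHash struct {
	Endpoint string
	Hash     uint32
}

// findMismatched returns the endpoints whose hash differs from the
// first member's, i.e. candidates for a data inconsistency.
func findMismatched(hashes []memberHash) []string {
	if len(hashes) == 0 {
		return nil
	}
	var bad []string
	for _, h := range hashes[1:] {
		if h.Hash != hashes[0].Hash {
			bad = append(bad, h.Endpoint)
		}
	}
	return bad
}

func main() {
	cluster := []memberHash{
		{"http://m1:2379", 0xdeadbeef},
		{"http://m2:2379", 0xdeadbeef},
		{"http://m3:2379", 0xfeedface}, // diverged member
	}
	fmt.Println("mismatched members:", findMismatched(cluster))
}
```

A real check would additionally pin all members to the same revision before hashing, since comparing hashes across revisions produces false alarms.
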
[data inconsistency incident]: https://github.com/etcd-io/etcd/blob/main/Documentation/postmortems/v3.5-data-inconsistency.md

## Etcd is simple to operate

Etcd should be easy to operate.
Currently, operating etcd involves many steps,
and some of these steps require external tools.
For example, Kubernetes provides tools to [downgrade/upgrade etcd].
These tools are not part of etcd itself,
but they are available as part of the Kubernetes distribution of etcd.

How?
- Etcd should not require users to run periodic defrag
- Etcd officially supports live upgrades and downgrades
- Disaster recovery for etcd & Kubernetes
- Reliable cluster membership changes via learners with automated promotion
- Two-node etcd clusters

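As an illustration of lifting the periodic-defrag burden off users, the decision of when to defragment could be derived from the total and in-use database sizes that etcd's maintenance status already reports. A minimal sketch of just that decision; the 50% free-space threshold is an assumption for illustration, not an etcd default:

```go
package main

import "fmt"

// needsDefrag decides whether a member's backend file is worth
// defragmenting, given the total file size and the portion actually
// in use (figures etcd's maintenance status reports). The 0.5
// threshold is an illustrative assumption.
func needsDefrag(dbSize, dbSizeInUse int64) bool {
	if dbSize <= 0 {
		return false
	}
	wasted := float64(dbSize-dbSizeInUse) / float64(dbSize)
	return wasted > 0.5
}

func main() {
	fmt.Println(needsDefrag(8<<30, 1<<30)) // mostly free pages: true
	fmt.Println(needsDefrag(8<<30, 7<<30)) // mostly live data: false
}
```

Automating this inside etcd, rather than in external cron jobs, is what the first bullet above would amount to.
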
## Etcd is a standalone solution for managing infrastructure configuration

Kubernetes is not the only way to manage infrastructure.
It was the first to introduce many concepts that have since become standard,
but they are not unique to Kubernetes.
Even its most important design principle,
the reconciliation protocol, is not unique to it.

Reconciliation can be implemented directly on top of etcd,
as has been shown by projects like Cilium and
Calico Typha that support etcd-based control planes.
The reason this idea has not propagated further is
the amount of work that went into making
the reconciliation protocol scale in Kubernetes.
The watch cache is a key part of this scaling,
and it is not part of the etcd project.

If etcd provided a Kubernetes-like storage interface
and primitives for the reconciliation protocol,
it would be a more viable solution for managing infrastructure.
This would allow users to build etcd-based control planes that
could scale to meet the needs of large and complex deployments.

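The core of such a reconciliation primitive is small: compare the desired state against the actual state mirrored from etcd and derive the writes and deletes needed to converge them. A minimal sketch of that step; the wiring to etcd watches and writes is deliberately left out:

```go
package main

import "fmt"

// diff computes the writes and deletes needed to move the actual
// state (as mirrored from etcd, e.g. via a watch) toward the desired
// state. This is the core step of a reconciliation loop.
func diff(desired, actual map[string]string) (puts map[string]string, deletes []string) {
	puts = map[string]string{}
	for k, v := range desired {
		if actual[k] != v {
			puts[k] = v // missing or stale key: write it
		}
	}
	for k := range actual {
		if _, ok := desired[k]; !ok {
			deletes = append(deletes, k) // unwanted key: remove it
		}
	}
	return puts, deletes
}

func main() {
	desired := map[string]string{"/nodes/a": "ready", "/nodes/b": "ready"}
	actual := map[string]string{"/nodes/a": "ready", "/nodes/c": "ready"}
	puts, dels := diff(desired, actual)
	fmt.Println(puts, dels) // map[/nodes/b:ready] [/nodes/c]
}
```

What Kubernetes layers on top of this loop, and what etcd would need to provide natively, is mostly caching and fan-out so that many such controllers can run without overwhelming the store.
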
How?
- Introduce a Kubernetes-like storage interface into the etcd client
- Provide etcd primitives for the reconciliation protocol
- Strip out the Kubernetes watch cache and make it part of the etcd client
- Use the watch cache in the client to build an eventually consistent etcd proxy

[downgrade/upgrade etcd]: https://github.com/kubernetes/kubernetes/tree/master/cluster/images/etcd

## Etcd scales beyond Kubernetes dimensions

Etcd has proven its scalability by enabling Kubernetes clusters of up to 5,000 nodes.
However, as the cloud native ecosystem has evolved, new projects have been built on top of Kubernetes.
These projects, such as [KCP] (a multi-cluster control plane) and [Kueue] (a batch job queuing system),
have different scalability requirements than pure Kubernetes.
For example, they need support for larger storage sizes and higher throughput.

Etcd's strong points are its reliable raft and efficient watch implementations.
However, its storage capabilities are not as strong.
To address this, we should look into growing our storage capabilities and making them more flexible depending on the use case.

How?
- Well-defined and tested scalability dimensions
- Increased raft throughput (async and batch proposal handling)
- Increased bbolt supported storage size
- Pluggable storage layer
- Hybrid clusters with write- and read-optimized members

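To make the batch-proposal idea concrete, the sketch below groups pending proposals into fixed-size batches so they could be committed through raft together rather than one round trip apiece. It is a toy model of the idea only; a real implementation would also flush partially filled batches on a timer, which is omitted here:

```go
package main

import "fmt"

// batch groups pending proposals into chunks of at most maxBatch,
// modeling how a batching layer could amortize raft round trips
// across several client proposals.
func batch(proposals []string, maxBatch int) [][]string {
	var out [][]string
	for len(proposals) > 0 {
		n := maxBatch
		if len(proposals) < n {
			n = len(proposals)
		}
		out = append(out, proposals[:n])
		proposals = proposals[n:]
	}
	return out
}

func main() {
	pending := []string{"put a", "put b", "del c", "put d", "put e"}
	fmt.Println(batch(pending, 2)) // [[put a put b] [del c put d] [put e]]
}
```

The throughput win comes from each batch costing one log append and one fsync instead of one per proposal.
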
[KCP]: https://cloud.redhat.com/blog/an-introduction-to-kcp
[Kueue]: https://github.com/kubernetes-sigs/kueue