| 1 | +# KEP-4743: The Kubernetes-etcd interface |
| 2 | +<!-- toc --> |
| 3 | +- [Summary](#summary) |
| 4 | + - [Goals](#goals) |
| 5 | + - [Non-Goals](#non-goals) |
| 6 | +- [Proposal](#proposal) |
| 7 | +- [User Stories](#user-stories) |
| 8 | + - [Kubernetes Backporting an etcd client patch version](#kubernetes-backporting-an-etcd-client-patch-version) |
| 9 | + - [Making Changes/Patching Bugs in the Interface](#making-changespatching-bugs-in-the-interface) |
| 10 | + - [Kubernetes Leveraging New etcd Functionality](#kubernetes-leveraging-new-etcd-functionality) |
| 11 | +- [Code location](#code-location) |
| 12 | +- [The Interface](#the-interface) |
| 13 | + - [KV interface](#kv-interface) |
| 14 | + - [Design considerations](#design-considerations) |
| 15 | + - [Watch interface](#watch-interface) |
| 16 | + - [Design considerations](#design-considerations-1) |
| 17 | +- [Alternatives](#alternatives) |
| 18 | + - [Code location](#code-location-1) |
| 19 | + - [Part of the etcd Client Struct](#part-of-the-etcd-client-struct) |
| 20 | + - [New Package in etcd Repository](#new-package-in-etcd-repository) |
| 21 | + - [New Repository under etcd-io](#new-repository-under-etcd-io) |
| 22 | +<!-- /toc --> |
| 23 | + |
| 24 | +## Summary |
| 25 | + |
| 26 | +This design proposal introduces an etcd-Kubernetes interface to be added to the |
| 27 | +etcd client and adopted by Kubernetes. This interface aims to create a clear and |
| 28 | +standardized contract between the two projects, codifying the interactions |
| 29 | +outlined in the [Implicit Kubernetes-ETCD Contract]. By formalizing this contract, |
| 30 | +we will improve the testability of both Kubernetes and etcd, prevent common |
| 31 | +errors in their interaction, and establish a framework for the future evolution |
| 32 | +of this critical contract. |
| 33 | + |
| 34 | +[Implicit Kubernetes-ETCD Contract]: https://docs.google.com/document/d/1NUZDiJeiIH5vo_FMaTWf0JtrQKCx0kpEaIIuPoj9P6A/edit#heading=h.tlkin1a8b8bl |
| 35 | + |
| 36 | +### Goals |
| 37 | + |
| 38 | +* **Improved Testability:** Enable thorough testing of etcd and Kubernetes |
| 39 | + interactions through a well-defined interface, as envisioned in [#15820]. |
| 40 | +* **Error Prevention:** Reduce incorrect contract usage, addressing issues like Kubernetes [#110210]. |
| 41 | +* **Reviewable Changes:** Make contract modifications easily reviewable and |
| 42 | + trackable, ensuring a transparent and collaborative evolution. |
| 43 | +* **Backward Compatibility:** Ensure the interface remains compatible with all |
| 44 | + etcd versions supported by Kubernetes at the time of a Kubernetes release. |
| 45 | + |
| 46 | +[#15820]: https://github.com/etcd-io/etcd/issues/15820 |
| 47 | +[#110210]: https://github.com/kubernetes/kubernetes/issues/110210 |
| 48 | + |
| 49 | +In scope |
| 50 | +* [etcd3 store]: The primary Kubernetes object storage interface. |
| 51 | +* [Master leases]: Lease management for Kubernetes control plane components (utilizing the [etcd3 store]) |
| 52 | + |
| 53 | +[etcd3 store]: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/storage/etcd3/store.go |
| 54 | +[Master leases]: https://github.com/kubernetes/kubernetes/blob/dae1859c896d742de1ee60a349475f8e28b61995/pkg/controlplane/reconcilers/lease.go#L47-L66 |
| 55 | + |
| 56 | +### Non-Goals |
| 57 | +* **Alternative Interface Implementations:** This KEP focuses solely on defining |
| 58 | + the interface for the existing etcd backend and ensuring its compatibility |
| 59 | + with Kubernetes. It does not encompass the development or support of |
| 60 | + alternative storage backends or implementations for the interface. |
| 61 | +* **Non-Storage Usage of etcd Client:** |
| 62 | + * [Kubeadm] - Primarily used for etcd cluster administration, not Kubernetes object storage. |
| 63 | + * [Compaction] - Kubernetes aims to encourage native etcd compaction. See [#80513]. |
| 64 | + * [Monitor] - We do not see a benefit in standardizing the etcd metrics used for Kubernetes, at least for now. |
| 65 | + * [Prober], [Feature checker] - These have planned migrations to native etcd features. See [Design Doc: etcd livez and readyz probes] and [KEP-4647]. |
| 66 | + * [Lease manager] - Planned removal in favor of one lease per key to address [#110210]. |
| 67 | + |
| 68 | +[Kubeadm]: https://github.com/kubernetes/kubernetes/blob/6ba9fa89fb5889550649bfde847c742a55d3d29c/cmd/kubeadm/app/util/etcd/etcd.go#L66-L93 |
| 69 | +[Compaction]: https://github.com/kubernetes/kubernetes/blob/6ba9fa89fb5889550649bfde847c742a55d3d29c/staging/src/k8s.io/apiserver/pkg/storage/etcd3/compact.go#L133-L162 |
| 70 | +[#80513]: https://github.com/kubernetes/kubernetes/issues/80513 |
| 71 | +[Monitor]: https://github.com/kubernetes/kubernetes/blob/6ba9fa89fb5889550649bfde847c742a55d3d29c/staging/src/k8s.io/apiserver/pkg/storage/storagebackend/factory/etcd3.go#L270-L283 |
| 72 | +[Prober]: https://github.com/kubernetes/kubernetes/blob/6ba9fa89fb5889550649bfde847c742a55d3d29c/staging/src/k8s.io/apiserver/pkg/storage/storagebackend/factory/etcd3.go#L256-L268 |
| 73 | +[Feature checker]: https://github.com/kubernetes/kubernetes/blob/6ba9fa89fb5889550649bfde847c742a55d3d29c/staging/src/k8s.io/apiserver/pkg/storage/feature/feature_support_checker.go#L143-L151 |
| 74 | +[Design Doc: etcd livez and readyz probes]: https://docs.google.com/document/d/1SkzmO4RT_GI9YhT0dw4a6nEwKVCciwrwbCDxK0D7ASM/edit?usp=sharing |
| 75 | +[KEP-4647]: https://github.com/kubernetes/enhancements/pull/4662 |
| 76 | +[Lease manager]: https://github.com/kubernetes/kubernetes/blob/6ba9fa89fb5889550649bfde847c742a55d3d29c/staging/src/k8s.io/apiserver/pkg/storage/etcd3/lease_manager.go#L90-L120 |
| 77 | +[#110210]: https://github.com/kubernetes/kubernetes/issues/110210 |
| 78 | + |
| 79 | +## Proposal |
| 80 | + |
| 81 | +This KEP proposes creating an etcd-Kubernetes code interface owned and |
| 82 | +maintained by SIG-etcd. The interface will serve as a formalization of the |
| 83 | +existing etcd-Kubernetes contract, ensuring the correct usage of etcd within |
| 84 | +Kubernetes and enabling improved testing and validation. |
| 85 | + |
| 86 | +The interface will prioritize etcd's existing capabilities and behaviors, |
| 87 | +focusing on compatibility with the current etcd API. It will not introduce |
| 88 | +features or behaviors not supported by etcd, adhering to the existing |
| 89 | +SIG API Machinery policy outlined in [Storage for Extension API Servers]. |
| 90 | +This policy designates etcd as the sole supported storage backend for Kubernetes |
| 91 | +for the foreseeable future. |
| 92 | + |
| 93 | +[Storage for Extension API Servers]: https://docs.google.com/document/d/1i0xzRFB-uGLmLYueLMBTpHrOot9ScFxpkkcVcZHVbyA/edit?usp=sharing |
| 94 | + |
| 95 | +## User Stories |
| 96 | + |
| 97 | +To better understand the importance of code location, consider the following use cases: |
| 98 | + |
| 99 | +### Kubernetes Backporting an etcd client patch version |
| 100 | + |
| 101 | +**The Journey:** Kubernetes regularly updates to newer etcd versions to leverage |
| 102 | +bug fixes or security patches. However, ensuring compatibility between the |
| 103 | +codified etcd-Kubernetes interface and etcd client is essential. |
| 104 | + |
| 105 | +**Considerations:** |
| 106 | + |
| 107 | +* Even minor etcd client updates might inadvertently introduce changes that |
| 108 | + break the interface's assumptions or functionality. |
| 109 | +* Tight coupling to the etcd client could necessitate backporting the |
| 110 | + interface to older etcd branches, a complex and time-consuming process. |
| 111 | + |
| 112 | + |
| 113 | +### Making Changes/Patching Bugs in the Interface |
| 114 | + |
| 115 | +**The Journey:** Despite careful design, the complex etcd-Kubernetes contract |
| 116 | +might reveal bugs or require adjustments. |
| 117 | + |
| 118 | +**Considerations:** |
| 119 | + |
| 120 | +* Changes and bug fixes need to be implemented and released with minimal |
| 121 | + disruption to both Kubernetes and etcd users. |
| 122 | +* A tightly coupled interface might require a full etcd |
| 123 | + client release for bug fixes, slowing down the process. |
| 124 | + |
| 125 | +### Kubernetes Leveraging New etcd Functionality |
| 126 | + |
| 127 | +**The Journey:** Kubernetes wants to expand its use of the etcd API beyond the |
| 128 | +current interface's scope. |
| 129 | + |
| 130 | +**Considerations:** The interface codifies a minimal subset of the etcd API |
| 131 | +currently used by Kubernetes, and new features will initially be outside its |
| 132 | +scope. Balancing new feature adoption with interface stability is crucial. |
| 133 | + |
| 134 | +**Mitigation:** |
| 135 | + |
| 136 | +* Allow Kubernetes to directly use new etcd client features during alpha/beta |
| 137 | + stages, bypassing the interface temporarily. |
| 138 | +* Extend the etcd robustness tests to cover the new functionality before |
| 139 | + formalizing it in the interface. |
| 140 | +* Once a feature is mature and stable, extend the interface, ensuring backward |
| 141 | + compatibility for existing Kubernetes versions. |
| 142 | + |
| 143 | +## Code location |
| 144 | + |
| 145 | +We propose locating the interface in a `kubernetes` subdirectory under |
| 146 | +https://github.com/etcd-io/etcd/tree/main/client/v3. |
| 147 | +This approach allows for seamless integration with the etcd client while |
| 148 | +maintaining a dedicated space for the interface code. |
| 149 | + |
| 150 | +The interface will be part of the etcd client package, and its release will be |
| 151 | +combined with the etcd release. For immediate Kubernetes use, etcd will backport the |
| 152 | +interface to the `release-3.5` branch and introduce it in the next etcd patch release for |
| 153 | +Kubernetes to consume. |
| 154 | + |
| 155 | +Alternative code locations are discussed at the end of the document. |
| 156 | + |
| 157 | +## The Interface |
| 158 | + |
| 159 | +To ensure a smoother transition, we propose adopting the etcd-Kubernetes interface in two stages: |
| 160 | + |
| 161 | +1. **KV Interface:** Covering the basic get, list, count, put, and delete operations. |
| 162 | +2. **Watch Interface:** Covering the Watch operation and requesting progress notifications for it. |
| 163 | + |
| 164 | + |
| 165 | +### KV interface |
| 166 | + |
| 167 | +For the reasoning, see the design considerations that follow the interface definition. |
| 168 | + |
| 169 | +``` |
| 170 | +// Interface defines the minimal client-side interface that Kubernetes requires |
| 171 | +// to interact with etcd. Methods below are standard etcd operations with |
| 172 | +// semantics adjusted to better suit Kubernetes' needs. |
| 173 | +type Interface interface { |
| 174 | + // Get retrieves a single key-value pair from etcd. |
| 175 | + // |
| 176 | + // If opts.Revision is set to a non-zero value, the key-value pair is retrieved at the specified revision. |
| 177 | + // If the required revision has been compacted, the request will fail with ErrCompacted. |
| 178 | + Get(ctx context.Context, key string, opts GetOptions) (GetResponse, error) |
| 179 | +
|
| 180 | + // List retrieves key-value pairs with the specified prefix. |
| 181 | + // |
| 182 | + // If opts.Revision is non-zero, the key-value pairs are retrieved at the specified revision. |
| 183 | + // If the required revision has been compacted, the request will fail with ErrCompacted. |
| 184 | + // If opts.Limit is greater than zero, the number of returned key-value pairs is bounded by the limit. |
| 185 | + // If opts.Continue is not empty, the listing will start from the key immediately after the one specified by Continue. |
| 186 | + List(ctx context.Context, prefix string, opts ListOptions) (ListResponse, error) |
| 187 | +
|
| 188 | + // Count returns the number of keys with the specified prefix. |
| 189 | + Count(ctx context.Context, prefix string) (int64, error) |
| 190 | +
|
| 191 | + // OptimisticPut creates or updates a key-value pair only if the key has not been created or |
| 192 | + // modified since the revision specified in expectedRevision. An expectedRevision of 0 means the |
| 193 | + // key must not exist yet, so the call acts as a create. |
| 194 | + // |
| 195 | + // If opts.GetOnFailure is true, the modified key-value pair will be returned if the put operation fails due to a revision mismatch. |
| 196 | + // If opts.LeaseID is provided, it overrides the lease associated with the key. If not provided, the existing lease is cleared. |
| 197 | + OptimisticPut(ctx context.Context, key string, value []byte, expectedRevision int64, opts PutOptions) (PutResponse, error) |
| 198 | +
|
| 199 | + // OptimisticDelete deletes the key-value pair if it hasn't been modified since the revision |
| 200 | + // specified in expectedRevision. |
| 201 | + // |
| 202 | + // If opts.GetOnFailure is true, the modified key-value pair will be returned if the delete operation fails due to a revision mismatch. |
| 203 | + OptimisticDelete(ctx context.Context, key string, expectedRevision int64, opts DeleteOptions) (DeleteResponse, error) |
| 204 | +} |
| 205 | +
|
| 206 | +type GetOptions struct { |
| 207 | + Revision int64 |
| 208 | +} |
| 209 | +
|
| 210 | +type ListOptions struct { |
| 211 | + Revision int64 |
| 212 | + Limit int64 |
| 213 | + Continue string |
| 214 | +} |
| 215 | +
|
| 216 | +type PutOptions struct { |
| 217 | + GetOnFailure bool |
| 218 | + // LeaseID |
| 219 | + // Deprecated: Should be replaced with TTL when Interface starts using one lease per object. |
| 220 | + LeaseID clientv3.LeaseID |
| 221 | +} |
| 222 | +
|
| 223 | +type DeleteOptions struct { |
| 224 | + GetOnFailure bool |
| 225 | +} |
| 226 | +
|
| 227 | +type GetResponse struct { |
| 228 | + KV *mvccpb.KeyValue |
| 229 | + Revision int64 |
| 230 | +} |
| 231 | +
|
| 232 | +type ListResponse struct { |
| 233 | + KVs []*mvccpb.KeyValue |
| 234 | + Count int64 |
| 235 | + Revision int64 |
| 236 | +} |
| 237 | +
|
| 238 | +type PutResponse struct { |
| 239 | + KV *mvccpb.KeyValue |
| 240 | + Succeeded bool |
| 241 | + Revision int64 |
| 242 | +} |
| 243 | +
|
| 244 | +type DeleteResponse struct { |
| 245 | + KV *mvccpb.KeyValue |
| 246 | + Succeeded bool |
| 247 | + Revision int64 |
| 248 | +} |
| 249 | +
|
| 250 | +
|
| 251 | +``` |
| 252 | + |
| 253 | +### Design considerations |
| 254 | + |
| 255 | +**How should arguments be passed?** Proposed: Options struct. |
| 256 | + |
| 257 | +* It’s more extensible than a hardcoded list of arguments, allowing more fields to be added in the future. |
| 258 | +* It’s more readable than a variadic options list when arguments are optional. |
| 259 | + Take the server code that manages [list limit options] as an example. |
| 260 | +* The same arguments apply to the response structs. |
| 261 | + |
| 262 | +[list limit options]: https://github.com/kubernetes/kubernetes/blob/97e87e2c40e5b83399a44738d38653fd59c58e99/staging/src/k8s.io/apiserver/pkg/storage/etcd3/store.go#L640-L645 |
| 263 | + |
| 264 | +**Prefer Range vs List semantics?** Proposed: List |
| 265 | + |
| 266 | +* List matches the intention of the Kubernetes behavior |
| 267 | + |
| 268 | +**Combine Create and Update?** Proposed: Combine them into Put |
| 269 | + |
| 270 | +* They are the same from an argument standpoint. Create is an Update with ExpectedRevision set to 0. |
| 271 | +* The difference in behavior on failure can be covered by the optional `GetOnFailure` argument. |
| 272 | + |
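The "Create is an Update with ExpectedRevision set to 0" point can be illustrated with a small in-memory fake. This is only a sketch under stated assumptions: `memStore` and `kv` are hypothetical stand-ins (not the etcd-backed implementation or `mvccpb.KeyValue`), and the `context.Context` argument and `error` return of the proposed method are dropped for brevity.

```go
package main

import "fmt"

// kv is a hypothetical stand-in for mvccpb.KeyValue.
type kv struct {
	Value       []byte
	ModRevision int64
}

type PutOptions struct{ GetOnFailure bool }

type PutResponse struct {
	KV        *kv
	Succeeded bool
	Revision  int64
}

// memStore is a hypothetical in-memory fake used only for illustration.
type memStore struct {
	rev  int64
	data map[string]kv
}

func newMemStore() *memStore { return &memStore{data: map[string]kv{}} }

// OptimisticPut writes only when the key's current mod revision equals
// expectedRevision. A missing key reads as mod revision 0, so passing
// expectedRevision == 0 behaves as "create only if absent".
func (s *memStore) OptimisticPut(key string, value []byte, expectedRevision int64, opts PutOptions) PutResponse {
	cur, exists := s.data[key]
	if cur.ModRevision != expectedRevision {
		resp := PutResponse{Revision: s.rev}
		if opts.GetOnFailure && exists {
			c := cur
			resp.KV = &c // return the conflicting pair to the caller
		}
		return resp
	}
	s.rev++
	s.data[key] = kv{Value: value, ModRevision: s.rev}
	return PutResponse{Succeeded: true, Revision: s.rev}
}

func main() {
	s := newMemStore()

	// Create: the key does not exist, so expectedRevision 0 matches.
	created := s.OptimisticPut("/registry/pods/a", []byte("v1"), 0, PutOptions{})
	fmt.Println(created.Succeeded, created.Revision)

	// A second create conflicts; GetOnFailure returns the current value.
	stale := s.OptimisticPut("/registry/pods/a", []byte("v2"), 0, PutOptions{GetOnFailure: true})
	fmt.Println(stale.Succeeded, string(stale.KV.Value))

	// Update: pass the revision observed from the earlier write.
	updated := s.OptimisticPut("/registry/pods/a", []byte("v2"), created.Revision, PutOptions{})
	fmt.Println(updated.Succeeded, updated.Revision)
}
```

With a single method, the caller chooses between create and update purely through `expectedRevision`, and `GetOnFailure` covers the case where an update wants the conflicting object back while a create does not.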
| 273 | + |
| 274 | +### Watch interface |
| 275 | + |
| 276 | +For the reasoning, see the design considerations that follow the interface definition. |
| 277 | +``` |
| 278 | +
|
| 279 | +type Kubernetes interface { |
| 280 | + Watch(ctx context.Context, key string, opts WatchOptions) KubernetesWatchChan |
| 281 | + RequestProgress(ctx context.Context, opts RequestProgressOptions) error |
| 282 | +} |
| 283 | +
|
| 284 | +type WatchOptions struct { |
| 285 | + StreamKey string |
| 286 | + Revision int64 |
| 287 | + Prefix bool |
| 288 | +} |
| 289 | +
|
| 290 | +type RequestProgressOptions struct { |
| 291 | + StreamKey string |
| 292 | +} |
| 293 | +
|
| 294 | +type KubernetesWatchChan <-chan KubernetesWatchEvent |
| 295 | +
|
| 296 | +type KubernetesEventType string |
| 297 | +
|
| 298 | +const ( |
| 299 | + Added KubernetesEventType = "ADDED" |
| 300 | + Modified KubernetesEventType = "MODIFIED" |
| 301 | + Deleted KubernetesEventType = "DELETED" |
| 302 | + Bookmark KubernetesEventType = "BOOKMARK" |
| 303 | + Error KubernetesEventType = "ERROR" |
| 304 | +) |
| 305 | +
|
| 306 | +type KubernetesWatchEvent struct { |
| 307 | + Type KubernetesEventType |
| 308 | +
|
| 309 | + Error error |
| 310 | + Revision int64 |
| 311 | + Key string |
| 312 | + Value []byte |
| 313 | + PreviousValue []byte |
| 314 | +} |
| 315 | +``` |
| 316 | + |
| 317 | +### Design considerations |
| 318 | + |
| 319 | +**What control does the user have over requesting progress?** Proposed: Allow the user to set streamKey when creating a watch and when requesting progress. |
| 320 | + |
| 321 | +* StreamKey is used to separate watch gRPC streams. |
| 322 | + For Kubernetes we always use one stream, as we don’t change gRPC metadata between requests (e.g. WithRequireLeader). |
| 323 | + Currently the etcd client doesn’t expose streamKey to the user; it calculates it from the gRPC metadata taken from the context. |
| 324 | +* Having access to streamKey is useful because progress notifications cannot be requested per watch, only for the whole stream. |
| 325 | + This isn’t a big problem in the current setup, as Kubernetes opens only one watch per resource. However, it would become a scalability issue for CRDs. |
| 326 | + |
| 327 | +**Should Kubernetes explicitly pass WithRequireLeader or make it default?** Proposed: Make it the default when the Kubernetes interface is used. |
| 328 | + |
| 329 | +**Should we wrap the watch response?** Proposed: Yes, it allows us to codify the Kubernetes dependency on a single revision per transaction and on PrevKV. |
| 330 | + |
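One way to picture the wrapping is a conversion function from a raw etcd event to the proposed `KubernetesWatchEvent`. The sketch below is an illustration under assumptions, not the proposed implementation: `rawKV` and `rawEvent` are hypothetical stand-ins for `mvccpb.KeyValue` and `mvccpb.Event`, and treating a delete without a previous value as an `ERROR` event is an assumed policy chosen to show the PrevKV dependency.

```go
package main

import (
	"errors"
	"fmt"
)

type KubernetesEventType string

const (
	Added    KubernetesEventType = "ADDED"
	Modified KubernetesEventType = "MODIFIED"
	Deleted  KubernetesEventType = "DELETED"
	Error    KubernetesEventType = "ERROR"
)

type KubernetesWatchEvent struct {
	Type          KubernetesEventType
	Error         error
	Revision      int64
	Key           string
	Value         []byte
	PreviousValue []byte
}

// rawKV and rawEvent are hypothetical stand-ins for the etcd event types.
type rawKV struct {
	Key            string
	Value          []byte
	CreateRevision int64
	ModRevision    int64
}

type rawEvent struct {
	IsDelete bool
	KV       rawKV
	PrevKV   *rawKV // nil when the previous value was not delivered
}

// convert maps a raw event to the wrapped event type. A put whose create
// revision equals its mod revision is the first write of the key (ADDED);
// any other put is MODIFIED.
func convert(e rawEvent) KubernetesWatchEvent {
	out := KubernetesWatchEvent{Revision: e.KV.ModRevision, Key: e.KV.Key}
	switch {
	case e.IsDelete:
		if e.PrevKV == nil {
			// Assumed policy: surface the missing previous value as an error.
			out.Type = Error
			out.Error = errors.New("delete event without previous value")
			return out
		}
		out.Type = Deleted
		out.PreviousValue = e.PrevKV.Value
	case e.KV.CreateRevision == e.KV.ModRevision:
		out.Type = Added
		out.Value = e.KV.Value
	default:
		out.Type = Modified
		out.Value = e.KV.Value
		if e.PrevKV != nil {
			out.PreviousValue = e.PrevKV.Value
		}
	}
	return out
}

func main() {
	old := rawKV{Key: "/registry/pods/a", Value: []byte("v1"), CreateRevision: 5, ModRevision: 5}
	events := []rawEvent{
		{KV: old},
		{KV: rawKV{Key: old.Key, Value: []byte("v2"), CreateRevision: 5, ModRevision: 7}, PrevKV: &old},
		{IsDelete: true, KV: rawKV{Key: old.Key, ModRevision: 9}},
	}
	for _, e := range events {
		w := convert(e)
		fmt.Println(w.Type, w.Revision, w.Error)
	}
}
```

Keeping the conversion inside the interface means callers only see the five event types and never have to inspect create and mod revisions themselves.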
| 331 | +## Alternatives |
| 332 | + |
| 333 | +### Code location |
| 334 | + |
| 335 | +#### Part of the etcd Client Struct |
| 336 | + |
| 337 | +**Pros:** |
| 338 | + |
| 339 | +* **Seamless Integration:** The interface becomes inherently part of the client, fostering intuitive usage. |
| 340 | +* **Code Reuse:** Leverage existing private client methods, reducing redundancy. |
| 341 | + |
| 342 | +**Cons:** |
| 343 | + |
| 344 | +* **Tight Coupling:** Changes to the interface necessitate updates to the entire etcd client, impacting Kubernetes upgrades. |
| 345 | +* **Limited Autonomy:** Release and bug-fix cycles are bound to the etcd project's schedule, which may not align with Kubernetes' needs. |
| 346 | +* **Backporting Challenge:** Requires backporting to v3.5 for Kubernetes compatibility, going against the etcd project's goal of minimizing backports. |
| 347 | + |
| 348 | +#### New Package in etcd Repository |
| 349 | + |
| 350 | +**Pros:** |
| 351 | + |
| 352 | +* **Versioning Flexibility:** Allows for independent versioning (e.g., `v3.5.13-interface.1`) to track interface changes separately from the etcd client. |
| 353 | +* **Manageable Integration:** Separates the interface from the client but keeps it within the etcd project, simplifying coordination. |
| 354 | + |
| 355 | +**Cons:** |
| 356 | + |
| 357 | +* **Backporting Challenge:** Still requires backporting to v3.5 for initial Kubernetes compatibility. |
| 358 | +* **Maintenance Overhead:** Separate versioning introduces some additional maintenance effort to ensure compatibility between the interface and etcd versions. |
| 359 | +* **Compatibility Risk:** Incompatibilities may arise between etcd and interface versions if not managed meticulously. |
| 360 | + |
| 361 | + |
| 362 | +#### New Repository under etcd-io |
| 363 | + |
| 364 | +**Pros:** |
| 365 | + |
| 366 | +* **Maximum Autonomy:** Grants Kubernetes full control over development, releases, and bug fixes. |
| 367 | + |
| 368 | +**Cons:** |
| 369 | + |
| 370 | +* **Increased Overhead:** Demands significant effort for maintenance, versioning, and compatibility across etcd client versions. |
| 371 | +* **Dependency Management:** Introduces an additional dependency for Kubernetes, increasing the complexity of version management. |
| 372 | +* **Potential for Code Duplication:** Implementing the interface might necessitate changes to internal client behavior, potentially requiring some code to be copied. |