| 1 | +# KEP-1672: Tracking Terminating Endpoints in the EndpointSlice API |
| 2 | + |
| 3 | +<!-- toc --> |
| 4 | +- [Release Signoff Checklist](#release-signoff-checklist) |
| 5 | +- [Summary](#summary) |
| 6 | +- [Motivation](#motivation) |
| 7 | + - [Goals](#goals) |
| 8 | + - [Non-Goals](#non-goals) |
| 9 | +- [Proposal](#proposal) |
| 10 | + - [User Stories (optional)](#user-stories-optional) |
| 11 | + - [Story 1](#story-1) |
| 12 | + - [Notes/Constraints/Caveats (optional)](#notesconstraintscaveats-optional) |
| 13 | + - [Risks and Mitigations](#risks-and-mitigations) |
| 14 | +- [Design Details](#design-details) |
| 15 | + - [Test Plan](#test-plan) |
| 16 | + - [Graduation Criteria](#graduation-criteria) |
| 17 | + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) |
| 18 | + - [Version Skew Strategy](#version-skew-strategy) |
| 19 | +- [Implementation History](#implementation-history) |
| 20 | +- [Drawbacks](#drawbacks) |
| 21 | +<!-- /toc --> |
| 22 | + |
| 23 | +## Release Signoff Checklist |
| 24 | + |
| 25 | +- [X] Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) |
| 26 | +- [ ] KEP approvers have approved the KEP status as `implementable` |
| 27 | +- [ ] Design details are appropriately documented |
| 28 | +- [ ] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input |
| 29 | +- [ ] Graduation criteria is in place |
| 30 | +- [ ] "Implementation History" section is up-to-date for milestone |
| 31 | +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] |
| 32 | +- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes |
| 33 | + |
| 34 | +[kubernetes.io]: https://kubernetes.io/ |
| 35 | +[kubernetes/enhancements]: https://git.k8s.io/enhancements |
| 36 | +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes |
| 37 | +[kubernetes/website]: https://git.k8s.io/website |
| 38 | + |
| 39 | +## Summary |
| 40 | + |
| 41 | +Today, terminating endpoints are considered "not ready" regardless of their actual readiness. |
| 42 | +Before any work is done in improving how terminating endpoints are handled, there must be a way |
| 43 | +to track whether an endpoint is terminating without having to watch the associated pods. This |
| 44 | +KEP proposes a means to track the terminating state of an endpoint via the EndpointSlice API. |
| 45 | +This would enable consumers of the API to make smarter decisions when it comes to handling |
| 46 | +terminating endpoints (see KEP-1669 as an example). |
| 47 | + |
| 48 | +## Motivation |
| 49 | + |
| 50 | +### Goals |
| 51 | + |
| 52 | +* Provide a mechanism to track whether an endpoint is terminating by only watching the EndpointSlice API. |
| 53 | + |
| 54 | +### Non-Goals |
| 55 | + |
| 56 | +* Consumption of the new API field is out of scope for this KEP, but future KEPs will leverage
| 57 | +the work done here to improve graceful termination of pods in certain scenarios (see issue [85643](https://github.com/kubernetes/kubernetes/issues/85643)).
| 58 | + |
| 59 | +## Proposal |
| 60 | + |
| 61 | +This KEP proposes to keep "terminating" pods in the set of endpoints in EndpointSlice with |
| 62 | +additions to the API to indicate whether a given endpoint is terminating or not. If consumers |
| 63 | +of the API (e.g. kube-proxy) are required to treat terminating endpoints differently, they |
| 64 | +may do so by checking this condition. |
| 65 | + |
| 66 | +The criteria for a ready endpoint (pod phase + readiness probe) will not change based on the |
| 67 | +terminating state of pods, but consumers of the API may choose to prefer endpoints that are both ready and not terminating. |
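
As a rough illustration of the preference described above, a consumer could filter on both conditions. This is a minimal sketch, not kube-proxy's actual logic: the `Endpoint` type and the `preferredEndpoints` helper are hypothetical names introduced here, and the structs are pared-down stand-ins for the real API types.

```go
package main

import "fmt"

// EndpointConditions is a pared-down stand-in for the API type in this KEP.
type EndpointConditions struct {
	Ready       *bool
	Terminating *bool
}

// Endpoint is a hypothetical, simplified endpoint used only for this sketch.
type Endpoint struct {
	Address    string
	Conditions EndpointConditions
}

// isReady treats a nil Ready as ready, as the API comment suggests for most consumers.
func isReady(c EndpointConditions) bool {
	return c.Ready == nil || *c.Ready
}

// isTerminating treats a nil Terminating as "not terminating".
func isTerminating(c EndpointConditions) bool {
	return c.Terminating != nil && *c.Terminating
}

// preferredEndpoints (hypothetical helper) keeps endpoints that are both
// ready and not terminating, preserving their original order.
func preferredEndpoints(eps []Endpoint) []Endpoint {
	var out []Endpoint
	for _, ep := range eps {
		if isReady(ep.Conditions) && !isTerminating(ep.Conditions) {
			out = append(out, ep)
		}
	}
	return out
}

func main() {
	yes, no := true, false
	eps := []Endpoint{
		{Address: "10.0.0.1", Conditions: EndpointConditions{Ready: &yes}},
		{Address: "10.0.0.2", Conditions: EndpointConditions{Ready: &yes, Terminating: &yes}},
		{Address: "10.0.0.3", Conditions: EndpointConditions{Ready: &no, Terminating: &no}},
	}
	for _, ep := range preferredEndpoints(eps) {
		fmt.Println(ep.Address)
	}
}
```

Whether a consumer falls back to ready-but-terminating endpoints when the preferred set is empty is a policy decision left to that consumer and is not specified by this KEP.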
| 68 | + |
| 69 | +### User Stories (optional) |
| 70 | + |
| 71 | +#### Story 1 |
| 72 | + |
| 73 | +A consumer of the EndpointSlice API (e.g. kube-proxy) may want to know which endpoints are |
| 74 | +terminating without having to watch Pods directly for scalability reasons. |
| 75 | + |
| 76 | +One example would be the IPVS proxier which should set the weight of an endpoint to 0 |
| 77 | +during termination and finally remove the real server when the endpoint is removed. |
| 78 | +Without knowing when a pod is done terminating, the IPVS proxier makes a best-effort guess
| 79 | +at when the pod has terminated by looking at the connection tracking table.
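
The IPVS story above could be sketched roughly as follows. This is an illustration under stated assumptions, not the IPVS proxier's code: `weightFor` and `syncRealServers` are hypothetical names, and the non-terminating weight of 1 is an assumed default.

```go
package main

import "fmt"

// weightFor (hypothetical) maps the terminating condition to an IPVS weight:
// 0 while the endpoint is terminating, 1 (an assumed default) otherwise.
func weightFor(terminating *bool) int {
	if terminating != nil && *terminating {
		return 0
	}
	return 1
}

// syncRealServers (hypothetical) computes the desired weight for every
// endpoint address still present in the EndpointSlice. An address that has
// been removed from the slice is absent from the result, which is the signal
// to delete the corresponding real server.
func syncRealServers(endpoints map[string]*bool) map[string]int {
	weights := make(map[string]int, len(endpoints))
	for addr, terminating := range endpoints {
		weights[addr] = weightFor(terminating)
	}
	return weights
}

func main() {
	terminating := true
	weights := syncRealServers(map[string]*bool{
		"10.0.0.1": nil,          // not terminating
		"10.0.0.2": &terminating, // terminating
	})
	fmt.Println(weights["10.0.0.1"], weights["10.0.0.2"])
}
```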
| 80 | + |
| 81 | +### Notes/Constraints/Caveats (optional) |
| 82 | + |
| 83 | +### Risks and Mitigations |
| 84 | + |
| 85 | +Tracking the terminating state of endpoints poses some scalability concerns as each |
| 86 | +terminating endpoint adds additional writes to the API. Today, a terminating pod |
| 87 | +results in 1 write in Endpoints (removing the endpoint). With the proposed changes, |
| 88 | +each terminating endpoint could result in at least 2 writes (ready -> terminating -> removed) |
| 89 | +and possibly more depending on how many times readiness changes during termination. |
| 90 | + |
| 91 | +## Design Details |
| 92 | + |
| 93 | +To track whether an endpoint is terminating, a `terminating` field would be added to
| 94 | +the `EndpointConditions` type in the EndpointSlice API.
| 95 | + |
| 96 | +```go |
| 97 | +// EndpointConditions represents the current condition of an endpoint. |
| 98 | +type EndpointConditions struct { |
| 99 | + // ready indicates that this endpoint is prepared to receive traffic, |
| 100 | + // according to whatever system is managing the endpoint. A nil value |
| 101 | + // indicates an unknown state. In most cases consumers should interpret this |
| 102 | + // unknown state as ready. |
| 103 | + // +optional |
| 104 | + Ready *bool `json:"ready,omitempty" protobuf:"bytes,1,name=ready"` |
| 105 | + |
| 106 | + // terminating indicates if this endpoint is terminating. Consumers should assume a |
| 107 | + // nil value indicates the endpoint is not terminating. |
| 108 | + // +optional |
| 109 | + Terminating *bool `json:"terminating,omitempty" protobuf:"bytes,2,name=terminating"` |
| 110 | +} |
| 111 | +``` |
| 112 | + |
| 113 | +NOTE: A nil value for `Terminating` indicates that the endpoint is not terminating. |
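
To make the nil handling concrete, the sketch below decodes conditions from JSON in which `terminating` may be omitted. It is a simplified illustration: the struct copies only the JSON tags from the type above, and `isTerminating` and `decodeConditions` are hypothetical helpers, not part of the API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EndpointConditions is a simplified copy of the proposed type, JSON tags only.
type EndpointConditions struct {
	Ready       *bool `json:"ready,omitempty"`
	Terminating *bool `json:"terminating,omitempty"`
}

// isTerminating (hypothetical) interprets a nil value as "not terminating".
func isTerminating(c EndpointConditions) bool {
	return c.Terminating != nil && *c.Terminating
}

// decodeConditions (hypothetical) parses a JSON conditions object.
func decodeConditions(raw string) (EndpointConditions, error) {
	var c EndpointConditions
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}

func main() {
	inputs := []string{
		`{"ready":true}`,                      // field omitted: Terminating is nil
		`{"ready":true,"terminating":false}`,  // explicitly not terminating
		`{"ready":false,"terminating":true}`,  // terminating
	}
	for _, raw := range inputs {
		c, err := decodeConditions(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> terminating=%t\n", raw, isTerminating(c))
	}
}
```

Using a pointer lets an omitted field (for example, one written by a producer that predates this field) be distinguished from an explicit `false`, while consumers can still collapse both to "not terminating".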
| 114 | + |
| 115 | +Updates to endpointslice controller: |
| 116 | +* include pods with a deletion timestamp in endpointslice |
| 117 | +* any pod with a deletion timestamp will have condition.terminating = true |
| 118 | +* allow endpoint ready condition to change during termination |
| 119 | + |
| 120 | +### Test Plan |
| 121 | + |
| 122 | +endpointslice controller unit tests: |
| 123 | +* Unit tests will validate that pods with a deletion timestamp are included with condition.terminating = true
| 124 | +* Unit tests will validate that the ready condition can change for terminating endpoints |
| 125 | + |
| 126 | +There will be no e2e tests since consumption of this new API is out-of-scope for this KEP. |
| 127 | +Any future KEP that consumes this API should have e2e tests to ensure behavior for terminating |
| 128 | +endpoints is correct. |
| 129 | + |
| 130 | +### Graduation Criteria |
| 131 | + |
| 132 | +Since this is an addition to the EndpointSlice API, graduation will follow the graduation |
| 133 | +timeline for the [EndpointSlice API work](/keps/sig-network/20190603-endpointslices/README.md). |
| 134 | + |
| 135 | +### Upgrade / Downgrade Strategy |
| 136 | + |
| 137 | +Since this is an addition to the EndpointSlice API, the upgrade/downgrade strategy will follow that |
| 138 | +of the [EndpointSlice API work](/keps/sig-network/20190603-endpointslices/README.md). |
| 139 | + |
| 140 | +### Version Skew Strategy |
| 141 | + |
| 142 | +Since this is an addition to the EndpointSlice API, the version skew strategy will follow that |
| 143 | +of the [EndpointSlice API work](/keps/sig-network/20190603-endpointslices/README.md). |
| 144 | + |
| 145 | +## Implementation History |
| 146 | + |
| 147 | +- [x] 2020-04-23: KEP accepted as implementable for v1.19 |
| 148 | + |
| 149 | +## Drawbacks |
| 150 | + |
| 151 | +There are some scalability drawbacks, as tracking terminating endpoints requires at least 1 additional write per endpoint.
| 152 | + |