
Commit a9a98bd

1 parent 0b4ebf4 commit a9a98bd

File tree

1 file changed

+49
-29
lines changed
  • keps/sig-api-machinery/3903-unknown-version-interoperability-proxy

keps/sig-api-machinery/3903-unknown-version-interoperability-proxy/README.md

Lines changed: 49 additions & 29 deletions
@@ -190,8 +190,10 @@ incorrectly or objects being garbage collected mistakenly.
190190

191191
## Proposal
192192

193-
API change: To the apiservices API, add an "alternates" clause, a list of
194-
apiservers which believe they can serve the group-version.
193+
API changes:
194+
* To the apiservices API, add an "alternates" clause, a list of
195+
apiservers which believe they can serve the group-version.
196+
* To ??? API, add ability to tell which apiservers can serve a resource.
195197

196198
API server change:
197199
* A controller adds the apiserver to the list of alternates for its built-in
@@ -202,22 +204,34 @@ API server change:
202204
- If the request is for a group/version the apiserver doesn't have locally, it
203205
will proxy the request to one of the alternates instead.
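The routing rule in the bullets above can be sketched in Go. This is only an illustration under stated assumptions, not the KEP's implementation: `proxyingHandler`, `groupVersion`, and `parseGroupVersion` are hypothetical names, the alternates are modeled as a plain in-memory map rather than read from the apiservices API, and returning 404 when no alternate is listed is an assumption. An unreachable alternate yields a 503, matching the behavior described in this proposal.

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// groupVersion identifies an API group and version, e.g. {"apps", "v1"}.
// The core group is represented by an empty Group.
type groupVersion struct {
	Group, Version string
}

// parseGroupVersion extracts the group-version from a request path of the
// form /api/<version>/... or /apis/<group>/<version>/...
func parseGroupVersion(path string) (groupVersion, bool) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	switch {
	case len(parts) >= 2 && parts[0] == "api":
		return groupVersion{Group: "", Version: parts[1]}, true
	case len(parts) >= 3 && parts[0] == "apis":
		return groupVersion{Group: parts[1], Version: parts[2]}, true
	}
	return groupVersion{}, false
}

// proxyingHandler is a hypothetical handler: it serves group-versions it has
// locally and forwards everything else to an alternate apiserver.
type proxyingHandler struct {
	local      map[groupVersion]http.Handler // group-versions served by this apiserver
	alternates map[groupVersion][]*url.URL   // assumed in-memory copy of the "alternates" list
}

func (h *proxyingHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	gv, ok := parseGroupVersion(r.URL.Path)
	if !ok {
		http.NotFound(w, r)
		return
	}
	if local, ok := h.local[gv]; ok {
		local.ServeHTTP(w, r)
		return
	}
	peers := h.alternates[gv]
	if len(peers) == 0 {
		// Assumption: nobody claims this group-version, so 404 is accurate.
		http.NotFound(w, r)
		return
	}
	// Naive choice of the first alternate; a real design would pick more carefully.
	proxy := httputil.NewSingleHostReverseProxy(peers[0])
	proxy.ErrorHandler = func(w http.ResponseWriter, _ *http.Request, _ error) {
		// An unreachable alternate becomes a 503, never a misleading 404.
		http.Error(w, "alternate apiserver unreachable", http.StatusServiceUnavailable)
	}
	proxy.ServeHTTP(w, r)
}
```

The sketch keys only on group-version, so it shares the limitation noted in this KEP: fully accurate routing needs per-resource information.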
204206

205-
Unsolved problem: to be completely accurate and achive the goals in this KEP, we
206-
will need to track what resources apiservers can serve, not just what
207-
group-versions.
208-
209207
### User Stories (Optional)
210208

211-
<!--
212-
Detail the things that people will be able to do if this KEP is implemented.
213-
Include as much detail as possible so that people can understand the "how" of
214-
the system. The goal here is to make this feel real for users without getting
215-
bogged down.
216-
-->
209+
#### Garbage Collector
210+
211+
The garbage collector makes decisions about deleting objects when all
212+
referencing objects are deleted. A discovery gap / apiserver mismatch, as
213+
described above, could result in GC seeing a 404 and assuming an object has been
214+
deleted; this could result in it deleting a subsequent object that it should
215+
not.
217216

218-
#### Story 1
217+
This proposal will cause the GC to see either the correct object or get a 503
218+
(which it handles safely).
219219

220-
#### Story 2
220+
#### Namespace Lifecycle Controller
221+
222+
This controller seeks to empty all objects from a namespace when it is deleted.
223+
Discovery failures cause NLC to be unable to tell if objects of a given resource
224+
are present in a namespace. It fails safe, meaning it refuses to delete the
225+
namespace until it can verify it is empty: this causes slowness deleting
226+
namespaces that is a common source of complaint.
227+
228+
Additionally, if the NLC knows about a resource that the apiserver it is talking
229+
to does not, it may incorrectly get a 404, assume a collection is empty, and
230+
delete the namespace too early, leaving garbage behind in etcd. This is a
231+
correctness problem: the garbage will reappear if a namespace of the same name
232+
is recreated.
233+
234+
This proposal addresses both problems.
221235

222236
### Notes/Constraints/Caveats (Optional)
223237

@@ -230,26 +244,32 @@ This might be a good place to talk about core concepts and how they relate.
230244

231245
### Risks and Mitigations
232246

233-
<!--
234-
What are the risks of this proposal, and how do we mitigate? Think broadly.
235-
For example, consider both security and how this will impact the larger
236-
Kubernetes ecosystem.
247+
Cluster admins might not read the release notes and realize they should enable
248+
network/firewall connectivity between apiservers. In this case clients will
249+
receive 503s instead of being transparently proxied. A 503 is still safer than
250+
today's behavior.
237251

238-
How will security be reviewed, and by whom?
252+
Requests will consume egress bandwidth for 2 apiservers when proxied. We can cap
253+
the number if needed, but upgrades aren't that frequent and few resources are
254+
changed on releases, so these requests should not be common. We will count them
255+
with a metric.
239256

240-
How will UX be reviewed, and by whom?
241-
242-
Consider including folks who also work outside the SIG or subproject.
243-
-->
257+
TODO: security / cert stuff.
244258

245259
## Design Details
246260

247-
<!--
248-
This section should contain enough information that the specifics of your
249-
change are understandable. This may include API specs (though not always
250-
required) or even code snippets. If there's any ambiguity about HOW your
251-
proposal will be implemented, this is the place to discuss them.
252-
-->
261+
TODO: specific API change (x2)
262+
263+
TODO: explanation of how the handler will determine a request is for a resource
264+
that should be proxied.
265+
266+
TODO: explanation of how the security handshake between apiservers works.
267+
* What we need to fix: random processes / external users / etc should not be
268+
able to proxy requests, so the receiving apiserver needs to be able to verify
269+
the source apiserver.
270+
* generate self-signed cert on startup, put pubkey in apiserver identity lease
271+
object?
272+
253273

254274
### Test Plan
255275
