Commit ec889db

Add design details for new filter in handler chain
1 parent 91de7a5 commit ec889db

2 files changed

+76
-35
lines changed


keps/sig-api-machinery/3903-unknown-version-interoperability-proxy/README.md

Lines changed: 61 additions & 20 deletions
@@ -78,17 +78,25 @@ tags, and then generate with `hack/update-toc.sh`.
7878
- [Non-Goals](#non-goals)
7979
- [Proposal](#proposal)
8080
- [User Stories (Optional)](#user-stories-optional)
81-
- [Story 1](#story-1)
82-
- [Story 2](#story-2)
81+
- [Garbage Collector](#garbage-collector)
82+
- [Namespace Lifecycle Controller](#namespace-lifecycle-controller)
8383
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
8484
- [Risks and Mitigations](#risks-and-mitigations)
8585
- [Design Details](#design-details)
86+
- [Aggregation Layer](#aggregation-layer)
87+
- [StorageVersion enhancement needed](#storageversion-enhancement-needed)
88+
- [Identifying destination apiserver's network location](#identifying-destination-apiservers-network-location)
89+
- [Proxy transport between apiservers and authn](#proxy-transport-between-apiservers-and-authn)
90+
- [Discovery Merging](#discovery-merging)
8691
- [Test Plan](#test-plan)
8792
- [Prerequisite testing updates](#prerequisite-testing-updates)
8893
- [Unit tests](#unit-tests)
8994
- [Integration tests](#integration-tests)
9095
- [e2e tests](#e2e-tests)
9196
- [Graduation Criteria](#graduation-criteria)
97+
- [Alpha](#alpha)
98+
- [Beta](#beta)
99+
- [GA](#ga)
92100
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
93101
- [Version Skew Strategy](#version-skew-strategy)
94102
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
@@ -183,7 +191,6 @@ incorrectly or objects being garbage collected mistakenly.
183191

184192
### Non-Goals
185193

186-
* Change cluster installation procedures (no new certs etc)
187194
* Lock particular clients to particular versions
188195

189196
## Proposal
@@ -265,7 +272,7 @@ This might be a good place to talk about core concepts and how they relate.
265272

266273
Cluster admins might not read the release notes and realize they should enable
267274
network/firewall connectivity between apiservers. In this case clients will
268-
recieve 503s instead of transparently being proxied. 503 is still safer than
275+
receive 503s instead of transparently being proxied. 503 is still safer than
269276
today's behavior.
270277

271278
Requests will consume egress bandwidth for 2 apiservers when proxied. We can cap
@@ -275,20 +282,53 @@ with a metric.
275282

276283
There could be a large volume of requests for a specific resource which might result in the identified apiserver being unable to serve the proxied requests. This scenario should not occur too frequently, since resource types which have large request volume should not be added or removed during an upgrade -- that would cause other problems, too.
277284

278-
TODO: security / cert stuff.
285+
We should ensure at most one proxy, rather than proxying the request over and over again (if the source apiserver has an incorrect understanding of what the destination apiserver can serve).
286+
287+
To prevent server-side request forgeries we will not give control over information about apiserver IP/endpoint and the trust bundle (used to authenticate server while proxying) to users via REST APIs.
279288

280289
## Design Details
281290

282-
TODO: explanation of how the handler will determine a request is for a resource
283-
that should be proxied.
291+
### Aggregation Layer
292+
293+
1. A new filter will be added to the [handler chain] of the aggregation layer. This filter will maintain an internal map with the key being the group-version-resource and the value being a list of server IDs of apiservers that are capable of serving that group-version-resource
294+
1. This internal map is populated using an informer for StorageVersion objects. An event handler will be added for this informer that will get the apiserver ID of the requested group-version-resource and update the internal map accordingly
295+
296+
2. This filter will pass on the request to the next handler in the local aggregator chain, if:
297+
1. It is a non resource request
298+
2. The StorageVersion informer cache hasn't synced yet or if `StorageVersionManager.Completed()` has returned false. We will serve error 503 in this case
299+
3. The request has a header that indicates that this request has been proxied once already. If for some reason the resource is not found locally, we will serve error 503
300+
4. No StorageVersion was retrieved for it, meaning the request is for an aggregated API or for a custom resource
301+
5. If the local apiserver ID is found in the list of serviceable-by server IDs from the internal map
302+
303+
3. If the local apiserver ID is not found in the list of serviceable-by server IDs, a random apiserver ID will be selected from the retrieved list and the request will be proxied to this apiserver
304+
305+
4. If there is no apiserver ID retrieved for the requested GVR, we will serve 404 with error `GVR <group_version_resource> is not served by anything in this cluster`
306+
307+
5. If the proxy call fails due to network issues or for any other reason, we serve 503 with error `Error while proxying request to destination apiserver`
308+
309+
[handler chain]:https://github.com/kubernetes/kubernetes/blob/fc8f5a64106c30c50ee2bbcd1d35e6cd05f63b00/staging/src/k8s.io/apiserver/pkg/server/config.go#L639
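
The routing rules listed above can be sketched as a single decision function. This is a minimal illustration, not the KEP's implementation: the `Filter`, `Request`, `GVR`, and `Action` types and the `Decide` method are hypothetical names introduced here, and the real filter would work with the StorageVersion informer and HTTP handlers rather than plain structs. Two readings of the text are assumed: an unsynced informer cache results in a 503, and a GVR with no map entry (no StorageVersion) is served locally while a GVR whose entry lists no apiserver IDs yields a 404.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Action is the outcome of the routing decision for one request.
type Action int

const (
	ServeLocally Action = iota // pass to the next handler in the local chain
	ProxyToPeer                // forward to a peer apiserver
	NotFound                   // 404: nothing in the cluster serves this GVR
	Unavailable                // 503
)

// GVR is a simplified stand-in for a group-version-resource key.
type GVR struct{ Group, Version, Resource string }

// Request carries only the fields the decision needs (hypothetical type).
type Request struct {
	IsResource     bool // false for non-resource requests
	AlreadyProxied bool // set when the "proxied once already" header is present
	GVR            GVR
}

// Filter holds the internal map described above (hypothetical type).
type Filter struct {
	LocalID string
	Synced  bool             // informer cache synced and StorageVersion updates completed
	Servers map[GVR][]string // GVR -> IDs of apiservers able to serve it
}

// Decide returns the action to take and, for ProxyToPeer, the chosen peer ID.
func (f *Filter) Decide(r Request) (Action, string) {
	if !r.IsResource {
		return ServeLocally, ""
	}
	if !f.Synced {
		return Unavailable, "" // assumption: the 503 is served by the filter itself
	}
	if r.AlreadyProxied {
		// At most one hop: never proxy again, let the local chain answer.
		return ServeLocally, ""
	}
	ids, known := f.Servers[r.GVR]
	if !known {
		// No StorageVersion: aggregated API or custom resource.
		return ServeLocally, ""
	}
	if len(ids) == 0 {
		return NotFound, ""
	}
	for _, id := range ids {
		if id == f.LocalID {
			return ServeLocally, ""
		}
	}
	return ProxyToPeer, ids[rand.Intn(len(ids))]
}

func main() {
	f := &Filter{
		LocalID: "apiserver-a",
		Synced:  true,
		Servers: map[GVR][]string{
			{"apps", "v1", "deployments"}: {"apiserver-b"},
		},
	}
	action, peer := f.Decide(Request{IsResource: true, GVR: GVR{"apps", "v1", "deployments"}})
	fmt.Println(action == ProxyToPeer, peer)
}
```

Keeping the decision separate from the proxying itself makes each rule easy to unit test without a running apiserver.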
310+
311+
#### StorageVersion enhancement needed
312+
313+
The StorageVersion API currently tells us whether a particular StorageVersion can be read from etcd by the listed apiserver. We will enhance this API to also include the apiserver ID of the server that can serve this StorageVersion.
314+
315+
#### Identifying destination apiserver's network location
316+
317+
* TODO: We need to find a place to store and retrieve the destination apiserver's host and port information given the server's ID.
318+
We do not want to store this information in
284319

285-
TODO: explanation of how the security handshake between apiservers works.
286-
* What we need to fix: random processes / external users / etc should not be
287-
able to proxy requests, so the receiving apiserver needs to be able to verify
288-
the source apiserver.
289-
* generate self-signed cert on startup, put pubkey in apiserver identity lease
290-
object?
320+
* StorageVersion : because we do not want to expose the network identity of the apiservers in this API that can be listed in multiple places where it may be unnecessary/redundant to do so
321+
* Endpoint reconciler lease : because the IP present here could be that of a load balancer for the apiservers, but we need to know the definite address of the identified destination apiserver
291322

323+
#### Proxy transport between apiservers and authn
324+
325+
For the mTLS between source and destination apiservers, we will do the following
326+
327+
1. For server authentication by the client (source apiserver) : the client needs to validate the server certs (presented by the destination apiserver), for which it needs to know the CA bundle of the authority that signed those certs. We will introduce a new flag --peer-ca-file that must be passed to the kube-apiserver to verify the other kube-apiserver's server certs
328+
329+
2. For client authentication by the server (destination apiserver) : destination apiserver will check the source apiserver certs to determine that the proxy request is from an authenticated client. The destination apiserver will use requestheader authentication (and NOT client cert authentication) for this using the kube-aggregator proxy client cert/key and the --requestheader-client-ca-file passed to the apiserver upon bootstrap
330+
331+
### Discovery Merging
292332
TODO: detailed description of discovery merging. (not scheduled until beta.)
293333

294334
### Test Plan
@@ -369,11 +409,11 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
369409
#### Alpha
370410

371411
- Proxying implemented (behind feature flag)
412+
- mTLS or other secure system used for proxying
372413

373414
#### Beta
374415

375416
- Discovery document merging implemented
376-
- mTLS or other secure system used for proxying
377417

378418
#### GA
379419

@@ -651,18 +691,17 @@ These goals will help you determine what you need to measure (SLIs) in the next
651691
question.
652692
-->
653693

694+
This feature depends on the `StorageVersion` feature, which generates objects with a `storageVersion.status.serverStorageVersions[*].apiServerID` field that is used to find the destination apiserver's network location.
695+
654696
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
655697

656698
<!--
657699
Pick one more of these and delete the rest.
658700
-->
659701

660-
- [ ] Metrics
661-
- Metric name:
662-
- [Optional] Aggregation method:
663-
- Components exposing the metric:
664-
- [ ] Other (treat as last resort)
665-
- Details:
702+
- [X] Metrics
703+
- Metric name: `kubernetes_uvip_count`
704+
- Components exposing the metric: kube-apiserver
666705

667706
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
668707

@@ -679,6 +718,8 @@ This section must be completed when targeting beta to a release.
679718

680719
###### Does this feature depend on any specific services running in the cluster?
681720

721+
No, but it does depend on the `StorageVersion` feature in kube-apiserver.
722+
682723
<!--
683724
Think about both cluster-level services (e.g. metrics-server) as well
684725
as node-level agents (e.g. specific version of CRI). Focus on external or
Lines changed: 15 additions & 15 deletions
@@ -1,24 +1,25 @@
11
title: Unknown Version Interoperability Proxy
2-
kep-number: 3903
2+
kep-number: 4020
33
authors:
44
- "@lavalamp"
5+
- "@hankang"
6+
- "@richabanker"
57
owning-sig: sig-api-machinery
6-
participating-sigs:
7-
- sig-aaa
8-
- sig-bbb
9-
status: provisional|implementable|implemented|deferred|rejected|withdrawn|replaced
10-
creation-date: yyyy-mm-dd
8+
status: provisional
9+
creation-date: 2023-05-17
1110
reviewers:
12-
- TBD
13-
- "@alice.doe"
11+
- "@deads2k"
12+
- "@liggitt"
13+
- "@jpbetz"
14+
1415
approvers:
15-
- TBD
16-
- "@oscar.doe"
16+
- "@deads2k"
17+
- "@jpbetz"
1718

1819
see-also:
19-
- apiserver identity kep, link TODO
20+
- "/keps/sig-api-machinery/2339-storageversion-api-for-ha-api-servers"
2021
replaces:
21-
- none I think
22+
- none
2223

2324
# The target maturity stage in the current dev cycle for this KEP.
2425
stage: alpha
@@ -37,12 +38,11 @@ milestone:
3738
# The following PRR answers are required at alpha release
3839
# List the feature gate name and the components for which it must be enabled
3940
feature-gates:
40-
- name: MyFeature
41+
- name: UnknownVersionInteroperabilityProxy
4142
components:
4243
- kube-apiserver
43-
- kube-controller-manager
4444
disable-supported: true
4545

4646
# The following PRR answers are required at beta release
4747
metrics:
48-
- my_feature_metric
48+
- TODO
