Extra minimal OCG API #3952
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,360 @@ | ||
# GEP-3951: Minimal Out-of-Cluster Gateway API | ||
|
||
* Issue: [#3951](https://github.com/kubernetes-sigs/gateway-api/issues/3951) | ||
* Status: Provisional | ||
|
||
See [status definitions](../overview.md#gep-states). | ||
|
||
[Chihiro]: https://gateway-api.sigs.k8s.io/concepts/roles-and-personas/#chihiro | ||
[Ian]: https://gateway-api.sigs.k8s.io/concepts/roles-and-personas/#ian | ||
[Ana]: https://gateway-api.sigs.k8s.io/concepts/roles-and-personas/#ana | ||
|
||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", | ||
"SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this | ||
document are to be interpreted as described in BCP 14 ([RFC8174]) when, and | ||
only when, they appear in all capitals, as shown here. | ||
|
||
[RFC8174]: https://www.rfc-editor.org/rfc/rfc8174 | ||
|
||
## User Story | ||
|
||
[GEP-3792] defines the rationale | ||
for allowing out-of-cluster Gateways (OCGs) | ||
to participate in a | ||
GAMMA-compliant in-cluster service mesh, | ||
and the problems that must be solved | ||
to allow them to do so. | ||
This GEP defines | ||
an extremely minimal API | ||
to permit experimentation | ||
with OCGs and | ||
in-cluster mTLS meshes. | ||
|
||
Nomenclature, | ||
background, | ||
goals, | ||
non-goals, | ||
problems that must be solved, | ||
and some discussion | ||
of possible solutions to those problems | ||
are all included | ||
in [GEP-3792]. | ||
|
||
[GEP-3792]: https://gateway-api.sigs.k8s.io/geps/gep-3792/ | ||
|
||
## Goals of the Extra-Minimal API | ||
|
||
- Allow [Chihiro] and [Ian] | ||
to configure and operate | ||
an OCG and | ||
an in-cluster mTLS mesh | ||
that know how to work together | ||
to experiment with OCG support | ||
in Gateway API. | ||
|
||
## Non-Goals | ||
|
||
- Support production use of OCGs | ||
in Gateway API. | ||
|
||
- Solve all of the problems | ||
defined in [GEP-3792]. | ||
|
||
This is an **extra-minimal** API. | ||
Its purpose is to allow **experimentation** | ||
with OCGs and in-cluster meshes, | ||
**not** to provide | ||
a production-ready solution. | ||
|
||
Using this API in production | ||
is **guaranteed** to result | ||
in anguish, heartbreak, tears, and pain. | ||
|
||
## Overview | ||
|
||
The extra-minimal OCG API | ||
solves two of the [GEP-3792] problems | ||
in a very minimal way. | ||
|
||
- It solves the | ||
[trust problem] | ||
by extending | ||
the Mesh and Gateway resources | ||
to permit specifying | ||
a _trust bundle_ | ||
that contains the CA certificates | ||
that the OCG and the mesh | ||
will use to trust each other. | ||
|
||
- It solves the | ||
[discovery problem] | ||
by adding a label selector | ||
to the Gateway resource | ||
that indicates which Routes | ||
are meshed. | ||
|
||
- It does not solve the | ||
[protocol problem] | ||
or the | ||
[outbound behavior problem]. | ||
|
||
[trust problem]: https://gateway-api.sigs.k8s.io/geps/gep-3792/#1-the-trust-problem | ||
[protocol problem]: https://gateway-api.sigs.k8s.io/geps/gep-3792/#2-the-protocol-problem | ||
[discovery problem]: https://gateway-api.sigs.k8s.io/geps/gep-3792/#3-the-discovery-problem | ||
[outbound behavior problem]: https://gateway-api.sigs.k8s.io/geps/gep-3792/#4-the-outbound-behavior-problem | ||
|
||
### Additions to the Mesh Resource | ||
|
||
The Mesh resource | ||
gains an `ocg` stanza | ||
containing a `trustBundle` field | ||
that refers to a ConfigMap | ||
that contains the CA certificate(s) | ||
that the mesh should trust | ||
when validating connections | ||
from the OCG: | ||
|
||
```yaml | ||
... | ||
spec: | ||
  ... | ||
  ocg: | ||
    trustBundle: | ||
      name: ocg-trust-bundle | ||
      namespace: ocg-namespace | ||
      # Key in ConfigMap; defaults to "ca-bundle.crt" | ||
      bundleKey: ca-bundle.crt | ||
``` | ||
|
||
### Additions to the Gateway Resource | ||
|
||
The Gateway resource | ||
gains a `mesh` stanza | ||
containing two fields: | ||
|
||
- a `trustBundle` field | ||
that refers to a ConfigMap | ||
that contains the CA certificate(s) | ||
that the OCG should trust | ||
when validating connections | ||
from meshed peers | ||
|
||
- a `labelSelector` field | ||
that indicates which Routes | ||
are meshed. | ||
Comment on lines +142 to +144:
I think we had similar comment thread on the original PR. #3894 (comment) I am unclear why we need to select routes, namespace seems like it would cover 90% of the cases, and we can opt-in OR opt-out services.
|
||
|
||
```yaml | ||
... | ||
spec: | ||
  ... | ||
  mesh: | ||
    trustBundle: | ||
      name: mesh-trust-bundle | ||
      namespace: mesh-namespace | ||
      # Key in ConfigMap; defaults to "ca-bundle.crt" | ||
      bundleKey: ca-bundle.crt | ||
    labelSelector: | ||
      matchLabels: | ||
        mesh: one-mesh-to-mesh-them-all | ||
``` | ||
Comment on lines +151 to +155:
i personally think clusterTrustBundle is a good fit for this. can we mention that we intend to support clusterTrustBundle in the future? or do u see some fundamental problem with it?
+1, I'd much rather start with ClusterTrustBundle as the recommendation where available, and ConfigMap as an optional backfill where it's not. I know that's not great now, but by the time this API is stable/GA, I'm guessing ClusterTrustBundle will be much more widely available.
Comment on lines +157 to +158:
just to confirm - is this selecting the Mesh resource or the Routes or Namespaces? also, can u consider calling out some alternative mechanisms?
If lines 142-144 are still the state, it looks like this selects Routes. Left another comment on those lines. And also we had a different comment thread on the original PR. My strong preference is to go with namespace, and potentially provide opt-out for services.
|
||
### Trust Bundles: Solving the Trust Problem | ||
|
||
The trust problem is that | ||
both the OCG and the mesh | ||
need to be able to do mTLS verification | ||
of connections arriving from the other. | ||
The simplest solution to this problem | ||
is to add a _trust bundle_ | ||
to the Gateway resource | ||
and to the Mesh resource. | ||
|
||
- The trust bundle | ||
in the Gateway resource | ||
will define the CA certificate(s) | ||
that the OCG | ||
should accept as trusted | ||
when validating connections | ||
from meshed peers. | ||
|
||
- The trust bundle | ||
in the Mesh resource | ||
will define the CA certificate(s) | ||
that the mesh | ||
should accept as trusted | ||
when validating connections | ||
from the OCG. | ||
Comment on lines +172 to +186:
can we add a bit more info around why we chose to have GW trust bundle and mesh and the Mesh trust bundle in the GW resource? while i agree that this model to be simpler, but IMO it's worth mentioning the alternatives.
with the proposed model, is the mesh trust bundle duplicated every single Gateway resource?
The proposed model is that each Gateway gets its own trust bundle. We may want to consider having a default in the
And I agree about alternatives -- I had it in my head that in many cases I should move them into GEP-3792 itself, but thinking about it while the sun is up, that seems silly. 🙂 Will update.
|
||
|
||
This is a straightforward way | ||
to permit each component | ||
to verify the identity of the other, | ||
which provides | ||
a sufficient basis for mutual authentication when | ||
mTLS meshes are involved. | ||
|
||
#### The `trustBundle` Stanza | ||
|
||
The Mesh and Gateway resources | ||
both use a common `trustBundle` stanza: | ||
|
||
```yaml | ||
trustBundle: | ||
  name: configmap-name | ||
  namespace: configmap-namespace | ||
  # Key in ConfigMap; defaults to "ca-bundle.crt" | ||
  bundleKey: ca-bundle.crt | ||
``` | ||
|
||
The `name` field is always required. | ||
The `namespace` field is required | ||
in the Mesh resource, | ||
but may be omitted | ||
in the Gateway resource | ||
if the ConfigMap is | ||
in the same namespace | ||
as the Gateway resource. | ||
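
As a minimal sketch of the namespace rule above, a Gateway could omit `namespace` when the ConfigMap lives alongside it. This fragment reuses the `mesh-trust-bundle` name from the earlier example and assumes `bundleKey` is left at its default; the surrounding Gateway fields are elided.

```yaml
# Gateway resource (fragment; other fields elided)
spec:
  mesh:
    trustBundle:
      # No namespace given: the ConfigMap is assumed to be
      # in the same namespace as this Gateway.
      name: mesh-trust-bundle
      # bundleKey omitted, so it defaults to "ca-bundle.crt"
```

The same fragment in a Mesh resource would not be valid under the rule above, since `namespace` is required there.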
|
||
The ConfigMap referred to | ||
by the `trustBundle` stanza | ||
MUST contain | ||
a PEM-encoded trust bundle | ||
in the specified `bundleKey`, | ||
for example: | ||
|
||
```yaml | ||
apiVersion: v1 | ||
kind: ConfigMap | ||
metadata: | ||
  name: ocg-trust-bundle | ||
  namespace: ocg-namespace | ||
data: | ||
  ca-bundle.crt: |- | ||
    -----BEGIN CERTIFICATE----- | ||
    ... (PEM-encoded CA certificate) ... | ||
    -----END CERTIFICATE----- | ||
    ... (may be repeated for multiple CA certificates) ... | ||
``` | ||
|
||
The `trustBundle` in either | ||
the Mesh resource | ||
or the Gateway resource | ||
may refer to a ConfigMap | ||
in any namespace | ||
to which RBAC permits access. | ||
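
This GEP does not say how that RBAC access is granted, so the following is only one possible arrangement, sketched under assumptions. It lets a hypothetical mesh controller ServiceAccount (`mesh-controller` in `mesh-system`) read the `ocg-trust-bundle` ConfigMap from the earlier example. The Role and RoleBinding names are also hypothetical.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-ocg-trust-bundle     # hypothetical
  namespace: ocg-namespace        # namespace of the ConfigMap
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["ocg-trust-bundle"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-ocg-trust-bundle     # hypothetical
  namespace: ocg-namespace
subjects:
  - kind: ServiceAccount
    name: mesh-controller         # hypothetical
    namespace: mesh-system        # hypothetical
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: read-ocg-trust-bundle
```

Restricting the Role to a single `resourceNames` entry keeps the grant as narrow as the cross-namespace reference itself.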
|
||
##### Further Considerations | ||
|
||
The OCG and the mesh | ||
MAY share the same trust bundle, | ||
but this is not required. | ||
If they do, | ||
the Gateway and Mesh resources | ||
MAY refer to the same ConfigMap; | ||
if the resources refer to different ConfigMaps, | ||
those ConfigMaps must (of course) | ||
contain the same CA certificate(s). | ||
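
A shared trust bundle might look like the following sketch, in which both resources point at one ConfigMap. The ConfigMap name `shared-trust-bundle` and the namespace `trust-namespace` are hypothetical, and all other fields of both resources are elided.

```yaml
# Mesh resource (fragment; other fields elided)
spec:
  ocg:
    trustBundle:
      name: shared-trust-bundle     # hypothetical ConfigMap
      namespace: trust-namespace    # hypothetical namespace
---
# Gateway resource (fragment; other fields elided)
spec:
  mesh:
    trustBundle:
      name: shared-trust-bundle     # same ConfigMap as the Mesh
      namespace: trust-namespace
```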
|
||
The `trustBundle` fields | ||
MUST NOT refer to Secrets. | ||
Since CA certificates are not private, | ||
they should not be stored in Secrets. | ||
|
||
An alternative to adding | ||
the `trustBundle` stanza | ||
to both the Mesh and Gateway resources | ||
would be to define a single trust bundle, | ||
requiring the OCG and the mesh | ||
to each use the same CA certificate. | ||
This adds considerable operational complexity - | ||
especially in the world of enterprise PKI - | ||
without any real benefit. | ||
|
||
### Label Selectors: Solving the Discovery Problem | ||
|
||
The discovery problem is that | ||
not every workload in the cluster | ||
is required to be meshed, | ||
and the OCG needs a way | ||
to know which Routes are meshed | ||
since it must ensure that it | ||
correctly uses mTLS | ||
for connections to meshed workloads. | ||
|
||
In practice, this isn't | ||
actually a question of _workloads_ | ||
but of _Routes_: | ||
the point of interface | ||
between a Gateway | ||
and a workload in the cluster | ||
is not a Pod or a Service, but | ||
rather a Route. | ||
|
||
The extra-minimal API | ||
solves this problem | ||
by adding a label selector | ||
to the Gateway resource | ||
that indicates which Routes | ||
are meshed. | ||
When the OCG connects | ||
to any Route | ||
that either directly matches this selector, | ||
or is in a namespace that matches this selector, | ||
it MUST use mTLS | ||
with a certificate | ||
that is ultimately signed | ||
by a CA certificate | ||
in the Mesh resource's `trustBundle`, | ||
and the OCG MUST validate | ||
that the peer presents a certificate | ||
that is ultimately signed | ||
by a CA certificate | ||
in the Gateway resource's `trustBundle`. | ||
Comment on lines +293 to +312:
couple of open questions (copied from my #3941):
This GEP defines how the OCG and the mesh should be configured to trust each Currently, this is left as an implementation detail, likely handled via a
This GEP focuses on meshes where mTLS is strictly enforced for
|
||
|
||
The label selector | ||
is a simple mechanism | ||
(especially if | ||
the mesh already uses a label | ||
to indicate which resources are meshed) | ||
but it is still an active choice, | ||
rather than assuming | ||
things about the whole cluster. | ||
Additionally, it permits | ||
operating at the namespace level | ||
or at the Route level. | ||
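
To illustrate the two levels, here is a sketch of what opting in could look like with the `mesh: one-mesh-to-mesh-them-all` label from the Gateway example above. All resource names (`example-route`, `example-app`, `example-ocg`, `example-service`, `meshed-apps`) and the port are hypothetical.

```yaml
# Route-level opt-in: this HTTPRoute carries the label itself.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example-route             # hypothetical
  namespace: example-app          # hypothetical
  labels:
    mesh: one-mesh-to-mesh-them-all
spec:
  parentRefs:
    - name: example-ocg           # hypothetical Gateway
  rules:
    - backendRefs:
        - name: example-service   # hypothetical Service
          port: 8080
---
# Namespace-level opt-in: Routes in this namespace match
# because the namespace carries the label.
apiVersion: v1
kind: Namespace
metadata:
  name: meshed-apps               # hypothetical
  labels:
    mesh: one-mesh-to-mesh-them-all
```

With the selector described above, the OCG would be expected to use mTLS for the labeled Route and for any Route in the labeled namespace, and to treat other Routes as unmeshed.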
|
||
### Other Problems | ||
|
||
The extra-minimal API | ||
does not solve the | ||
[protocol problem] | ||
or the | ||
[outbound behavior problem]. | ||
Instead, it assumes that | ||
the OCG and the mesh | ||
have prearranged | ||
protocols and behaviors | ||
that are mutually compatible. | ||
|
||
## Graduation Criteria | ||
|
||
Since [GEP-3792] mandates | ||
that any OCG API | ||
MUST solve all four problems | ||
defined in [GEP-3792] | ||
before graduating to standard, | ||
this GEP | ||
MUST NOT graduate to standard | ||
without significant further work. | ||
|
||
## Conformance Details | ||
|
||
### Feature Names | ||
|
||
This GEP will use the feature names | ||
`GatewayExtraMinimalOCG` and | ||
`MeshExtraMinimalOCG`. | ||
|
||
### Conformance tests | ||
|
||
TBA. |
This is odd formatting? is this a new formatter or something?