
Extra minimal OCG API #3952


Status: Open (3 commits to merge into base branch `main`)

Conversation

kflynn
Contributor

@kflynn kflynn commented Jul 23, 2025

This is a GEP for an extra minimal OCG API, intended not to be production-ready but to permit experimentation.

/kind gep

Fixes #3951

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/gep PRs related to Gateway Enhancement Proposal(GEP) cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 23, 2025
@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 23, 2025
Comment on lines +172 to +186
- The trust bundle
in the Gateway resource
will define the CA certificate(s)
that the OCG
should accept as trusted
when validating connections
from meshed peers.

- The trust bundle
in the Mesh resource
will define the CA certificate(s)
that the mesh
should accept as trusted
when validating connections
from the OCG.


Can we add a bit more info on why we chose to put the GW trust bundle in the Mesh resource and the Mesh trust bundle in the GW resource?

While I agree that this model is simpler, IMO it's worth mentioning the alternatives.
I added some info around this in https://github.com/kubernetes-sigs/gateway-api/pull/3941/files#diff-4a7d8011b2ad7222ce2d13ee98f49443d6eb56518625438daa62c10e94d9f772R279-R489
specifically proposals 1 and 2.


With the proposed model, is the mesh trust bundle duplicated in every single Gateway resource?
(Is there a common Gateway config somewhere?)

Contributor Author


The proposed model is that each Gateway gets its own trust bundle. We may want to consider having a default in the GatewayClass, but this is the extra minimal API so it's not there yet.

Contributor Author


And I agree about alternatives -- I had it in my head that in many cases I should move them into GEP-3792 itself, but thinking about it while the sun is up, that seems silly. 🙂 Will update.
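
To make the direction of trust in the quoted lines concrete, here is a minimal Python sketch. The names `TrustConfig` and `bundle_for_validator` are hypothetical and not part of the GEP, and the placeholder strings stand in for real PEM data. The sketch only encodes which resource's trust bundle each side consults when validating its peer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustConfig:
    """Hypothetical holder for the two bundles described in the GEP text."""
    gateway_trust_bundle: str  # CA cert(s) stored in the Gateway resource
    mesh_trust_bundle: str     # CA cert(s) stored in the Mesh resource


def bundle_for_validator(config: TrustConfig, validator: str) -> str:
    """Return the CA bundle that `validator` uses to check its peer.

    - The OCG validates meshed peers against the Gateway resource's bundle.
    - The mesh validates the OCG against the Mesh resource's bundle.
    """
    if validator == "ocg":
        return config.gateway_trust_bundle
    if validator == "mesh":
        return config.mesh_trust_bundle
    raise ValueError(f"unknown validator: {validator!r}")


# Placeholder values: each resource carries the *other* side's CA.
config = TrustConfig(
    gateway_trust_bundle="<PEM of the mesh's CA>",
    mesh_trust_bundle="<PEM of the OCG's CA>",
)
```

Because each Gateway carries its own bundle in this model, a cluster with several Gateways would hold one `gateway_trust_bundle` per Gateway, which is the duplication raised in the thread above.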

Comment on lines +151 to +155
trustBundle:
  name: mesh-trust-bundle
  namespace: mesh-namespace
  # Key in ConfigMap; defaults to "ca-bundle.crt"
  bundleKey: ca-bundle.crt


I personally think ClusterTrustBundle is a good fit for this. Can we mention that we intend to support ClusterTrustBundle in the future? Or do you see some fundamental problem with it?

Member

@robscott robscott Jul 24, 2025


+1, I'd much rather start with ClusterTrustBundle as the recommendation where available, and ConfigMap as an optional backfill where it's not. I know that's not great now, but by the time this API is stable/GA, I'm guessing ClusterTrustBundle will be much more widely available.
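
For reference while weighing ClusterTrustBundle against ConfigMap, the ConfigMap-based lookup implied by the quoted YAML can be sketched in Python as follows. This is an illustration under assumptions, not the GEP's implementation: `resolve_trust_bundle` and the in-memory `configmaps` dictionary are hypothetical stand-ins for a real API-server lookup. Only the field names (`name`, `namespace`, `bundleKey`) and the default key come from the quoted lines.

```python
DEFAULT_BUNDLE_KEY = "ca-bundle.crt"  # default stated in the quoted YAML comment


def resolve_trust_bundle(ref: dict, configmaps: dict) -> str:
    """Look up the PEM bundle that a trustBundle reference points at.

    `configmaps` is a stand-in for the cluster: it maps (namespace, name)
    to the ConfigMap's `data` dictionary.
    """
    key = ref.get("bundleKey") or DEFAULT_BUNDLE_KEY
    data = configmaps.get((ref["namespace"], ref["name"]))
    if data is None:
        raise LookupError(
            f"ConfigMap {ref['namespace']}/{ref['name']} not found"
        )
    if key not in data:
        raise KeyError(f"key {key!r} missing from ConfigMap")
    return data[key]


# Hypothetical cluster contents; the PEM text is a placeholder.
configmaps = {
    ("mesh-namespace", "mesh-trust-bundle"): {
        "ca-bundle.crt": "<PEM data>",
    },
}

# bundleKey is omitted here, so the default key is used.
ref = {"name": "mesh-trust-bundle", "namespace": "mesh-namespace"}
bundle = resolve_trust_bundle(ref, configmaps)
```

A ClusterTrustBundle-backed variant would replace only the lookup step; the rest of the flow would stay the same.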

Comment on lines +157 to +158
matchLabels:
  mesh: one-mesh-to-mesh-them-all


Just to confirm: is this selecting the Mesh resource, or Routes or Namespaces?
I think it's the latter, but `one-mesh-to-mesh-them-all` is confusing me because you have that set on the Mesh resource in the Mesh GEP.

Also, can you consider calling out some alternative mechanisms?
https://github.com/kubernetes-sigs/gateway-api/pull/3941/files#diff-4a7d8011b2ad7222ce2d13ee98f49443d6eb56518625438daa62c10e94d9f772R638-R730
It doesn't have to be this, but I think it's useful to document the alternatives.

Member


If lines 142-144 are still the state, it looks like this selects Routes. Left another comment on those lines. And also we had a different comment thread on the original PR.

My strong preference is to go with namespace, and potentially provide opt-out for services.
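
One way to picture the namespace-first alternative with a per-Service opt-out is the Python sketch below. It is speculative: the thread does not define an opt-out mechanism, so the label key `example.com/ocg-mesh-opt-out` and the function names are purely hypothetical, chosen only to illustrate the shape of the idea.

```python
# Hypothetical label key; nothing in the GEP or this thread defines it.
OPT_OUT_LABEL = "example.com/ocg-mesh-opt-out"


def labels_match(match_labels: dict, labels: dict) -> bool:
    """True when every matchLabels pair is present in `labels`."""
    return all(labels.get(k) == v for k, v in match_labels.items())


def service_is_meshed(
    match_labels: dict, namespace_labels: dict, service_labels: dict
) -> bool:
    """Namespace-level selection with a per-Service opt-out.

    A Service is treated as meshed when its namespace matches the
    selector, unless the Service carries the opt-out label.
    """
    if service_labels.get(OPT_OUT_LABEL) == "true":
        return False
    return labels_match(match_labels, namespace_labels)


selector = {"mesh": "one-mesh-to-mesh-them-all"}
meshed_ns = {"mesh": "one-mesh-to-mesh-them-all"}

in_mesh = service_is_meshed(selector, meshed_ns, {})
opted_out = service_is_meshed(selector, meshed_ns, {OPT_OUT_LABEL: "true"})
```

In this sketch the opt-out always wins over the namespace match; an opt-in variant would invert that check.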

Comment on lines +293 to +312
The extra-minimal API
solves this problem
by adding a label selector
to the Gateway resource
that indicates which Routes
are meshed.
When the OCG connects
to any Route
that either directly matches this selector,
or is in a namespace that matches this selector,
it MUST use mTLS
with a certificate
that is ultimately signed
by a CA certificate
in the Mesh resource's `trustBundle`,
and the OCG MUST validate
that the peer presents a certificate
that is ultimately signed
by a CA certificate
in the Gateway resource's `trustBundle`.
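
The matching rule in the paragraph above can be sketched in Python. This is a minimal illustration, assuming only `matchLabels` equality (no `matchExpressions`); the function names are hypothetical and not taken from the GEP.

```python
def selector_matches(match_labels: dict, labels: dict) -> bool:
    """True when every matchLabels pair is present in `labels`."""
    return all(labels.get(k) == v for k, v in match_labels.items())


def route_is_meshed(
    match_labels: dict, route_labels: dict, namespace_labels: dict
) -> bool:
    """A Route is meshed if it matches the selector directly,
    or if the namespace it lives in matches the selector."""
    return selector_matches(match_labels, route_labels) or selector_matches(
        match_labels, namespace_labels
    )


selector = {"mesh": "one-mesh-to-mesh-them-all"}

# The Route carries no labels itself, but its namespace matches.
via_namespace = route_is_meshed(
    selector, {}, {"mesh": "one-mesh-to-mesh-them-all"}
)

# Neither the Route nor its namespace matches.
unmeshed = route_is_meshed(selector, {"app": "web"}, {"team": "a"})
```

When `route_is_meshed` returns true, the quoted text requires the OCG to use mTLS in both directions: presenting a certificate that chains to the Mesh resource's `trustBundle`, and validating the peer against the Gateway resource's `trustBundle`.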


A couple of open questions (copied from my #3941):

  1. How to configure the Gateway's identity certificate and private key?

This GEP defines how the OCG and the mesh should be configured to trust each
other by exchanging CA bundles. However, it does not standardize how an
administrator configures the specific client certificate and private key that
the OCG uses to identify itself to the mesh.

Currently, this is left as an implementation detail, likely handled via a
provider-specific CRD referenced from the GatewayClass or through an out-of-band
mechanism. The open question is: Should a future version of this GEP standardize
this configuration to ensure a consistent user experience? This could involve
adding a new identityCertificateRef field to the Gateway spec.

  2. Should use cases where mesh workloads disable mTLS be supported?

This GEP focuses on meshes where mTLS is strictly enforced for
communication. However, some service meshes support a "DISABLE" mode where mTLS
can be disabled for certain workloads. This raises the question: How should an
OCG behave when a target workload is discovered as "meshed" but does not require
or accept an mTLS connection?

@kflynn kflynn mentioned this pull request Jul 25, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kflynn
Once this PR has been reviewed and has the lgtm label, please assign danwinship for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 26, 2025
kflynn added 3 commits July 26, 2025 09:53
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 26, 2025
@k8s-ci-robot
Contributor

@kflynn: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-gateway-api-verify | d17ffa6 | link | true | `/test pull-gateway-api-verify` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Comment on lines +21 to +31
[GEP-3792] defines the rationale
for allowing out-of-cluster Gateways (OCGs)
to participate in a
GAMMA-compliant in-cluster service mesh,
and the problems that must be solved
to allow them to do so.
This GEP defines
an extremely minimal API
to permit experimentation
with OCGs and
in-cluster mTLS meshes.
Member


This is odd formatting. Is this a new formatter or something?

Comment on lines +142 to +144
- a `labelSelector` field
that indicates which Routes
are meshed.
Member


I think we had a similar comment thread on the original PR: #3894 (comment)

I am unclear why we need to select Routes. Namespace selection seems like it would cover 90% of the cases, and we can opt in OR opt out services.

