gep: standardizing behavior for invalid BackendTLSPolicy #3909

Merged
merged 6 commits into from
Aug 13, 2025
9 changes: 7 additions & 2 deletions geps/gep-1897/index.md
@@ -214,6 +214,8 @@ configuration. CACertificateRefs is an implementation-specific slice of
named object references, each containing a single cert. We originally proposed to follow the convention established by the
[CertificateRefs field on Gateway](https://github.com/kubernetes-sigs/gateway-api/blob/18e79909f7310aafc625ba7c862dfcc67b385250/apis/v1beta1/gateway_types.go#L340)
, but the CertificateRef requires both a tls.key and tls.crt and a certificate reference only requires the tls.crt.
If any of the CACertificateRefs cannot be resolved or is misconfigured, the BackendTLSPolicy is considered invalid.

WellKnownCACertificates is an optional enum that allows users to specify whether to use the set of CA certificates trusted by the
Gateway (WellKnownCACertificates specified as "System"), or to use the existing CACertificateRefs (WellKnownCACertificates
specified as ""). The use and definition of system certificates is implementation-dependent, and the intent is that
@@ -222,8 +224,11 @@ references to Kubernetes objects that contain PEM-encoded TLS certificates, whic
between the gateway and backend pod. References to a resource in a different namespace are invalid.
If ClientCertificateRefs is unspecified, then WellKnownCACertificates must be set to "System" for a valid configuration.
If WellKnownCACertificates is unspecified, then CACertificateRefs must be specified with at least one entry for a valid configuration.
If WellKnownCACertificates is set to "System" and there are no system trusted certificates or the implementation doesn't define system
trusted certificates, then the associated TLS connection must fail.
If an implementation does not support WellKnownCACertificates, or the provided value is unsupported, the BackendTLSPolicy is considered invalid.

For an invalid BackendTLSPolicy, implementations MUST NOT fall back to unencrypted (plaintext) connections.
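
As a rough illustration, the validity rules in this hunk can be read as a small decision procedure. The Python sketch below is not taken from the GEP or from any implementation: the `Validation` and `CACertificateRef` classes, the `resolvable` flag, and the `policy_problems` and `backend_action` helpers are hypothetical names, and the set of supported WellKnownCACertificates values is assumed to be supplied by the implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CACertificateRef:
    name: str
    # Hypothetical flag: True if the controller found a usable object.
    resolvable: bool = True


@dataclass
class Validation:
    ca_certificate_refs: List[CACertificateRef] = field(default_factory=list)
    # None or "" both mean "unspecified".
    well_known_ca_certificates: Optional[str] = None


def policy_problems(validation, supported_well_known=("System",)):
    """Return the reasons a policy would be considered invalid (empty if none)."""
    problems = []
    well_known = validation.well_known_ca_certificates or ""
    if well_known:
        # Unsupported value, or a feature the implementation does not support.
        if well_known not in supported_well_known:
            problems.append("unsupported WellKnownCACertificates value: " + well_known)
    elif not validation.ca_certificate_refs:
        # Without WellKnownCACertificates, at least one CA reference is required.
        problems.append("CACertificateRefs needs at least one entry")
    for ref in validation.ca_certificate_refs:
        if not ref.resolvable:
            problems.append("CACertificateRef cannot be resolved: " + ref.name)
    return problems


def backend_action(problems):
    # An invalid policy never falls back to plaintext: the connection fails.
    return "fail-connection" if problems else "connect-with-tls"
```

The point of `backend_action` is that there are only two outcomes; a plaintext branch does not exist.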
Contributor

I think "invalid" isn't the right term here: the BTP spec is validated by CEL, which makes an invalid BTP impossible.

If this refers to a hostname or certificate mismatch with the backend configuration, please state that explicitly.
Besides, that case is already covered in an implementation-specific way:

On the question of how to signal that there was a failure in the certificate validation, this is left up to the implementation to return a response error that is appropriate, such as one of the HTTP error codes: 400 (Bad Request), 401 (Unauthorized), 403 (Forbidden), or other signal that makes the failure sufficiently clear to the requester without revealing too much about the transaction, based on established security requirements.

Member Author

As we discussed off-channel, I had two cases in mind where I would consider a BackendTLSPolicy to be invalid, apart from the standard CEL validation or runtime errors:

If the CertificateRef cannot be resolved or does not include a certificate (tls.crt), the BackendTLSPolicy is considered invalid.

If WellKnownCACertificates is set to "System" and there are no system trusted certificates or the implementation doesn't define system
trusted certificates, the BackendTLSPolicy is considered invalid.

My main reason for raising this is to understand how the BackendTLSPolicy would signal that the resource referenced in CertificateRef does not exist or cannot be resolved.

Contributor

Could we create a new condition, similar to the "ResolvedRefs" condition that exists for listeners?

Member Author

Yes, that's exactly what I suggested in #3909 (comment).

An example structure could look like this:

status:
  ancestors:
  - ancestorRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: gw
    conditions:
    - type: Accepted
      reason: Accepted
      status: "True"
      message: BackendTLSPolicy is accepted
    - type: ResolvedRefs
      reason: InvalidCertificateRef | UnsupportedWellKnownCACertificates
      status: "False"
      message: (implementation specific error)
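
To make the `|` in the `reason` field concrete, here is a minimal Python sketch of how a controller might pick one reason and assemble the two conditions. It uses plain dicts rather than any real Kubernetes client type. The `build_conditions` helper is hypothetical, and using `ResolvedRefs` as the reason for the `"True"` case is an assumption borrowed from how other Gateway API resources report it; the failure reasons are the ones proposed in this comment, not a settled API.

```python
def build_conditions(failure_reason=None, message=""):
    """Build the Accepted and ResolvedRefs conditions for one ancestor.

    failure_reason is None when every reference resolved, otherwise one of the
    proposed reasons such as "InvalidCertificateRef".
    """
    conditions = [{
        "type": "Accepted",
        "reason": "Accepted",
        "status": "True",
        "message": "BackendTLSPolicy is accepted",
    }]
    if failure_reason is None:
        # Assumed reason for the success case.
        conditions.append({
            "type": "ResolvedRefs",
            "reason": "ResolvedRefs",
            "status": "True",
            "message": message,
        })
    else:
        conditions.append({
            "type": "ResolvedRefs",
            "reason": failure_reason,
            "status": "False",
            "message": message,  # implementation-specific error text
        })
    return conditions
```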

Contributor

+1 that anything that references other objects should include a ResolvedRefs condition.

Contributor

@kl52752 has a point about the pre-existing text regarding failures to connect. I prefer this over an HTTP 5XX error response. What do you think, @robscott @youngnick?

On the question of how to signal that there was a failure in the certificate validation, this is left up to the implementation to return a response error that is appropriate, such as one of the HTTP error codes: 400 (Bad Request), 401 (Unauthorized), 403 (Forbidden), or other signal that makes the failure sufficiently clear to the requester without revealing too much about the transaction, based on established security requirements.

Contributor

To go back to the original point: in this document we're defining an "invalid" BackendTLSPolicy as one that either uses an unsupported feature (WellKnownCACertificates) or has zero valid CertificateRefs. That is a bit different from being free of syntactic errors, which is what CEL guarantees.

Instead, the corresponding TLS connection MUST fail, and the client MUST receive an HTTP error response.
Additionally, the `Accepted` status condition of the BackendTLSPolicy MUST be set to `False` with the reason `Invalid`.
Member Author

Looking at it again, it might make sense to introduce the ResolvedRefs condition for policies as well. However, I’m not sure whether it fits within the current schedule for graduating the BackendTLSPolicy.

Contributor

It might be difficult to implement because policy application and certificate validation against the backend are distinct processes. It is a valid scenario for the BTP to be accepted and its configuration properly propagated by the controller while connectivity is broken due to certificate misconfiguration. All the information needed to debug such an issue should be available by inspecting the BTP spec against the Service to which the policy is applied.

Member

@robscott, Jul 17, 2025

Agree with @kl52752 that ResolvedRefs would be difficult to set for anything that was tied to dataplane/connectivity. On the other hand, ResolvedRefs seems like a useful concept on BackendTLSPolicy for the case that a CACertRef is invalid/points to something that doesn't exist.

Contributor

But the BTP validation type allows up to 8 CACertificateRefs, so we would also need to specify whether a single bad CACertificateRef sets Accepted to false, sets ResolvedRefs to false, or whether either condition should reflect a degree of failure (for example, "1 of 4 CACertificateRefs invalid"). Isn't this something we should rather leave as an implementation detail?

Contributor

To summarize: I think that ResolvedRefs should be included and used for certificates that do not exist or are not valid (where "not valid" means "does not contain certain keys", not "unwrap the certificate and check certificate properties").

Decoding the certificate should never be required for an implementation to be conformant. Implementations MAY unwrap if they wish and do additional error handling, but those errors should be in addition to the included error handling.
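
A minimal sketch of that split, in Python and under stated assumptions: the checks below look only at whether the referenced object exists and whether it carries an expected key, and never parse certificate contents. The `Ref` class, the `check_ref` and `check_refs` helpers, and the `objects` mapping (a stand-in for an API lookup) are hypothetical. The key name is a parameter because this thread does not fix one. Results are reported per reference, so the question of whether one bad reference out of several flips a condition is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    kind: str
    name: str


def check_ref(ref, objects, required_key):
    """Return None if the reference is usable, else a short problem string.

    objects maps (kind, name) to the object's data mapping and stands in for
    a lookup against the API server.
    """
    data = objects.get((ref.kind, ref.name))
    if data is None:
        return f"{ref.kind}/{ref.name} does not exist"
    if required_key not in data:
        return f"{ref.kind}/{ref.name} has no {required_key!r} key"
    # The value under required_key is deliberately not decoded here.
    return None


def check_refs(refs, objects, required_key):
    """Map each problematic reference to its problem; usable refs are omitted."""
    problems = {}
    for ref in refs:
        problem = check_ref(ref, objects, required_key)
        if problem is not None:
            problems[ref] = problem
    return problems
```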

Contributor

Expanding on Nick's thoughts here: I think it would be good to state explicitly somewhere which conditions make a certificate reference invalid (e.g., the ref does not exist, the ref exists but does not contain the right keys, the ref is not a Secret of type tls). This would probably be good for conformance as well.

I still think that for some cases it would be good to say that controllers CAN validate the certificate content. I can think of numerous cases where a Secret whose tls.crt key contains multiple certificates is malformed in some way (invalid PEM, etc.), which can take the whole gateway down if it blindly uses the certificate content.

Again, not a MUST but probably a recommendation.
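
For a sense of what such an optional check could look like, here is a small Python sketch that inspects only the PEM framing of a bundle: it finds each `CERTIFICATE` block and confirms that its body is valid base64. It does not parse X.509, it would not notice a truncated block sitting next to a well-formed one, and the `pem_framing_problems` name is hypothetical. Treat it as an illustration of a cheap pre-flight check, not as a recommended validation routine.

```python
import base64
import binascii
import re

# Matches one PEM certificate block and captures the text between the markers.
_PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.DOTALL
)


def pem_framing_problems(text):
    """Return a list of framing problems found in a PEM bundle (empty if none)."""
    bodies = _PEM_BLOCK.findall(text)
    if not bodies:
        return ["no CERTIFICATE block found"]
    problems = []
    for index, body in enumerate(bodies):
        compact = "".join(body.split())  # drop line breaks and other whitespace
        try:
            decoded = base64.b64decode(compact, validate=True)
        except (binascii.Error, ValueError):
            problems.append(f"block {index}: body is not valid base64")
            continue
        if not decoded:
            problems.append(f"block {index}: body is empty")
    return problems
```

A controller that runs a check like this could surface the result in the policy status instead of handing a malformed bundle to the data plane.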

Contributor

Agreed that adding some explicit examples of what makes a reference invalid is fine, but the second item, "ref exists but does not contain the right keys", requires decoding the Secret, so it can only be a MAY.


The `Hostname` field is required and is to be used to configure the SNI the Gateway should use to connect to the backend.
Implementations must validate that at least one name in the certificate served by the backend matches this field.