Skip to content

gep: standardizing behavior for invalid BackendTLSPolicy #3909

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Aug 13, 2025
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 13 additions & 2 deletions geps/gep-1897/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -214,6 +214,13 @@ configuration. CACertificateRefs is an implementation-specific slice of
named object references, each containing a single cert. We originally proposed to follow the convention established by the
[CertificateRefs field on Gateway](https://github.com/kubernetes-sigs/gateway-api/blob/18e79909f7310aafc625ba7c862dfcc67b385250/apis/v1beta1/gateway_types.go#L340)
, but the CertificateRef requires both a tls.key and tls.crt and a certificate reference only requires the tls.crt.
If any of the CACertificateRefs cannot be resolved (e.g., the referenced resource does not exist) or is misconfigured (e.g., ConfigMap does not contain a key named `ca.crt`), the `ResolvedRefs` status condition MUST be set to `False` with `Reason: InvalidCACertificateRef`. Connections using that CACertificateRef MUST fail, and the client MUST receive an HTTP 5xx error response.
References to objects with an unsupported Group and Kind are not valid, and MUST be rejected by the implementation with the `ResolvedRefs` status condition set to `False` and `Reason: UnsupportedFeature`.
Implementations MAY perform further validation of the certificate content (i.e., checking expiry or enforcing specific formats). If they do, they MUST ensure that the `ResolvedRefs` Condition is `False` and use an implementation-specific `Reason`, like `ExpiredCertificate` or similar.
If `ResolvedRefs` Condition is `False` implementations SHOULD include a message specifying which references are invalid and explaining why.

If all CertificateRefs cannot be resolved, the BackendTLSPolicy is considered invalid and the implementation MUST set the `Accepted` Condition to `False`, with a reason of `NoCACertificates` and a message explaining this.

WellKnownCACertificates is an optional enum that allows users to specify whether to use the set of CA certificates trusted by the
Gateway (WellKnownCACertificates specified as "System"), or to use the existing CACertificateRefs (WellKnownCACertificates
specified as ""). The use and definition of system certificates is implementation-dependent, and the intent is that
Expand All @@ -222,8 +229,12 @@ references to Kubernetes objects that contain PEM-encoded TLS certificates, whic
between the gateway and backend pod. References to a resource in a different namespace are invalid.
If ClientCertificateRefs is unspecified, then WellKnownCACertificates must be set to "System" for a valid configuration.
If WellKnownCACertificates is unspecified, then CACertificateRefs must be specified with at least one entry for a valid configuration.
If WellKnownCACertificates is set to "System" and there are no system trusted certificates or the implementation doesn't define system
trusted certificates, then the associated TLS connection must fail.
If an implementation does not support the WellKnownCACertificates, or the provided value is unsupported,the BackendTLSPolicy is considered invalid, and the implementation MUST set the `Accepted` Condition to `False`, with a reason of `UnsupportedFeature` and a message explaining this.

For an invalid BackendTLSPolicy, implementations MUST NOT fall back to unencrypted (plaintext) connections.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that "Invalid" isn't the right term here because BTP spec is validated by CEL, making an invalid BTP impossible.

If this refers to a hostname or certificate mismatch with backend configuration, please state that explicitly.
Besides, this case is already covered with implementation specific way.
.

On the question of how to signal that there was a failure in the certificate validation, this is left up to the implementation to return a response error that is appropriate, such as one of the HTTP error codes: 400 (Bad Request), 401 (Unauthorized), 403 (Forbidden), or other signal that makes the failure sufficiently clear to the requester without revealing too much about the transaction, based on established security requirements.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed off-channel, I had two cases in mind where I would consider a BackendTLSPolicy to be invalid, apart from the standard CEL validation or runtime errors:

If the CertificateRef cannot be resolved or does not include a certificate (tls.crt), the BackendTLSPolicy is considered invalid.

If WellKnownCACertificates is set to "System" and there are no system trusted certificates or the implementation doesn't define system
trusted certificates, the BackendTLSPolicy is considered invalid.

My main concern for raising this is to understand, how would the BackendTLSPolicy signal that the referenced resource in CertificateRef does not exist or cannot be resolved?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can created new conditions similar to "ResolvedRefs" condition which is dedicated for listeners?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's exactly what I've suggested in #3909 (comment)

An example structure could look like this:

status:
  ancestors:
  - ancestorRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: gw
    conditions:
    - type: Accepted
      reason: Accepted
      status: "True"
      message: BackendTLSPolicy is accepted
    - type: ResolvedRefs
      reason: InvalidCertificateRef | UnsuppportedWellKnownCACertificates
      status: "False"
      message: (implementation specific error)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 that anything that references other objects should include a ResolvedRefs condition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kl52752 has a point about the pre-existing text regarding failures to connect. I prefer this over an HTTP 5XX error response. What do you think @robscott @youngnick ?

On the question of how to signal that there was a failure in the certificate validation, this is left up to the implementation to return a response error that is appropriate, such as one of the HTTP error codes: 400 (Bad Request), 401 (Unauthorized), 403 (Forbidden), or other signal that makes the failure sufficiently clear to the requester without revealing too much about the transaction, based on established security requirements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To go back to the original point, in this document we're defining an "invalid" BackendTrafficPolicy to be one that either uses an unsupported feature (WellKnownCACertificates) or has zero valid CertificateRefs, and that's a bit different to "has no syntactic errors", which CEL prevents.

Instead, the corresponding TLS connection MUST fail, and the client MUST receive an HTTP 5xx error response.

Implementations MUST NOT modify any status other than their own. Ownership of a status is determined by the `controllerName`, which identifies the responsible controller.

The `Hostname` field is required and is to be used to configure the SNI the Gateway should use to connect to the backend.
Implementations must validate that at least one name in the certificate served by the backend matches this field.
Expand Down