Skip to content

Jakew/entra cert expiry alerts#64504

Open
nixpig wants to merge 37 commits intomasterfrom
jakew/entra-cert-expiry-alerts
Open

Jakew/entra cert expiry alerts#64504
nixpig wants to merge 37 commits intomasterfrom
jakew/entra-cert-expiry-alerts

Conversation

@nixpig
Copy link
Contributor

@nixpig nixpig commented Mar 11, 2026

This change adds watch support for SAML connectors and a periodic SAML certificate expiry monitor. The monitor checks SAML connectors every 24 hours and on any connector create/update/delete for any with expiring signing certificates within the next 90 days. If any connectors have expiring/expired certificates then an alert is raised.

This addresses the "highest priority change" from #63604.

Single connector with expiring cert:
image

Multiple connectors with expiring certs:
image

Manual Test Plan

Test Environment

Local build of Teleport from branch jakew/entra-cert-expiry-alerts deployed to https://jwardtest01.cloud.gravitational.io/.

Test cert generation

Certificates are generated using the following command and updated against the connector via the web UI.

# Substituting the -days value per test requirement.
openssl req -x509 -newkey rsa:2048 -keyout /dev/null -days 1 -nodes -subj "/CN=test" -outform DER 2>/dev/null | base64 -w 0

Test Cases

  • Connector with cert that reaches expiry; monitor picks up and alert is created.
  • No connectors; no expiry alert is created.
  • Connector with valid cert; no expiry alert is created.
  • Connector with cert expiring within 90 days; alert is created.
  • Connector with cert expired in the past; alert is created.
  • Multiple connectors, one with cert expiring within 90 days; alert is created, only connector with expiring cert listed.
  • Multiple connectors, two with cert expiring within 90 days; alert is created, both connectors with expiring certs listed.
  • Update connector with expiring cert to have valid cert; alert previously created is removed.
  • Delete connector with expiring cert; alert previously created is removed.
  • Login as user with saml:read; alert is visible in UI.
  • Login as user without saml:read; alert is not visible in UI.

Changelog: Display an alert when SAML connector signing certificates are within 90 days of expiry.

@nixpig nixpig requested a review from smallinsky March 11, 2026 08:51
@github-actions github-actions bot requested review from Tener and aadc-dev March 11, 2026 08:51
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 80100c4587

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3c26a95e08

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 52f105a66c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cd622cc605

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@smallinsky smallinsky requested a review from kopiczko March 12, 2026 08:15
// buildAlertMessage returns a SAML cert expiry alert message for the given connector names.
func (m *SAMLCertExpiryMonitor) buildAlertMessage(connectorNames []string) string {
return fmt.Sprintf(
"The following connectors have one or more certificates that have expired or will expire in the next %d days: %s.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we distinguish the Expired and ExpiredSoon connectors in the alert message ?

Additionally this message should be somehow addressable. Where after reading The following connectors have one or more certificates that have expired or will expire in the next %d the customer will thing so what ? it just a cert.

They will not know what is the impact and what to resolve the issue.

We will need to align this. a bit more to User Friendly message describing a impact and provided a link or steps to resolve this issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is something like this this more in line with what you're thinking?

image

The Learn More button currently linking to the SAML connector page in the docs, but I figure we should create a page specifically addressing connector cert expiry and how to resolve it.

I don't think the expired/expiring tacked on at the end is perfect, but from what I could see there isn't a way to have newline chars or lists in a cluster alert. Open to suggestions.

Copy link
Contributor

@smallinsky smallinsky Mar 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The following" don't make to much sense because it is single line.

What about something like:

"SAML SSO users will no longer be able to authenticate to Teleport in X days" or in case of already expired "SAML SSO users are no longer able to authenticate to Teleport"

  • Connector "X" references an IdP certificate that expires in 3 days.
  • Connector "Y" references an IdP signing certificate that expired on 2026-03-10:14:00

Please rotate the IdP signing certificate. Refer to for more details

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks good, but as mentioned in previous comment, it doesn't appear newlines are currently supported. The CheckMessage method rejects any message with control characters:

for _, c := range c.Spec.Message {
if unicode.IsControl(c) {
return trace.BadParameter("control characters not supported in alerts")
}
}

I'm also not sure what we do when there are multiple connectors already expired and/or connectors expiring... e.g., if entra-connector-1 certs expire in 70 days, entra-connector-2 certts have already expired, and saml-connector-1 certs expire in 15 days, do we concat those messages like this:

"entra-connector-1 users will no longer be able to authenticate to Teleport in 70 days. entra-connector-2 users are no longer able to authenticate to Teleport. saml-connector-1 users will no longer be able to authenticate to Teleport in 15 days."

...which seems a bit confusing.

It's like we're trying to put too much and too granular information in an alert, which should just be a simple 'hey, you should be aware of this'.

Let's sync offline and come to agreement on how this should be presented.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7026c2b248

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 19d2e44ddc

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

// and creates or updates an alert. If none are expiring, then any existing alert is deleted.
func (m *SAMLCertExpiryMonitor) reconcileAlert(ctx context.Context) error {
var expiringConnectors []string
for connector, err := range m.Connectors.RangeSAMLConnectorsWithOptions(ctx, "", "", false, types.SAMLConnectorValidationFollowURLs(false)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid clearing alerts after partial SAML scans

reconcileAlert relies on RangeSAMLConnectorsWithOptions(..., SAMLConnectorValidationFollowURLs(false)) to do a full scan, but connector unmarshalling can still fail for valid configs with mfa.entity_descriptor_url because ValidateSAMLConnector unconditionally fetches MFA metadata (lib/services/saml.go) and RangeSAMLConnectorsWithOptions drops those errors instead of returning them (lib/services/local/users.go). In that case the expiring connector is omitted from expiringConnectors, so this path can delete saml-cert-expiry-warning as a false negative until a later successful scan.

Useful? React with 👍 / 👎.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 11c45fa585

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eb9d2d8e0a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +190 to +191
if expiring, err := services.CheckSAMLCertExpiry(connector, samlCertExpiryTimeframe); err != nil {
return trace.Wrap(err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Continue reconciliation after per-connector cert errors

This returns immediately on the first CheckSAMLCertExpiry error, so one malformed connector certificate can block scanning of all remaining connectors and suppress alert updates cluster-wide. Since connector validation accepts metadata XML without decoding each embedded cert, this failure mode is reachable with persisted connector data; in that state the monitor repeatedly fails and never creates/clears alerts for other connectors.

Useful? React with 👍 / 👎.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both chatgpt comments above are reasonable. I think we should log and continue here.

Copy link
Contributor

@smallinsky smallinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM once the remaining comment is addressed.

I’d like to do a next pass after the dependency on the SAML connector event stream is removed from the context of this PR nit comment will be addressed.

Comment on lines +2941 to +2953
samlCertExpiryMonitor, err := auth.NewSAMLCertExpiryMonitor(auth.SAMLCertExpiryMonitorConfig{
Connectors: authServer.Services,
Alerts: authServer.Services,
Events: authServer.Services,
Clock: process.Clock,
Backend: process.backend,
Logger: logger.With(
teleport.ComponentKey, teleport.Component(teleport.ComponentAuth, "saml-cert-expiry-monitor"),
),
})
if err != nil {
return trace.Wrap(err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets create a helper func for creation of samlCertExpirtyMonitor.

Comment on lines +190 to +191
if expiring, err := services.CheckSAMLCertExpiry(connector, samlCertExpiryTimeframe); err != nil {
return trace.Wrap(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both chatgpt comments above are reasonable. I think we should log and continue here.

<-done
})

require.EventuallyWithT(t, func(t *assert.CollectT) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be rewritten to use synctest? Waiting for 1 second is almost guaranteed to be flaky in the CI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test spins up an auth server and uses the backend watch path for events, which doesn't feel like a great fit for synctest. Precedent in lib/auth for integration tests with newTLSTestServer seems to be Eventually/EventuallyWitT, e.g. lib/auth/tls_test.go and lib/auth/auth_with_roles_test.go.

I've bumped the timeout up to 5s to make it less sensitive to slow CI.

require.NoError(t, err)

require.EventuallyWithT(t, func(t *assert.CollectT) {
alerts, err := srv.Auth().GetClusterAlerts(ctx, types.GetClusterAlertsRequest{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this require.EventuallyWithT usage is to wait for the cache, then you can probably use the backend directly instead with srv.Auth().Services.GetClusterAlerts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's waiting for the watcher to handle the event, which happens asynchronously. I don't think we can get away from the 'waiting' here.

@nixpig nixpig force-pushed the jakew/entra-cert-expiry-alerts branch from 1e5e86a to 344976e Compare March 13, 2026 15:33
@github-actions
Copy link
Contributor

github-actions bot commented Mar 13, 2026

Amplify deployment status

Branch Commit Job ID Status Preview Updated (UTC)
jakew/entra-cert-expiry-alerts HEAD 1 ✅SUCCEED jakew-entra-cert-expiry-alerts 2026-03-13 15:36:33

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5f8eb96b1a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants