SAML 2.0 Service Provider Support for Keystone#11230

Open
bbobrov wants to merge 31 commits into master from 007-wp3-implementation
@bbobrov bbobrov commented Mar 30, 2026

Adds Shibboleth SP 3 (mod_shib) integration to the Keystone Helm chart, enabling SAML 2.0 federated authentication with per-tenant cryptographic isolation.

Shibboleth SP Configuration

  • mod_shib integration with Apache (WSGIScriptAliasMatch, VirtualHost handler, handlerSSL=false for TLS-terminated ingress)
  • Per-tenant ApplicationOverride with dedicated Sessions block, handlerURL, MetadataProvider, CredentialResolver
  • RequestMapper with nested Path elements for multi-tenant routing
  • NameIDAttributeDecoder for unspecified format (SAP IAS sends P-numbers)
  • SAP IAS plain-name attribute mapping (mail, first_name, last_name, user_uuid)
  • Chaining CredentialResolver for SP key rotation support

Runtime Configuration

  • Python script generates shibboleth2.xml and federation-saml.conf at startup from tenant-list ConfigMap
  • Zero per-tenant values in the Keystone chart — all tenant data lives in the federation chart
  • Stakater Reloader triggers rolling restarts on federation ConfigMap/Secret changes

SP Key Management

  • Per-tenant, per-region RSA-4096 key pairs stored in Vault
  • Distroless Go init container (keystone-saml-sp-init) copies SP keys from Vault-injected Secret to tmpfs
  • Secret volume defaultMode 0444 for nonroot container compatibility

Monitoring

  • Prometheus alert for SAML assertion validation failures

bbobrov added 20 commits March 30, 2026 22:47
…chart

New templates:
- _shibboleth2.xml.tpl: Shibboleth SP config with per-tenant MetadataProvider,
  memcached-backed shared session/replay cache for multi-pod support
- _attribute-map.xml.tpl: SAML attribute to environment variable mapping
- _federation-saml.conf.tpl: per-tenant Apache Location blocks with AuthType shibboleth
- federation-saml.yaml: K8s Secret wrapping all SAML config files

Modified templates:
- values.yaml: federation.saml section alongside existing federation.oidc
- _keystone.conf.tpl: add 'mapped' to auth methods, Shib-Identity-Provider
  as remote_id_attribute when SAML enabled
- _wsgi-keystone.conf.tpl: IncludeOptional for SAML config
- deployment-api.yaml: volume mounts for shibboleth, metadata, SP keys;
  hash annotation for rolling updates; stakater reloader integration
- ingress-api.yaml: extend session affinity to cover SAML browser flows
- _keystone_api.sh.tpl: shibd startup when SAML enabled

Existing OIDC federation is fully preserved when SAML is disabled.

Creates a K8s Secret 'keystone-saml-sp' containing the SP private key
and certificate. Values are resolved from Vault references by the
vault-injector during Concourse deployment.

Vault path convention:
  secrets/<region>/keystone/federation/saml-sp/<tenant>/key
  secrets/<region>/keystone/federation/saml-sp/<tenant>/cert
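
The path convention above can be expressed as a small helper. This is only an illustrative sketch: the function name `sp_vault_paths` and the example region and tenant values are hypothetical, not part of the chart.

```python
from typing import Dict


def sp_vault_paths(region: str, tenant: str) -> Dict[str, str]:
    """Build the Vault paths for one tenant's SP key and certificate.

    Follows the convention secrets/<region>/keystone/federation/saml-sp/<tenant>/...
    """
    base = f"secrets/{region}/keystone/federation/saml-sp/{tenant}"
    return {"key": f"{base}/key", "cert": f"{base}/cert"}


# Hypothetical region and tenant names, for illustration only.
paths = sp_vault_paths("region-1", "acme")
print(paths["key"])
print(paths["cert"])
```
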
Adds OpenstackKeystoneSAMLAssertionValidationFailed alert that fires
when mod_shib rejects SAML assertions (signature, timing, issuer, or
context binding failures). Follows existing alert patterns for Keystone.

The deployment template references .Values.federation.saml.idp.metadataConfigMap
for mounting tenant IdP metadata from the federation chart's ConfigMap.
The default was missing from values.yaml.

Adds an init container that copies the SP private key from the K8s
Secret mount to a tmpfs (RAM-only) volume. The main Keystone container
reads the key from tmpfs, never from the Secret directly.

Currently a noop (plain copy). When Vault Transit + Kubernetes auth
are available, this init container will be replaced with a Transit
decrypt call, matching the SF.Sem.2 init-container-to-tmpfs pattern.

Also adds the tmpfs emptyDir volume (medium: Memory, 32Ki limit).
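
The plain-copy step described above might look roughly like the following. This is a sketch in Python for readability only; the init container's actual implementation is not shown here, and the function name, directory arguments, and default mode are assumptions.

```python
import glob
import os
import shutil
from typing import List


def copy_sp_keys(src_dir: str, dst_dir: str, mode: int = 0o440) -> List[str]:
    """Copy every regular file from the Secret mount to the tmpfs directory.

    src_dir and dst_dir are hypothetical mount points. The glob skips
    dot-entries, and os.path.isfile follows symlinks, so symlinked files
    in a projected Secret volume are still copied.
    """
    copied = []
    for src in sorted(glob.glob(os.path.join(src_dir, "*"))):
        if not os.path.isfile(src):
            continue
        dst = os.path.join(dst_dir, os.path.basename(src))
        shutil.copyfile(src, dst)   # copy contents only, not source permissions
        os.chmod(dst, mode)         # apply restrictive permissions on the copy
        copied.append(dst)
    return copied
```

Because the destination is a memory-backed emptyDir, the copied key material never touches node disk and disappears with the pod.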
…tion

Each tenant now gets its own RSA-4096 SP key pair instead of sharing
one key across all tenants. This provides:
- Cryptographic isolation: compromise of one tenant's SP key does not
  affect other tenants
- Independent key rotation: coordinate with one customer at a time
- Per-tenant trust boundary: each IdP only trusts its tenant's SP cert

Implementation:
- values.yaml: privateKey/certificate moved from sp.* to per-tenant
  entries in idp.tenants[].privateKey/certificate (Vault references)
- secret-saml-sp.yaml: iterates over tenants, stores per-tenant key/cert
- shibboleth2.xml: per-tenant ApplicationOverride blocks, each with its
  own CredentialResolver and MetadataProvider
- federation-saml.conf: ShibRequestSetting applicationId per tenant
  selects the correct ApplicationOverride
- deployment-api.yaml: init container copies all per-tenant key files
  from K8s Secret to tmpfs

The protocol name in the URL path and Keystone API should be 'saml2'
(descriptive of the authentication method), not 'mapped' (which is the
auth plugin name). The Keystone auth method in keystone.conf remains
'mapped'; the protocol name and auth plugin name are independent.

Keystone uses the federation protocol name as the auth method name.
Since our protocol is 'saml2', the auth methods list must include
'saml2' not 'mapped'. Both are entry points for the same Mapped plugin
class, but the name must match what appears in the federation URL path.

Migrates to single-repo tenant management (PLAN_C_WITH_STAKATER).
All per-tenant data (SP keys, IdP metadata, mappings) now lives in
the federation repo. The Keystone chart has zero tenant-specific values.

Key changes:
- New: _generate_saml_config.py.tpl — Python script that reads the
  tenant-list JSON from a ConfigMap and generates shibboleth2.xml +
  federation-saml.conf at pod startup (runtime, not template time)
- _keystone_api.sh.tpl: calls Python generator before starting shibd/Apache
- deployment-api.yaml: mounts tenant-list ConfigMap, adds Stakater
  configmap.reloader annotation for auto-restart on federation changes,
  init container uses glob instead of Helm-templated tenant list
- values.yaml: replaced idp.tenants[] with tenantListConfigMap and
  spKeySecretName references to federation chart resources
- federation-saml.yaml: simplified to only attribute-map.xml (shibboleth2.xml
  and federation-saml.conf are generated at runtime)
- Deleted: secret-saml-sp.yaml (moved to federation chart)

Stakater Reloader watches keystone-saml-tenant-list ConfigMap and
keystone-saml-sp Secret. When the federation chart deploys a new
tenant, Reloader triggers a Keystone rolling restart automatically.
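
To illustrate the runtime-generation idea, here is a minimal sketch of turning a tenant-list JSON document into per-tenant Apache Location blocks. The JSON schema (a list of objects with `name` and an optional `enabled` flag), the assumption that the identity provider name equals the tenant name, and the function name are all hypothetical; the real _generate_saml_config.py.tpl may differ.

```python
import json

# One Location block per tenant; applicationId selects the tenant's
# ApplicationOverride in shibboleth2.xml.
LOCATION_TEMPLATE = """\
<Location /v3/OS-FEDERATION/identity_providers/{name}/protocols/saml2/auth>
    AuthType shibboleth
    Require valid-user
    ShibRequestSetting requireSession 1
    ShibRequestSetting applicationId {name}
</Location>
"""


def render_federation_conf(tenant_list_json: str) -> str:
    """Render federation-saml.conf content from the tenant-list JSON."""
    tenants = json.loads(tenant_list_json)
    blocks = [
        LOCATION_TEMPLATE.format(name=tenant["name"])
        for tenant in tenants
        if tenant.get("enabled", True)  # assumed flag; disabled tenants are skipped
    ]
    return "\n".join(blocks)


example = '[{"name": "acme"}, {"name": "ajax", "enabled": false}]'
print(render_federation_conf(example))
```

Generating at pod startup rather than at Helm template time is what lets the Keystone chart stay free of tenant-specific values.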
Avoids Docker Hub rate limits. Follows the convention used by other
charts in this repo.

The keystone-bin ConfigMap is already mounted as a directory at /scripts
(line 258). The subPath mount at /scripts/generate-saml-config.py
conflicts with the directory mount, causing Python to fail with
'can't find __main__ module'. The script is already available at
/scripts/generate-saml-config.py from the directory mount.

- Remove RequestMapper from shibboleth2.xml (caused duplicate Path
  warnings). Auth enforcement is handled by Apache Location blocks in
  federation-saml.conf, not by the Shibboleth RequestMapper.
- Rename REMOTE_USER attribute IDs to email-nameid, persistent-id,
  transient-id (REMOTE_USER is a reserved name in Shibboleth).
  The REMOTE_USER env var is populated from these IDs via the
  REMOTE_USER attribute in ApplicationDefaults.
- Remove deprecated MetadataGenerator handler.

The Location directives in federation-saml.conf (including the
Shibboleth.sso handler and per-tenant auth endpoints) must be inside
the VirtualHost *:5000 block to apply to Keystone's port. When outside
the VirtualHost, Apache registers them on the default server which
doesn't handle requests, causing 404 on /Shibboleth.sso/SAML2/POST.
… config

The <Location /Shibboleth.sso> SetHandler and LoadModule must be
directly in the VirtualHost block in wsgi-keystone.conf, not in the
runtime-generated federation-saml.conf included via IncludeOptional.

This follows the upstream Keystone pattern where the Shibboleth handler
is placed directly in the Apache VirtualHost config. The per-tenant
<Location> blocks (with AuthType shibboleth) remain in the
IncludeOptional file generated at runtime.
…uting

WSGIScriptAlias / takes precedence over <Location> SetHandler shib,
causing 404 on /Shibboleth.sso/SAML2/POST (the ACS endpoint). Replace
with WSGIScriptAliasMatch using a negative lookahead regex that routes
all paths to Keystone WSGI except /Shibboleth.sso which is handled by
mod_shib.

When SAML is disabled, the original WSGIScriptAlias / is used —
zero behavior change for non-SAML and OIDC-only deployments.
…coexistence

Replace the negative lookahead regex approach (which didn't work with
mod_wsgi's regex engine) with the exact upstream Keystone devstack
pattern:

1. WSGIScriptAliasMatch for federation auth and websso paths (takes
   priority over WSGIScriptAlias for these specific URLs)
2. WSGIScriptAlias / as the catch-all (unchanged)
3. <Location /Shibboleth.sso> SetHandler shib for the ACS endpoint

The WSGIScriptAliasMatch directives are placed BEFORE WSGIScriptAlias
so they take priority for federation paths. The catch-all WSGIScriptAlias
handles all other Keystone API requests as before.

References: keystone/devstack/files/federation/shib_apache_alias.txt

Apache 2.4 defaults to UseCanonicalName Off, which passes the
client-supplied Host header to mod_shib. When requests arrive with
Host: localhost (e.g., from health checks or internal curl), mod_shib
cannot match the hostname to its handler configuration and returns 404.

With UseCanonicalName On, Apache always uses the configured ServerName,
so mod_shib sees the correct hostname regardless of the client's Host
header. This is required for the /Shibboleth.sso handler to work
behind a reverse proxy.

mod_shib's handlerSSL=true rejects all requests received over plain
HTTP with a 404. In Kubernetes, TLS is terminated at the ingress
controller, so Apache receives plain HTTP. Setting handlerSSL=false
tells mod_shib to accept HTTP requests, while cookieProps=https
makes mod_shib itself mark session cookies Secure and HttpOnly,
independent of the scheme Apache sees.
… changes

The root cause of the /Shibboleth.sso 404 was handlerSSL=true in the
generated shibboleth2.xml (fixed in _generate_saml_config.py.tpl).
The wsgi-keystone.conf only needs two additions for SAML:

1. <Location /Shibboleth.sso> SetHandler shib - the ACS handler
2. IncludeOptional federation-saml.conf - per-tenant auth endpoints

Both inside the VirtualHost block.

Removed experimental changes that did not contribute to the fix:
- UseCanonicalName On (not needed)
- WSGIScriptAliasMatch for federation paths (not needed)
- LoadModule mod_shib inside VirtualHost (dead code, loaded from mods-enabled)

Also removed unused Helm template files replaced by runtime generation:
- _shibboleth2.xml.tpl (replaced by _generate_saml_config.py.tpl)
- _federation-saml.conf.tpl (replaced by _generate_saml_config.py.tpl)

bbobrov added 11 commits March 30, 2026 22:49

The ACS endpoint (/Shibboleth.sso/SAML2/POST) uses the default
application context when the relay state does not carry the applicationId
back from the IdP. Without MetadataProviders in ApplicationDefaults,
mod_shib cannot validate the assertion and throws
'No MetadataProvider available'.

Adding all tenants' MetadataProviders and the first tenant's
CredentialResolver to ApplicationDefaults as a fallback. The per-tenant
ApplicationOverride blocks still override these for normal operation.

Each tenant's ApplicationOverride now has its own Sessions block with
handlerURL='/Shibboleth.sso/<tenant>'. This gives each tenant a unique
ACS endpoint (e.g., /Shibboleth.sso/acme/SAML2/POST). When the IdP
posts the assertion back, Shibboleth resolves the correct applicationId
from the URL, using the correct per-tenant MetadataProvider and
CredentialResolver.

This is the Shibboleth-recommended pattern for path-based overrides.
Previously, all tenants shared /Shibboleth.sso/SAML2/POST which always
resolved to the default application (no MetadataProvider).

Also:
- Reverted fallback MetadataProviders/CredentialResolver from
  ApplicationDefaults (no longer needed)
- Changed <Location /Shibboleth.sso> to <LocationMatch> regex to match
  all per-tenant handler URLs (/Shibboleth.sso/acme/*, etc.)

Note: The customer's IdP must be configured with the tenant-specific
ACS URL from the SP metadata.

Each tenant's ApplicationOverride has handlerURL='/Shibboleth.sso/<tenant>'
but Shibboleth needs to know which applicationId to use when a request
arrives at that URL. Without a RequestMapper, requests to
/Shibboleth.sso/<tenant>/* resolve to the default application which
has no MetadataProvider — causing 'unconfigured location' errors.

The RequestMapper maps /Shibboleth.sso/<tenant> to applicationId=<tenant>
for each enabled tenant. This tells Shibboleth to use the correct
ApplicationOverride (with the correct MetadataProvider, CredentialResolver,
and handlerURL) for ACS callbacks and metadata endpoints.
…block

Shibboleth SP 3.4 does not support handlerURL as an attribute of
ApplicationOverride — it must be in a nested <Sessions> element.
When an ApplicationOverride has its own <Sessions>, it replaces the
parent's Sessions entirely, so all handlers (MetadataGenerator, Status,
Session) must be explicitly included or they won't work under the
per-tenant handler path.

Verified locally: /Shibboleth.sso/acme/Status returns 200,
/Shibboleth.sso/acme/Metadata returns SP metadata XML with the
per-tenant ACS URL, and /Shibboleth.sso/acme/SAML2/POST correctly
processes the ACS endpoint (returns BindingException when no assertion
is in the POST body, which is the expected behavior).
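
A per-tenant override with its own Sessions block and explicitly repeated handlers could be generated along these lines. This is a sketch, not the generator's actual code: the function name is hypothetical, the `/Session` location is assumed from Shibboleth's stock configuration, and a real override would also carry the MetadataProvider and CredentialResolver elements.

```python
import xml.etree.ElementTree as ET


def build_override(tenant: str) -> ET.Element:
    """Build an ApplicationOverride whose Sessions block carries handlerURL."""
    override = ET.Element("ApplicationOverride", {"id": tenant})
    sessions = ET.SubElement(override, "Sessions", {
        "handlerURL": f"/Shibboleth.sso/{tenant}",  # per-tenant handler path
        "handlerSSL": "false",                      # TLS ends at the ingress
        "cookieProps": "https",
    })
    sso = ET.SubElement(sessions, "SSO")
    sso.text = "SAML2"
    # A nested Sessions replaces the parent's, so every handler is restated.
    for handler_type, location in (
        ("MetadataGenerator", "/Metadata"),
        ("Status", "/Status"),
        ("Session", "/Session"),
    ):
        ET.SubElement(sessions, "Handler",
                      {"type": handler_type, "Location": location})
    return override


print(ET.tostring(build_override("acme"), encoding="unicode"))
```
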
REMOTE_USER was empty because Shibboleth SP 3 requires NameIDAttributeDecoder
in attribute-map.xml to extract the NameID from the SAML assertion into a
regular attribute. Without it, NameID format URIs are treated as regular
attribute names which don't match the <NameID> XML element.

Changes:
- attribute-map.xml: Use NameIDAttributeDecoder with formatter=$Name for
  emailAddress, persistent, and transient NameID formats. This extracts the
  NameID value (e.g., the email) into email-nameid, persistent-id, or
  transient-id attributes.
- attribute-map.xml: Add ADFS/SAP IAS claim URI mappings for email, UPN,
  name, givenName, surname.
- _generate_saml_config.py.tpl: Remove invalid <NameIDAttribute> element
  (SP 2 feature, not valid in SP 3).

Verified locally: shibd -t reports configuration is loadable.
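
For reference, the three NameID decoder entries described above can be rendered as follows. This is a sketch that only produces the Attribute elements as text; the helper names are hypothetical, and the surrounding Attributes root element (which declares the xsi namespace) is omitted.

```python
# NameID format URI -> attribute id, as named in the commit message above.
NAMEID_FORMATS = {
    "urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress": "email-nameid",
    "urn:oasis:names:tc:SAML:2.0:nameid-format:persistent": "persistent-id",
    "urn:oasis:names:tc:SAML:2.0:nameid-format:transient": "transient-id",
}


def nameid_entry(format_uri: str, attr_id: str) -> str:
    """Render one attribute-map.xml entry that extracts the NameID value."""
    return (
        f'<Attribute name="{format_uri}" id="{attr_id}">\n'
        '    <AttributeDecoder xsi:type="NameIDAttributeDecoder"'
        ' formatter="$Name"/>\n'
        "</Attribute>"
    )


def render_nameid_entries() -> str:
    """Render all NameID entries, one per supported format."""
    return "\n".join(nameid_entry(uri, attr_id)
                     for uri, attr_id in NAMEID_FORMATS.items())


print(render_nameid_entries())
```
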
…ributes

REMOTE_USER was empty because:
1. SAP IAS sends NameID with format 'unspecified' (not emailAddress/persistent/
   transient) — we had no decoder for it
2. SAP IAS sends attributes with plain names ('mail', 'first_name', etc.) —
   not OIDs or ADFS claim URIs — we had no mappings for them

Changes:
- attribute-map.xml: Add NameIDAttributeDecoder for 'unspecified' format,
  mapped to 'unspecified-nameid'. Add SAP IAS plain-name attribute
  mappings (mail, first_name, last_name, user_uuid).
- shibboleth2.xml REMOTE_USER: 'unspecified-nameid' is first priority,
  followed by email-nameid, persistent-id, etc., with 'mail' as last resort.

The REMOTE_USER priority depends on the IdP. For SAP IAS with NameID format
'unspecified', REMOTE_USER gets the NameID value (e.g., P000015). For ADFS
with emailAddress format, it gets the email. Operators should adjust the
priority based on their IdP's behavior.

Verified locally: shibd -t reports configuration is loadable.
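
The priority behavior can be modeled in a few lines: the REMOTE_USER setting is a space-separated list of attribute IDs, and the first one that has a value wins. The sketch below is a simplified model of that rule, not Shibboleth code. The exact list is an assumption, since the commit message abbreviates the middle of it with "etc."; `transient-id` is included here only for illustration.

```python
from typing import Dict, Optional

# Assumed ordering; only the first two entries and the last are confirmed above.
REMOTE_USER_PRIORITY = (
    "unspecified-nameid email-nameid persistent-id transient-id mail"
)


def resolve_remote_user(
    attributes: Dict[str, str],
    priority: str = REMOTE_USER_PRIORITY,
) -> Optional[str]:
    """Return the first non-empty attribute value in priority order."""
    for attr_id in priority.split():
        value = attributes.get(attr_id)
        if value:
            return value
    return None


# SAP IAS style assertion: unspecified NameID plus a plain 'mail' attribute.
print(resolve_remote_user({"unspecified-nameid": "P000015",
                           "mail": "user@example.com"}))
# ADFS style assertion: emailAddress NameID only.
print(resolve_remote_user({"email-nameid": "user@example.com"}))
```
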
When a tenant has rotation: true in the tenant-list JSON (from the
federation chart's region flag file), the runtime config generator
produces a Chaining CredentialResolver with both current and next
key/cert pairs. Shibboleth tries each resolver in order for
decryption and signing.

During normal operation (rotation absent or false), a single File
CredentialResolver is used as before.

Verified locally: shibd -t accepts Chaining CredentialResolver in
ApplicationOverride.
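
The switch between a single File resolver and a Chaining resolver might be sketched like this. The key directory, the file naming scheme (`<tenant>.key`, `<tenant>-next.key`), and the function name are hypothetical; only the `rotation` flag and the File/Chaining resolver types come from the description above.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def build_credential_resolver(tenant: Dict[str, Any],
                              key_dir: str = "/run/sp-keys") -> ET.Element:
    """Return a File resolver, or a Chaining resolver while rotating keys."""
    name = tenant["name"]

    def file_resolver(suffix: str = "") -> ET.Element:
        # Hypothetical file layout: <key_dir>/<tenant><suffix>.key / .crt
        return ET.Element("CredentialResolver", {
            "type": "File",
            "key": f"{key_dir}/{name}{suffix}.key",
            "certificate": f"{key_dir}/{name}{suffix}.crt",
        })

    if not tenant.get("rotation"):
        return file_resolver()

    chain = ET.Element("CredentialResolver", {"type": "Chaining"})
    chain.append(file_resolver())         # current key pair, tried first
    chain.append(file_resolver("-next"))  # next key pair
    return chain


print(ET.tostring(build_credential_resolver({"name": "acme", "rotation": True}),
                  encoding="unicode"))
```
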
Shibboleth parses Path name segments hierarchically. Flat paths like
<Path name="Shibboleth.sso/acme"/> and <Path name="Shibboleth.sso/ajax"/>
cause only the first tenant to be matched — the second gets 'handler
invoked at an unconfigured location'.

Using nested paths:
  <Path name="Shibboleth.sso">
      <Path name="acme" applicationId="acme"/>
      <Path name="ajax" applicationId="ajax"/>
  </Path>

correctly matches all tenants as siblings under the Shibboleth.sso
handler path.

Verified locally: reproduced the bug (acme 200, ajax 500 with flat
paths), confirmed the fix (both 200 with nested paths).
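
Generating the nested form from a tenant list is straightforward. The following sketch builds only the Path subtree shown above; the function name is hypothetical, and the enclosing RequestMapper, RequestMap, and Host elements are left out.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_handler_paths(tenants: List[str]) -> ET.Element:
    """Build one parent Path with a child Path per tenant (siblings)."""
    root = ET.Element("Path", {"name": "Shibboleth.sso"})
    for tenant in tenants:
        ET.SubElement(root, "Path", {"name": tenant, "applicationId": tenant})
    return root


print(ET.tostring(build_handler_paths(["acme", "ajax"]), encoding="unicode"))
```
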
…ml-sp-init)

The SAML SP key init container now uses the keystone-saml-sp-init image:
a statically compiled Go binary in a distroless container with no shell.

Image reference follows the standard pattern:
  registryAlternateRegion/federation.saml.spInit.image:federation.saml.spInit.imageTag

This closes BSI deviation #2 (init container must be Go binary in
distroless image per 4.2 Sicherheitsarchitektur L847-853).
…ntainer

The distroless init container runs as nonroot (UID 65534). With
defaultMode 0440, files are owned by root:root and not readable by
the nonroot user. Changing to 0444 makes the Secret mount readable.
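
The difference between the two modes comes down to the "other" read bit: a user that is neither the file owner nor in the file's group falls into the "other" class. A small check, written as an illustrative sketch with a hypothetical helper name:

```python
import stat


def readable_by_other(mode: int) -> bool:
    """True if the permission bits grant read access to the 'other' class."""
    return bool(mode & stat.S_IROTH)


for mode in (0o440, 0o444):
    print(oct(mode), readable_by_other(mode))
```
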

The init container still writes 0440 permissions on the tmpfs
destination (enforced by os.Chmod in the Go binary), so the final
files are readable only by the nonroot owner and its group.
@bbobrov bbobrov force-pushed the 007-wp3-implementation branch from ec815e2 to 8463aa1 Compare March 30, 2026 20:50