Skip to content

Comments

Allow configuring TPM init container security context#676

Open
jmbass wants to merge 1 commit intospiffe:mainfrom
jmbass:fix/tpm-nonroot
Open

Allow configuring TPM init container security context#676
jmbass wants to merge 1 commit intospiffe:mainfrom
jmbass:fix/tpm-nonroot

Conversation

@jmbass
Copy link

@jmbass jmbass commented Oct 8, 2025

Summary

  • keep the TPM plugin and SPIRE server non-root while letting the init copy run as root by default (and remain tunable)
  • add values to customize the init container security context, mount path, state directory, and chown behavior to fit strict clusters
  • update helpers, configmap references, documentation, and chart metadata to surface the new controls

Testing

  • helm lint charts/spire

- let the TPM init container run as root so the plugin copy succeeds on root-owned PVCs
- add values to control the init container security context, chown behaviour, mount and state paths
- update chart docs/version and ensure configmap uses the new helper values

- helm lint charts/spire

Signed-off-by: jo.perez <jo.perez@criteo.com>
@kfox1111
Copy link
Collaborator

kfox1111 commented Oct 8, 2025

Hello. Thanks for contributing to the chart.

I'm a little confused as to what is being attempted to be accomplished.

I believe the tpm plugin has to be run as root to function properly. But the current design of spire requires spire itself to exec the tpm plugin, so it has to run as root too for that to happen.

Am I missing something?

@jmbass
Copy link
Author

jmbass commented Oct 8, 2025

@kfox1111 Hi! Appreciate the quick review, and happy to clarify.

The SPIRE agent needs root access when it talks to /dev/tpm*, but this patch is in the server chart. The tpm_direct node-attestor plugin that the server execs never touches the TPM device; it only validates the payloads coming from the agents and keeps some state in /run/spire/data. We’ve been running spire-server (and the plugin it forks) as an arbitrary non-root UID successfully as long as the staged files are owned by that UID.

The issue was the init container that copies the plugin onto the PVC. It inherited the server’s non-root UID, but the PVC surfaces as root-owned on hardened clusters, so the copy (and the directory creation) failed. The new initSecurityContext lets us run the init container as root by default just long enough to copy the binary and chown the state directory back to the pod UID/GID. If a downstream already has a writable location they can override that value and keep the init container non-root as well. The main server container—and the TPM plugin it execs—remain non-root throughout.

@kfox1111
Copy link
Collaborator

kfox1111 commented Oct 8, 2025

Oh, right. ok....

Could it be that we just need to move the existing initContainer, name: chown
up in the list so it happens before the tpm plugin so the state dir is the right user before the mkdirs?

@jmbass
Copy link
Author

jmbass commented Oct 8, 2025

Thanks for your response and insights.

You’re right to look at the existing chown init container but I think that unfortunately it doesn’t help in the cases that fail:

It’s only rendered when $needsChown is true. $needsChown is set when the pod-level runAsUser isn’t defined. In our hardened clusters we explicitly set podSecurityContext.runAsUser: 1000, so $needsChown is false and that init container is not emitted at all :( . I think moving it earlier wouldn't change that.

Even when it is present, it currently runs chown -R … /var/lib/spire here. I understand that this path is used for the legacy datastore.. it never touches /run/spire/data/tpm-direct, so the plugin directories would still be owned by root.

Most importantly, the copy step fails before any later init containers run. The TPM init container inherits the pod’s non-root UID, so cp {{ .Values.nodeAttestor.tpmDirect.pluginPath }} {{ $tpmPluginDst }} here hits a permission error on the root-owned PVC before a separate chown container could fix anything.
That’s why the patch gives the TPM init container its own initSecurityContext (defaults to UID/GID 0) and a chownAfterCopy toggle. It can write the files, chown them to the pod’s UID, and then the main spire-server container keeps running non-root exactly as before.

@kfox1111
Copy link
Collaborator

kfox1111 commented Oct 9, 2025

we could fix up he chown to make sure all the important bits are the right permissions.

I'm a little confused why the pvc would be root owned though, unless it was created initially as root, then the chart was upgraded to use a different user? That could be a problem in other configs too?

This patch also makes several things configurable that are internal implementation details. Is there actually a need to configure those things?

@faisal-memon
Copy link
Collaborator

@jmbass Do you have time to follow up on the comments on this pr?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants