Commit 3cedef6
tpm: stabilize muxed TPM and DevID attestation for system VMs
Consolidate TPM mux rollout and SPIFFE hardening so system VMs use host-managed TPM forwarding with safer provisioning and attestation recovery behavior. This removes transient TPM monitor noise and aligns VM TPM access with the muxed infrastructure now used by admin, audio, gui, and net VMs.
1 parent c9e03c2 commit 3cedef6

File tree: 21 files changed, +1854 −189 lines

docs/astro.config.mjs

Lines changed: 1 addition & 0 deletions
@@ -62,6 +62,7 @@ export default defineConfig({
         "ghaf/overview/arch/secureboot",
         "ghaf/overview/arch/stack",
         "ghaf/overview/arch/guest-tpm",
+        "ghaf/overview/arch/tpm-mux",
       ],
     },
     {

docs/src/content/docs/ghaf/overview/arch/guest-tpm.mdx

Lines changed: 2 additions & 0 deletions
@@ -69,4 +69,6 @@ App VMs currently use emulated TPM on all platforms, unlike system VMs which use

 ### Remarks

+For the host-side shared hardware TPM mux architecture used by system VMs, see [TPM Mux for System VMs](/ghaf/overview/arch/tpm-mux).
+
 Applications that access the TPM should ideally use TPM 2.0 authenticated sessions. They enable encryption of TPM command payloads, which ensures inspecting or tampering of key material in-transit is not possible. This can be enforced in future updates.
docs/src/content/docs/ghaf/overview/arch/tpm-mux.mdx (new file)

Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@
---
title: TPM Mux for System VMs
description: Architecture, implementation, and test plan for shared hardware TPM access in Ghaf system VMs
---

## Overview

Ghaf system VMs can share one hardware TPM through a host-side mux layer instead of direct passthrough. This design keeps the trust anchor in hardware while avoiding direct concurrent access from multiple guests.

The current non-riscv64 system VM profile uses this model for:

- `admin-vm`
- `audio-vm`
- `gui-vm`
- `net-vm`

On riscv64, system VMs continue to use an emulated TPM.
## Why This Exists

Direct passthrough from multiple guests to one TPM can lead to command contention, unreliable startup sequencing, and lockup-like timeout behavior under load. The TPM mux architecture addresses this by introducing controlled fan-out on the host:

- one backend hardware TPM resource manager device (`/dev/tpmrm0` by default)
- one per-VM forwarder process
- one per-VM proxy endpoint exposed to QEMU

This preserves VM isolation while making startup and runtime behavior more predictable.
## Architecture

The system is composed of four layers:

1. **Host TPM backend**
   - `tpm2-abrmd` and the host kernel TPM stack
   - hardware device path defaults to `/dev/tpmrm0`
2. **Per-VM mux forwarder**
   - service `ghaf-vtpm-forwarder-<vm>.service`
   - binary `vtpm-abrmd-forwarder`
3. **QEMU guest TPM device wiring**
   - `-tpmdev passthrough,id=tpm0,path=/run/ghaf-vtpm/<vm>.tpm,cancel-path=/tmp/cancel`
   - `-device tpm-tis` (x86_64) or `tpm-tis-device` (aarch64)
4. **Guest userspace consumers**
   - storage encryption helpers
   - SPIFFE DevID provisioning and attestation
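Putting layers 2 and 3 together, the QEMU side of the wiring for one VM looks roughly like the following. This is an illustrative command line, not the generated one: only the `-tpmdev` and `-device` flags are taken from the module; the rest of the invocation is elided, and `net-vm` stands in for any mux-enabled VM.

```
qemu-system-x86_64 \
  ... \
  -tpmdev passthrough,id=tpm0,path=/run/ghaf-vtpm/net-vm.tpm,cancel-path=/tmp/cancel \
  -device tpm-tis,tpmdev=tpm0
```

From the guest's perspective this is indistinguishable from a directly passed-through TPM; the forwarder behind `/run/ghaf-vtpm/net-vm.tpm` is what serializes access to the shared hardware device.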
### Boot Ordering

Host systemd ordering enforces forwarder readiness before VM launch:

- `ghaf-vtpm-forwarder-<vm>.service` starts before `microvm@<vm>.service`
- `microvm@<vm>.service` requires the corresponding forwarder service
- forwarders use `Type=notify` and become ready only after link endpoint setup
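A minimal sketch of how this ordering could be expressed on the host. The unit and binary names come from the commit; the option bodies and the `pkgs.vtpm-abrmd-forwarder` attribute path are assumptions, not the actual module code.

```nix
{
  # Sketch only: one forwarder per mux-enabled VM, shown here for net-vm.
  systemd.services."ghaf-vtpm-forwarder-net-vm" = {
    before = [ "microvm@net-vm.service" ];
    serviceConfig = {
      # READY=1 is sent only after the link endpoint is set up,
      # so the dependent VM cannot start against a missing socket.
      Type = "notify";
      ExecStart = "${pkgs.vtpm-abrmd-forwarder}/bin/vtpm-abrmd-forwarder ...";
    };
  };

  systemd.services."microvm@net-vm" = {
    requires = [ "ghaf-vtpm-forwarder-net-vm.service" ];
    after = [ "ghaf-vtpm-forwarder-net-vm.service" ];
  };
}
```

`Requires=` plus `After=` (rather than `Wants=`) makes a forwarder failure stop the VM from launching with a dead TPM endpoint.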
## Implementation Details

### Host Module

Host orchestration is defined in `modules/microvm/host/tpm-mux.nix`:

- enables the host TPM stack and `tpm2-abrmd`
- loads kernel module `tpm_vtpm_proxy`
- creates the runtime directory (`/run/ghaf-vtpm` by default)
- creates one forwarder service per mux-enabled VM
- auto-discovers the VM list when `ghaf.virtualization.microvm-host.tpmMux.vms = [ ]`
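Enabling the host side might then look like this in a target configuration. The `vms` option path appears in the commit; the `enable` flag is an assumed option name used here for illustration.

```nix
{
  ghaf.virtualization.microvm-host.tpmMux = {
    enable = true; # assumed option name
    vms = [ ];     # empty list: auto-discover mux-enabled VMs
  };
}
```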
### Guest Module

Guest TPM mode wiring is in `modules/microvm/common/vm-tpm.nix`:

- exactly one TPM mode can be enabled (`passthrough`, `muxed`, or `emulated`)
- mux mode exports the host path as the QEMU passthrough backend
- guest `tpm0` permissions are configured for TPM userspace components
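The "exactly one mode" rule can be sketched as a module assertion. This is illustrative only: the three mode names come from the commit, but the option shape (`cfg.<mode>.enable`) is an assumption about how `vm-tpm.nix` is structured.

```nix
let
  # Hypothetical shape: each mode is a submodule with an `enable` flag.
  enabledModes = lib.filter (m: m.enable) [
    cfg.passthrough
    cfg.muxed
    cfg.emulated
  ];
in
{
  assertions = [
    {
      assertion = lib.length enabledModes == 1;
      message = "vm-tpm: exactly one TPM mode (passthrough, muxed, emulated) must be enabled";
    }
  ];
}
```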
### System VM Base Integration

System VM defaults now select mux mode on non-riscv64 when storage encryption is enabled:

- `modules/microvm/sysvms/adminvm-base.nix`
- `modules/microvm/sysvms/audiovm-base.nix`
- `modules/microvm/sysvms/guivm-base.nix`
- `modules/microvm/sysvms/netvm-base.nix`

The laptop profile host mux config currently sets an explicit system VM list:

- `modules/profiles/laptop-x86.nix`
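An explicit list in the laptop profile would look roughly like this; the option path and VM names are from the commit, while the exact surrounding module context is elided.

```nix
{
  # modules/profiles/laptop-x86.nix (sketch)
  ghaf.virtualization.microvm-host.tpmMux.vms = [
    "admin-vm"
    "audio-vm"
    "gui-vm"
    "net-vm"
  ];
}
```

Pinning the list in the profile trades the host module's auto-discovery for an explicit, reviewable statement of which VMs touch the hardware TPM.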
## SPIFFE / DevID Considerations

The TPM mux layer is transparent to SPIFFE from a consumer perspective (`/dev/tpm0` inside guests), but it affects timing and reliability characteristics. For TPM DevID flows, keep these guardrails:

- ensure provisioning and attestation are resilient to transient TPM retries/timeouts
- validate that the DevID cert public key matches the VM TPM key before accepting cached certs
- prefer restart-safe provisioning behavior over one-shot assumptions
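On the agent side, this commit's `agent.nix` change pins the attestor at the guest device node. The rendered SPIRE config fragment looks like the following, with the concrete paths substituted from `cfg.tpmDevid` at build time (placeholder values shown here):

```hcl
plugins {
  NodeAttestor "tpm_devid" {
    plugin_data {
      tpm_device_path = "/dev/tpm0"
      devid_cert_path = "..." # cfg.tpmDevid.certPath
      devid_priv_path = "..." # cfg.tpmDevid.privPath
      devid_pub_path  = "..." # cfg.tpmDevid.pubPath
    }
  }
}
```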
## Testing Strategy

Use a staged test plan for each change.

### 1. Static and Policy Checks

- `nix fmt -- --fail-on-change`
- `nix develop --command reuse lint`
### 2. Build and Evaluation

- evaluate the affected target(s)
- build the image/closure that includes the updated VM bases and the host mux module
### 3. Boot and Service Readiness

On the host:

- verify `tpm2-abrmd.service` is active
- verify `ghaf-vtpm-forwarder-<vm>.service` is active for each system VM
- verify `microvm@<vm>.service` starts after the matching forwarder

In each VM (`admin-vm`, `audio-vm`, `gui-vm`, `net-vm`):

- verify `/dev/tpm0` exists and has the expected ownership/mode
- run a basic command smoke test, for example `tpm2_getrandom 8`
### 4. Concurrency and Stress

Run concurrent TPM command loops in all system VMs and monitor:

- VM command success rate
- forwarder restarts and error counters
- host kernel TPM timeout messages

Gate condition: a sustained test window with no forwarder crashes and an acceptable command success ratio.
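A minimal sketch of such a loop. The success-rate harness itself is generic POSIX shell; on a real system VM the command under test would be something like `tpm2_getrandom 8`, which is not assumed here.

```shell
# Run a command n times and report how many runs succeeded.
stress_loop() {
  cmd=$1; n=$2; ok=0; i=0
  while [ "$i" -lt "$n" ]; do
    # $cmd is intentionally unquoted so "tpm2_getrandom 8" splits
    # into command and argument.
    if $cmd >/dev/null 2>&1; then
      ok=$((ok + 1))
    fi
    i=$((i + 1))
  done
  echo "$ok/$n"
}

# On a system VM: stress_loop "tpm2_getrandom 8" 500
# Here, exercised with a trivially succeeding command:
stress_loop true 10   # prints 10/10
```

Running one such loop per VM in parallel and comparing the printed ratios against the gate condition gives a simple pass/fail signal without extra tooling.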
### 5. SPIFFE End-to-End

For TPM DevID-enabled VMs:

- restart `spire-devid-provision` and `spire-agent`
- verify successful node attestation in the agent logs
- verify matching successful request completion in the server logs
- verify that agent restarts load the existing SVID without re-entering failure loops
## Troubleshooting Checklist

If a VM shows TPM failures:

1. Confirm the forwarder service for that VM is active on the host.
2. Confirm the VM started after the forwarder (systemd ordering).
3. Check host logs for `tpm tpm2: Operation Timed out` or retry loops.
4. Check forwarder logs for backend receive latency and proxy write/read errors.
5. For SPIFFE failures, verify the DevID cert/public-key match before retrying attestation.
## Known Constraints

- A shared hardware TPM is still a serialized resource; under heavy multi-VM load, latency spikes can occur.
- Some TPM commands are more sensitive to contention and timeout behavior than others.
- Operational reliability depends on both mux correctness and consumer retry/backoff behavior.
## Related Documents

- [Virtualized TPM for guests](/ghaf/overview/arch/guest-tpm)
- [Ghaf Architecture Overview](/ghaf/overview/arch/system-architecture)

modules/common/security/spiffe/agent.nix

Lines changed: 9 additions & 1 deletion
@@ -38,6 +38,7 @@ let
     plugins {
       NodeAttestor "tpm_devid" {
         plugin_data {
+          tpm_device_path = "/dev/tpm0"
           devid_cert_path = "${cfg.tpmDevid.certPath}"
           devid_priv_path = "${cfg.tpmDevid.privPath}"
           devid_pub_path = "${cfg.tpmDevid.pubPath}"
@@ -85,6 +86,13 @@ let
     if [ -f "${cfg.tpmDevid.certPath}" ] && \
        [ -f "${cfg.tpmDevid.privPath}" ] && \
        [ -f "${cfg.tpmDevid.pubPath}" ]; then
+      if [ -d "${cfg.dataDir}" ] && grep -Rqs "/spire/agent/join_token/" "${cfg.dataDir}" 2>/dev/null; then
+        echo "Join-token SVID cache detected, resetting SPIRE agent state for tpm_devid"
+        for entry in "${cfg.dataDir}"/* "${cfg.dataDir}"/.[!.]* "${cfg.dataDir}"/..?*; do
+          [ -e "$entry" ] || continue
+          rm -rf "$entry"
+        done
+      fi
       echo "DevID files found, using tpm_devid attestation"
       ln -sf /etc/spire/agent-tpm-devid.conf /run/spire/agent.conf
     else
@@ -283,7 +291,7 @@ in
         "/run/spire"
       ]
       ++ lib.optionals useTpmDevid [
-        "/dev/tpmrm0"
+        "/dev/tpm0"
       ];
     };
   };
