tpm: route system VMs through host mux and harden DevID attestation #1790

Draft
vadika wants to merge 12 commits into tiiuae:main from vadika:tpm-mux-abrmd

@vadika (Contributor) commented Mar 1, 2026

Summary

  • add host-side TPM mux infrastructure (vtpm-abrmd-forwarder, tpm-mux.nix, unit ordering/ready signaling) and switch system VMs (admin, audio, gui, net) to muxed TPM on non-riscv64
  • harden SPIFFE TPM DevID provisioning/signing by validating cert-to-key matches, regenerating stale certs, and improving provisioning ordering/recovery paths
  • add agent-side guard to clear stale join-token cache when TPM DevID mode is selected, and document TPM mux architecture/operations in docs

Verification

  • runtime validation on reflashed target: forwarder units active, system VMs booting with muxed /dev/tpm0, TPM command loops, and SPIRE TPM DevID attestation recovery for audio-vm, gui-vm, and net-vm

Notes

  • intermittent TPM backend contention still appears under load (backend busy / slow responses), but the new mux path and attestation recovery logic are now functional and recoverable after reprovision

Manual Runtime Validation

Build, lint, and cross-compilation checks are covered by CI/CD.
Manual validation below focuses only on runtime behavior of TPM mux, SPIFFE TPM DevID, and Jetson fTPM/vTPM integration.

1) Host TPM Mux Readiness

On the deployed host, verify that the TPM backend and the per-VM forwarders are active:

systemctl status tpm2-abrmd
systemctl status ghaf-vtpm-forwarder-admin-vm ghaf-vtpm-forwarder-net-vm
# x86 targets also:
systemctl status ghaf-vtpm-forwarder-audio-vm ghaf-vtpm-forwarder-gui-vm
ls -l /run/ghaf-vtpm/

Expected:

  • forwarders active for enabled system VMs
  • /run/ghaf-vtpm/<vm>.tpm endpoints present
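The endpoint check above can be scripted. The following is a minimal sketch, assuming endpoints follow the /run/ghaf-vtpm/<vm>.tpm naming shown above; the helper name missing_vtpm_endpoints is hypothetical and not part of this PR.

```shell
# Sketch only: print the name of every VM whose vTPM endpoint is absent.
# Assumption: endpoints are named <vm>.tpm inside the given directory.
# The function name is illustrative and does not exist in the Ghaf tree.
missing_vtpm_endpoints() {
  local dir="$1"
  shift
  local vm
  for vm in "$@"; do
    # -e accepts any file type, so the check does not care how the
    # forwarder exposes the endpoint (device node, socket, symlink).
    if [ ! -e "$dir/$vm.tpm" ]; then
      echo "$vm"
    fi
  done
}
```

Example: `missing_vtpm_endpoints /run/ghaf-vtpm admin-vm net-vm` prints nothing when both endpoints exist. Any VM name it prints is one to investigate before starting the VM.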

2) VM Startup Ordering

systemctl status microvm@admin-vm microvm@net-vm
# x86 targets also:
systemctl status microvm@audio-vm microvm@gui-vm

Expected:

  • each VM unit starts after its matching ghaf-vtpm-forwarder-<vm>.service
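Ordering can also be checked from unit metadata rather than from status timestamps. This is a rough sketch, assuming the ordering is expressed through the unit's After= property; has_ordering_dep is a hypothetical helper name.

```shell
# Sketch only: succeed if a unit name appears in a space-separated
# After= list, such as the output of `systemctl show -p After --value`.
# The function name is illustrative, not something defined by this PR.
has_ordering_dep() {
  local wanted="$1"
  local after_list="$2"
  # Pad both sides with spaces so only whole unit names match.
  case " $after_list " in
    *" $wanted "*) return 0 ;;
  esac
  return 1
}
```

On the host this could be called as `has_ordering_dep ghaf-vtpm-forwarder-net-vm.service "$(systemctl show -p After --value microvm@net-vm.service)"`. A non-zero exit status suggests the ordering dependency is missing from the generated unit.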

3) VM TPM Smoke Tests

Run in each enabled system VM:

ls -l /dev/tpm0 /dev/tpmrm0
tpm2_getrandom 8
tpm2_getcap properties-fixed
tpm2_pcrread sha256:0

Expected:

  • /dev/tpm0 and /dev/tpmrm0 available
  • TPM commands succeed reliably once the TPM path has settled
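Because the first commands after boot may fail before the path settles, a bounded retry wrapper can separate a slow start from a real failure. This is a minimal sketch; retry_until_ok is a hypothetical helper, and the attempt count and delay in the example are arbitrary.

```shell
# Sketch only: run a command up to <attempts> times, sleeping <delay>
# seconds between failed tries. Returns 0 on the first success and
# 1 if every attempt fails. The function name is illustrative.
retry_until_ok() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Example: `retry_until_ok 10 2 tpm2_getrandom 8` tries the smoke-test command up to ten times, two seconds apart.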

4) Concurrent TPM Stress Check

Run loops concurrently in multiple VMs:

for i in $(seq 1 100); do tpm2_getrandom 8 >/dev/null || echo FAIL-$i; done

Monitor host in parallel:

journalctl -u ghaf-vtpm-forwarder-admin-vm -u ghaf-vtpm-forwarder-net-vm -f
# x86 targets also: add -u ghaf-vtpm-forwarder-audio-vm -u ghaf-vtpm-forwarder-gui-vm
dmesg -w | grep -i tpm

Expected:

  • no persistent forwarder crash/restart loops
  • transient contention may occur, but path recovers
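To compare runs across VMs, the stress loop can report a failure count instead of individual FAIL lines. This is a small sketch of that variation; count_failures is a hypothetical helper name.

```shell
# Sketch only: run a command <iterations> times and print how many
# invocations failed. Command output is discarded; only the exit status
# is counted. The function name is illustrative.
count_failures() {
  local n="$1"
  shift
  local i=1
  local fails=0
  while [ "$i" -le "$n" ]; do
    if ! "$@" >/dev/null 2>&1; then
      fails=$((fails + 1))
    fi
    i=$((i + 1))
  done
  echo "$fails"
}
```

Example: `count_failures 100 tpm2_getrandom 8` prints the number of failed calls out of 100. Whether a small non-zero count is acceptable is a judgment call for the tester, since the source only says transient contention may occur.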

5) SPIFFE TPM DevID End-to-End

In each TPM-attesting VM:

systemctl restart spire-devid-provision
systemctl restart spire-agent
journalctl -u spire-devid-provision -u spire-agent -n 300 --no-pager

On SPIRE server side:

journalctl -u spire-server -n 300 --no-pager

Expected:

  • successful node attestation via tpm_devid
  • restart-safe behavior (no permanent failure loops)

6) Jetson-Specific Runtime Checks

modprobe tpm_ftpm_tee || true
modprobe tpm_vtpm_proxy || true
ls -l /dev/tpm0 /dev/tpmrm0 /dev/vtpmx

systemctl status ghaf-provision-ek-certs ghaf-export-ek-endorsement-bundle
ls -l /persist/common/spire/ca/

Expected:

  • /dev/vtpmx available with vTPM proxy module loaded
  • EK provisioning/export services complete
  • endorsement bundle present for SPIFFE TPM attestation path

Pass Criteria

Manual runtime validation is considered complete when:

  • TPM mux forwarders are healthy and correctly ordered
  • target VMs can execute TPM commands through mux path
  • SPIFFE tpm_devid attestation succeeds for required VMs
  • Jetson fTPM/vTPM runtime path and EK bundle flow are operational

@milva-unikie

  • Laptops are now booting
  • Cross-compiled Orins are failing the build
  • On native Orins, net-vm is not starting:
× microvm@net-vm.service - MicroVM 'net-vm'
     Loaded: loaded (/etc/systemd/system/microvm@.service; static)
    Drop-In: /nix/store/jsisd59znqpw2aw6hnr1qcpf9h2vaf0d-system-units/microvm@net-vm.service.d
             └─overrides.conf
     Active: failed (Result: exit-code) since Thu 1970-01-01 00:00:53 UTC; 1min 24s ago
   Duration: 689ms
 Invocation: f68c1b7306fd425b9fa9a333203949d1
    Process: 1938 ExecStartPre=/nix/store/w0mb7g3h9hv3wmds57l6hiy0pwvxl9i1-unit-script-microvm_-pre-start/bin/microvm_-pre-start (code=exited, status=0/SUCCESS)
    Process: 1976 ExecStart=/var/lib/microvms/net-vm/current/bin/microvm-run (code=exited, status=1/FAILURE)
    Process: 2073 ExecStopPost=/nix/store/7r9anwqwxq4cbm9v0b3wa45q76136sws-unit-script-microvm_-post-stop/bin/microvm_-post-stop (code=exited, status=0/SUCCESS)
   Main PID: 1976 (code=exited, status=1/FAILURE)
         IP: 0B in, 0B out
         IO: 49.3M read, 0B written
   Mem peak: 78.3M
        CPU: 372ms

Jan 01 00:00:52 ghaf-host systemd[1]: Starting MicroVM 'net-vm'...
Jan 01 00:00:52 ghaf-host systemd[1]: Started MicroVM 'net-vm'.
Jan 01 00:00:53 ghaf-host microvm@net-vm[1976]: microvm@net-vm: -tpmdev passthrough,id=tpm0,path=/run/ghaf-vtpm/net-vm.tpm,cancel-path=/tmp/cancel: Cannot access TPM device using '/run/ghaf-vtpm/net-vm.tpm': No such file or directory
Jan 01 00:00:53 ghaf-host systemd[1]: microvm@net-vm.service: Main process exited, code=exited, status=1/FAILURE
Jan 01 00:00:53 ghaf-host systemd[1]: microvm@net-vm.service: Failed with result 'exit-code'.
Jan 01 00:00:53 ghaf-host systemd[1]: microvm@net-vm.service: Consumed 372ms CPU time, 78.3M memory peak, 49.3M read from disk. 

everton-dematos and others added 2 commits March 2, 2026 19:31

Enable the protocol for intervm server/agent configuration

Co-authored-by: Ganga Ram <Ganga.Ram@tii.ae>
Co-authored-by: shamma-alblooshi1 <shamma.alblooshi@tii.ae>
Co-authored-by: Brian McGillion <bmg.avoin@gmail.com>

Signed-off-by: Brian McGillion <bmg.avoin@gmail.com>
Signed-off-by: vadik likholetov <vadikas@gmail.com>

Retrieve the EK certs at build time and create a list of possible
devices. Then link these into the SPIFFE workflow, so that we can
validate and verify the TPM for enrollment as an attestor.

Signed-off-by: Brian McGillion <bmg.avoin@gmail.com>
Signed-off-by: vadik likholetov <vadikas@gmail.com>

vadika added 2 commits March 3, 2026 14:34

Roll up TPM mux bring-up, VM ordering fixes, probe/readiness cleanup, and in-process TABRMD forwarder integration into one coherent step. This keeps the TPM path reliable under contention while preserving the intended SPIFFE DevID flow.

Signed-off-by: vadik likholetov <vadikas@gmail.com>

Combine pytss cross-patch integration with jetpack-nixos wip-ftpm input switch and Orin fTPM/vTPM enablement so CI and Jetson targets consume one consistent TPM toolchain/kernel path.

Signed-off-by: vadik likholetov <vadikas@gmail.com>

Provision and export Jetson EK certificates with robust host-side ordering, then consume the endorsement bundle from shared storage for TPM DevID attestation.

Relax SPIRE server mount dependency to /etc/common so admin-vm startup does not fail on storage mount races.

Signed-off-by: vadik likholetov <vadikas@gmail.com>
@vadika added the Needs Testing CI Team to pre-verify label Mar 4, 2026
@milva-unikie

Net-vm is still not booting on the Orins. It needs to be fixed before manual testing.

@milva-unikie removed the Needs Testing CI Team to pre-verify label Mar 4, 2026
@vadika (Contributor, Author) commented Mar 4, 2026

> Net-vm is still not booting on the Orins. It needs to be fixed before manual testing.

booting in my tests, how do you do it?

@milva-unikie

> > Net-vm is still not booting on the Orins. It needs to be fixed before manual testing.
>
> booting in my tests, how do you do it?

We turn the Orin on and wait until we are able to connect via SSH. Instead of connecting to net-vm (as it should), the connection opens to ghaf-host. The debug logs from the host show that net-vm did not start. This happens only with this PR, and it occurred on all four Orin targets.

○ microvm@net-vm.service - MicroVM 'net-vm'
     Loaded: loaded (/etc/systemd/system/microvm@.service; static)
    Drop-In: /nix/store/6dpk9f3bdp0mpg3ssc22r9lzx4i38362-system-units/microvm@net-vm.service.d
             └─overrides.conf
     Active: inactive (dead)

Jan 01 00:02:12 ghaf-host systemd[1]: Dependency failed for MicroVM 'net-vm'.
Jan 01 00:02:12 ghaf-host systemd[1]: microvm@net-vm.service: Job microvm@net-vm.service/start failed with result 'dependency'.

vadika added 7 commits March 4, 2026 14:21
Signed-off-by: vadik likholetov <vadikas@gmail.com>
Signed-off-by: vadik likholetov <vadikas@gmail.com>
Signed-off-by: vadik likholetov <vadikas@gmail.com>
Signed-off-by: vadik likholetov <vadikas@gmail.com>
Signed-off-by: vadik likholetov <vadikas@gmail.com>
Signed-off-by: vadik likholetov <vadikas@gmail.com>
Signed-off-by: vadik likholetov <vadikas@gmail.com>
@vadika marked this pull request as draft March 9, 2026 07:42

4 participants