
Fix FSS audit rules for aarch64 and add root-cause diagnostic tooling #1819

Open
juliuskoskela wants to merge 3 commits into tiiuae:main from juliuskoskela:fss-fix


@juliuskoskela
Contributor

Description of Changes

Fix audit rule failures on aarch64 caused by references to syscalls that exist on x86_64 but not on aarch64, and add structured FSS diagnostic tooling that replaces ad-hoc binary pass/fail verification with a shared failure classifier and repeatable root-cause capture workflows.

Audit rules (aarch64 fix):

  • Introduce per-architecture syscall selection (chown, chmod, delete, access families) so rules are valid on both x86_64 and aarch64
  • Guard open/creat-based OSPP rules behind !isAarch64 conditionals, since aarch64 has no open or creat syscalls (only openat)
  • Remove duplicate STIG delete rule already covered by common rules
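
Background for the aarch64 fix: arm64 uses the kernel's generic syscall table, which has no chown, lchown, chmod, unlink, rmdir, access, open, or creat. Only the *at variants (fchownat, fchmodat, unlinkat, faccessat, openat) and the fd-based fchown/fchmod exist, so a rule that names a legacy call cannot be loaded there. The following is a hypothetical shell sketch of per-architecture syscall selection, not the PR's Nix code. The function names, the rule key, and the syscall subsets (rename-family calls are deliberately left out) are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: choose syscall names per architecture so that no
# audit rule references a syscall the target kernel lacks. Syscall lists are
# a subset; function names and keys are hypothetical.

# usage: syscalls_for <arch> <family>  (prints a space-separated list)
syscalls_for() {
  local arch=$1 family=$2
  case "$arch/$family" in
    x86_64/chown)   echo "chown fchown lchown fchownat" ;;
    x86_64/chmod)   echo "chmod fchmod fchmodat" ;;
    x86_64/delete)  echo "unlink unlinkat rmdir" ;;
    x86_64/access)  echo "access faccessat" ;;
    # aarch64 (generic syscall table): only fd-based and *at variants exist.
    aarch64/chown)  echo "fchown fchownat" ;;
    aarch64/chmod)  echo "fchmod fchmodat" ;;
    aarch64/delete) echo "unlinkat" ;;
    aarch64/access) echo "faccessat" ;;
    *) return 1 ;;  # unknown architecture or family
  esac
}

# usage: audit_rule <arch> <family> <key>  (prints one auditctl-style rule)
audit_rule() {
  local arch=$1 family=$2 key=$3 list sc
  local rule="-a always,exit -F arch=b64"
  list=$(syscalls_for "$arch" "$family") || return 1
  for sc in $list; do
    rule="$rule -S $sc"
  done
  printf '%s -k %s\n' "$rule" "$key"
}
```

For example, `audit_rule "$(uname -m)" chown perm_mod` prints a rule line in the format accepted by auditctl and audit.rules, containing only syscalls that exist on the running machine.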

Shared classifier library:

  • fss-verify-classifier.sh — single source of truth for failure classification policy (active system journals = critical, archived/user = warning, temp = ignore, key errors = critical)
  • fss-runtime-layout.sh — standardized FSS state discovery (machine-id, journal dir, sealing/verification keys)
  • Both libraries are embedded at build time via builtins.readFile into the verify service, fss-test, fss-debug, and fss-rootcause
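
The classification policy above can be sketched in a few lines of shell. This is a minimal illustration, not the contents of fss-verify-classifier.sh: it assumes journalctl --verify reports failures as `FAIL: <path> (<reason>)`, treats any other line mentioning a key as a key error (an assumed match), and maps unrecognized journal files to warning. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch of the FSS verify classification policy. Function names and
# the exact journalctl --verify line formats matched here are assumptions.

# Map one line of `journalctl --verify` output to a severity:
# critical, warning, ignore, or none (the line carries no verdict).
classify_verify_line() {
  local line=$1 path base
  case "$line" in
    FAIL:*)
      path=${line#"FAIL: "}   # drop the "FAIL: " prefix
      path=${path%%" ("*}     # drop a trailing " (reason)" if present
      base=${path##*/}        # keep only the file name
      case "$base" in
        *.journal~)       echo ignore ;;    # temporary file
        system.journal)   echo critical ;;  # active system journal
        system@*.journal) echo warning ;;   # archived system journal
        user-*.journal)   echo warning ;;   # active or archived user journal
        *)                echo warning ;;   # unrecognized file: assumed warning
      esac
      ;;
    *[Kk]ey*) echo critical ;;  # assumed wording of sealing/verification key errors
    *)        echo none ;;
  esac
}

# Reduce a whole verify run (read from stdin) to its worst severity.
classify_verify_output() {
  local worst=ok sev line
  while IFS= read -r line || [ -n "$line" ]; do
    sev=$(classify_verify_line "$line")
    case "$sev" in
      critical) worst=critical ;;
      warning)  [ "$worst" = critical ] || worst=warning ;;
    esac
  done
  echo "$worst"
}
```

Usage would look like `journalctl --verify 2>&1 | classify_verify_output`, which prints `ok`, `warning`, or `critical`. Keeping the per-line mapping separate from the reduction makes each policy branch easy to exercise with synthetic input, which is the same approach the PR's NixOS test takes.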

New diagnostic tools:

  • fss-debug — collects a timestamped evidence packet (identity, layout, journal inventory, mounts, service state, full verify output, classified summary)
  • fss-rootcause — wraps fss-debug into labeled checkpoint sessions with host-side VM lifecycle capture and cross-checkpoint diffing for isolating idle-vs-suspend corruption
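
Cross-checkpoint diffing reduces to comparing the classified summaries of two labeled captures. The following hypothetical sketch is not fss-rootcause itself. It assumes each checkpoint stores its summary at `<session-dir>/<label>/summary/summary.md`, which is an invented layout, and writes its result to the `compare/<from>-vs-<to>/summary.md` path named in the test steps.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of cross-checkpoint diffing; this is NOT fss-rootcause.
# Assumed input layout: <session-dir>/<label>/summary/summary.md
# Output: <session-dir>/compare/<from>-vs-<to>/summary.md (plus diff.txt)

# usage: compare_checkpoints <session-dir> <from-label> <to-label>
compare_checkpoints() {
  local session=$1 from=$2 to=$3 a b out f rc
  a="$session/$from/summary/summary.md"
  b="$session/$to/summary/summary.md"
  out="$session/compare/$from-vs-$to"
  for f in "$a" "$b"; do
    [ -r "$f" ] || { echo "missing checkpoint summary: $f" >&2; return 1; }
  done
  mkdir -p "$out" || return 1
  # diff exits 0 if identical, 1 if different, >1 on error.
  diff -u "$a" "$b" > "$out/diff.txt"
  rc=$?
  [ "$rc" -le 1 ] || return "$rc"
  {
    printf '# %s vs %s\n\n' "$from" "$to"
    if [ "$rc" -eq 0 ]; then
      echo "No change in classified results."
    else
      echo "Classified results changed:"
      echo
      # Indent the diff so it renders as a preformatted block in Markdown.
      sed 's/^/    /' "$out/diff.txt"
    fi
  } > "$out/summary.md"
}
```

Diffing the classified summaries rather than raw journalctl output keeps the comparison focused on severity changes, for example an active system journal moving from ok at the baseline checkpoint to critical after idle or suspend.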

Investigation artifacts:

  • gui-vm-rootcause-runbook.md and post-rebuild-runbook.md for the active fss-bad-message-2026-02-25 incident
  • post-rebuild-collect.sh automation script

Tests & docs:

  • NixOS test exercises the classifier with synthetic inputs covering all failure branches
  • fss-debug end-to-end test
  • FSS documentation updated with tool usage and classification semantics

Type of Change

  • Bug Fix
  • Improvement / Refactor

Related Issues / Tickets

Investigation: fss-bad-message-2026-02-25 — intermittent FSS verification failures on aarch64 targets and gui-vm journal corruption.

Checklist

  • Clear summary in PR description
  • Detailed and meaningful commit message(s)
  • Commits are logically organized and squashed if appropriate
  • Contribution guidelines followed
  • Ghaf documentation updated with the commit - https://tiiuae.github.io/ghaf/
  • Author has run make-checks and it passes
  • All automatic GitHub Action checks pass - see actions
  • Author has added reviewers and removed PR draft status

Testing Instructions

Applicable Targets

  • Orin AGX aarch64
  • Orin NX aarch64
  • Lenovo X1 x86_64
  • Dell Latitude x86_64
  • System 76 x86_64

Installation Method

  • Requires full re-installation
  • Can be updated with nixos-rebuild ... switch
  • Other:

Test Steps To Verify:

  1. Build and flash an aarch64 target (Orin AGX/NX): confirm that auditd starts cleanly with no "unknown syscall" errors in journalctl -u auditd
  2. Run fss-test on both x86_64 and aarch64: verify that the classified output shows severity tags and that archived, user, and temp journals produce no false-positive critical failures
  3. Run fss-debug: verify that an evidence packet is created under /var/tmp/fss-debug-*/ and that its summary/summary.md contains the correct classification
  4. Run the fss-rootcause checkpoint workflow:
    fss-rootcause capture-guest --session-dir /var/tmp/test --label baseline
    sleep 60
    fss-rootcause capture-guest --session-dir /var/tmp/test --label after-idle
    fss-rootcause compare --session-dir /var/tmp/test --from baseline --to after-idle
    
    Verify that the comparison summary is generated at /var/tmp/test/compare/baseline-vs-after-idle/summary.md
  5. Run the NixOS VM test: nix build .#checks.x86_64-linux.fss-verification (or equivalent) — confirm all classifier branches pass

@kajusnau
Collaborator

git history could use a cleanup before this PR is actually ready for merge 🙂 🫡

Signed-off-by: Julius Koskela <julius.koskela@unikie.com>

Copilot AI left a comment


Pull request overview

This PR refactors Forward Secure Sealing (FSS) journal verification to use a shared journalctl --verify output classifier, updates the verification service and test tooling to follow that policy, and refreshes FSS documentation to reflect the updated semantics.

Changes:

  • Add a shared Bash library (fss-verify-classifier.sh) for classifying journalctl --verify failures (active system vs archived/user/temp vs key/filesystem issues).
  • Update the FSS verify service and fss-test script to use the classifier and emit consistent tags/messages.
  • Expand the NixOS VM tests to validate classifier behavior across multiple synthetic failure branches; adjust docs formatting/wording.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Summary per file:

  • tests/logging/test_scripts/fss_verification.nix: Switches verification checks to use the classifier and adds branch coverage for the classification policy.
  • tests/logging/test_scripts/fss-test.nix: Embeds the shared classifier library and uses it to drive pass/warn/fail reporting for on-target validation.
  • tests/logging/default.nix: Installs the classifier into the VM test environment under /etc/ for sourcing.
  • modules/common/logging/fss.nix: Refactors the journal-fss-verify service logic to use classifier-based severity/tags and stricter key handling.
  • modules/common/logging/fss-verify-classifier.sh: Introduces the shared classifier/log helpers for journalctl --verify output.
  • docs/src/content/docs/ghaf/scs/fss.mdx: Updates tables/wording to better describe integrity/corruption semantics and warning vs critical cases.

Signed-off-by: Julius Koskela (Digimuoto Oy) <julius.koskela@digimuoto.com>
3 participants