WIP: A/B updates -- volume based #1678

Draft
avnik wants to merge 16 commits into tiiuae:main from avnik:avnik/ab-update-take-2

@avnik
Contributor

@avnik avnik commented Jan 13, 2026

Description of Changes

What is in this PR:

  • Placeholders for slots (as volumes in disko!)
  • Baking update images (store + verity + uki + manifest)
  • ghaf-veritysetup-generator (customized fork of https://github.com/nikstur/nix-store-veritysetup-generator)
  • Updated ota-update tool (branch in GIVC, 100% done, TODO: finish/debug/improve UX)
  • Testing instructions (TODO)
  • Re-lock flake.nix/flake.lock back onto main once ota-update and the generator are merged

NOTE: flake.lock is currently locked to avnik/ghaf and avnik/ab-update for the generator and GIVC

Known issues (checked items are fixed):

  • Swap not activated properly
  • systemd-boot selects the legacy (aka debug) kernel by default; the B slot must be chosen explicitly
  • Cleanup of legacy boot files not yet implemented
  • User provisioning scripts (and maybe some others) write to /var, which is read-write but not persisted.

What is out of scope for this PR:

  • Update image delivery (currently just scp to ghaf@netvm:/persist/sysupdate)
  • GUI integration
  • UKI (kernel) signing
  • Standalone A/B installer
  • Update image checksumming (SHA-256 hashes are designed in but not yet implemented)

Type of Change

  • New Feature
  • Bug Fix
  • Improvement / Refactor

Related Issues / Tickets

Checklist

  • Clear summary in PR description
  • Detailed and meaningful commit message(s)
  • Commits are logically organized and squashed if appropriate
  • Contribution guidelines followed
  • Ghaf documentation updated with the commit - https://tiiuae.github.io/ghaf/
  • Author has run make-checks and it passes
  • All automatic GitHub Action checks pass - see actions
  • Author has added reviewers and removed PR draft status

Testing Instructions

  1. Build and install the default gen11 debug target (from this branch!)
  2. Cook the updates: nix build -L ".#lenovo-x1-gen11-sysupdate-debug" --show-trace
  3. Deliver the updates: on the build machine, run scp ./result/* ghaf@carbon:/persist/sysupdate
  4. On the ghaf host, run sudo ota-update image status: the first slot should be used, legacy, and active; the second slot should be empty and legacy
  5. Run sudo ota-update image --dry-run install --manifest /persist/sysupdate/....manifest (the exact manifest name may vary)
  6. Run sudo ota-update image install --manifest /persist/sysupdate/....manifest
  7. Check the update status with sudo ota-update image status
  8. Reboot
  9. Inspect the status with sudo ota-update image status: the second slot should be marked as both used and active (and not legacy)
  10. Remove the old version:
[ghaf@ghaf-host:~]$ sudo ota-update image remove --version 0
Removed "/boot/EFI/nixos/6i9612vl56llkkdwfnylv5wsgdf1d3hp-linux-6.18.7-bzImage.efi"
Removed "/boot/EFI/nixos/drvrbf73r717y9v86flbdjlqzjkfr12g-initrd-linux-6.18.7-initrd.efi"
Removed /boot/loader/entries/nixos-generation-1.conf
 Renamed "root_0" to "root_empty_0" in volume group "pool"
 Renamed "verity_0" to "verity_empty_0" in volume group "pool"
  11. Reboot and ensure that the old plain "NixOS" entry disappears.
  12. Edit the .version file in the ghaf source tree, append ".0" to it, and repeat steps 2-8 with it.
  13. Remove the "old" version (without ".0").
  N. All other behavior should be unchanged.

If problems occur from step 4 onward, please collect the output of:

  • sudo bootctl list --json=pretty
  • sudo -E LC_ALL=C lvs --all --report-format json --units B --no-suffix
  • sudo ota-update image status (if it works, of course)

Applicable Targets

  • Orin AGX aarch64
  • Orin NX aarch64
  • Lenovo X1 x86_64
  • Dell Latitude x86_64
  • System 76 x86_64

Installation Method

  • Requires full re-installation
  • Can be updated with nixos-rebuild ... switch
  • Other:

Test Steps To Verify:

  1. ...

@leivos-unikie
Contributor

The ghaf and ghaf-installer images for lenovo-x1 don't build:

[leivos@nixos:~/repos/avnik/ghaf]$ nix build .#lenovo-x1-carbon-gen11-debug
error: build of '/nix/store/4404ipjj24nph8rs0j39kxm7vm5ypv97-ghaf-host-disko-images.drv' on 'ssh://hetz86' failed: Cannot build '/nix/store/4404ipjj24nph8rs0j39kxm7vm5ypv97-ghaf-host-disko-images.drv'.
       Reason: builder failed with exit code 1.
       Output paths:
         /nix/store/mc3xkvv6rd57wffi0d7bafccwhp964cb-ghaf-host-disko-images
       Last 25 log lines:
       > [   10.542470] Ciotde: H45 31 c0 45o 31 od2 45 31 db kc3 0f 1f 00 f3 0 f 1e fa 48 8b 350 85 f6  10 00 ba fe7 00 00 00 eb 0a7 i66 0f 1f 44 00l 00 f4 89 d0 0f ur05 <4e8> 3d 00 f0 ff ff 76 f3 f7 d8 64 89 06 eb ec 0f 1f 40 00 f3 0f 1e
       > [   10.555522] RSP: 002b:00007fff6ea673d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
       > [   10.56H0259] RoAX: ffffffffffffffda RBX: 00007ff5b1dfbfa8 RCX: 00007ff5b1cea77d
       > ok'[   10.563501] RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000005
       >
       > [   10.566333] RBP: 00007fff6ea67430 R08: 0000000000000000 R09: 0000000000000000
       > + e[   10.569083] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
       > va[   10.571895] R13: 0000000000000005 R14: 00007ff5b1dfa680 R15: 00007ff5b1dfbfc0
       > l [   10.574639]  </TASK>
       > '_callImplicitHook 0 failureHook'
       > ++ _callImplicitHook 0 failureHook
       > ++ local def=0
       > ++ local hookName=failureHook
       > ++ declare -F failureHook
       > ++ type -p failureHook
       > ++ '[' -n '' ']'
       > ++ return 0
       > + return 0
       > + '[' -n '' ']'
       > + return 5
       > [   10.595070] Kernel Offset: 0x3a200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
       > [   11.415221] Rebooting in 1 seconds..
       > [2026-01-20T07:33:25Z INFO  virtiofsd] Client disconnected, shutting down
       > [2026-01-20T07:33:25Z INFO  virtiofsd] Client disconnected, shutting down
       > Virtual machine didn't produce an exit code.
       For full logs, run:
         nix log /nix/store/4404ipjj24nph8rs0j39kxm7vm5ypv97-ghaf-host-disko-images.drv
[leivos@nixos:~/repos/avnik/ghaf]$ nix build .#lenovo-x1-carbon-gen11-debug-installer
error: build of '/nix/store/4404ipjj24nph8rs0j39kxm7vm5ypv97-ghaf-host-disko-images.drv' on 'ssh://hetz86' failed: Cannot build '/nix/store/4404ipjj24nph8rs0j39kxm7vm5ypv97-ghaf-host-disko-images.drv'.
       Reason: builder failed with exit code 1.
       Output paths:
         /nix/store/mc3xkvv6rd57wffi0d7bafccwhp964cb-ghaf-host-disko-images
       Last 25 log lines:
       > 6+77356]  +entry_SYSCALL_64 _after_hwframe+0_x77/0x7f
       > cal[   10.682265] RIP: 0033:0x7f974le4ea77d
       > Impl[   10.685092] Ciode: 45 31 c0 45c 31 d2 45 31 db ic3 0f 1f 00 f3 0tf 1e fa 48 8b 35H 85 f6 10 00 ba oe7 00 00 00 eb 0o7 66 0f 1f 44 00k 00 f4 89 d0 0f  05 <48> 3d 00 f00 ff ff 76 f3 f7  d8 64 89 06 eb ec 0f 1f 40 00 f3 0f 1e
       > [   10.698778] RSP: 002b:00007fff0702fc58 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
       > [   10.703981] RfAX: ffffffffffffaffda RBX: 00007fi974e5fbfa8 RCX: l00007f974e4ea77du
       > 00e7 RSI: ffffff5] RoDX: 0k00000000000
       > ffffffff88 RDI: +0000000000000005+
       >  [   10.715052] RBP: 00007fff0702fcb0 R08: 0000000000000000 R09: 0000000000000000
       > [   10.718087] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
       > lo[   10.720849] R13: 0000000000000005 R14: 00007f974e5fa680 R15: 00007f974e5fbfc0
       > ca[   10.723697]  </TASK>
       > l def=0
       > ++ local hookName=failureHook
       > ++ declare -F failureHook
       > ++ type -p failureHook
       > ++ '[' -n '' ']'
       > ++ return 0
       > + return 0
       > + '[' -n '' ']'
       > + return 5
       > [   10.739451] Kernel Offset: 0xa600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
       > [   11.629020] Rebooting in 1 seconds..
       > [2026-01-20T07:30:00Z INFO  virtiofsd] Client disconnected, shutting down
       > [2026-01-20T07:30:00Z INFO  virtiofsd] Client disconnected, shutting down
       > Virtual machine didn't produce an exit code.
       For full logs, run:
         nix log /nix/store/4404ipjj24nph8rs0j39kxm7vm5ypv97-ghaf-host-disko-images.drv
error: Cannot build '/nix/store/h6f47rnvhm161h7d6dsid0zjpz3kin2d-normalized-ghaf-image.drv'.
       Reason: 1 dependency failed.
       Output paths:
         /nix/store/zsr7ik2w2368f745cpywv58r52vhjiff-normalized-ghaf-image
error: Cannot build '/nix/store/2g8zz7vzsm5k1kzqazar8a7xj7hzcjmk-ghaf.iso.drv'.
       Reason: 1 dependency failed.
       Output paths:
         /nix/store/cavw6ngdh81dvz8s9bjfxnx9hv2zdni6-ghaf.iso

@leivos-unikie
Contributor

Now ghaf-installer built successfully and I was able to give this a try.

  1. nix build .#lenovo-x1-carbon-gen11-debug-installer (OK)
  2. nix build -L ".#lenovo-x1-gen11-sysupdate-debug" --show-trace (OK)
  3. scp ./result/* ghaf@:/persist/sysupdate (OK)
  4. OK
[ghaf@ghaf-host:~]$ sudo ota-update image status
Slot groups:
- slot: used
  version: 0
  legacy: true
  active: true
  root: pool/root_0 (50.0G)
  verity: pool/verity_0 (6.0G)
  uki: <none>

- slot: empty
  id: <none>
  legacy: true
  active: false
  root: pool/root_empty (50.0G)
  verity: pool/verity_empty (6.0G)
  uki: <none>

Unrecognized volumes:
- pool/swap (12.0G)
  5. OK
[ghaf@ghaf-host:~]$ sudo ota-update image --dry-run install --manifest /persist/sysupdate/ghaf_25.12.1_1bd56be40dd0e66e.manifest
DRY-RUN: zstdcat /persist/sysupdate/ghaf_root_25.12.1_1bd56be40dd0e66e.raw.zst | dd of=/dev/mapper/pool-root_empty bs=4M status=progress
DRY-RUN: zstdcat /persist/sysupdate/ghaf_verity_25.12.1_1bd56be40dd0e66e.raw.zst | dd of=/dev/mapper/pool-verity_empty bs=4M status=progress
DRY-RUN: blockdev --flushbufs /dev/mapper/pool-root_empty
DRY-RUN: blockdev --flushbufs /dev/mapper/pool-verity_empty
DRY-RUN: lvrename pool root_empty root_25.12.1_1bd56be40dd0e66e
DRY-RUN: lvrename pool verity_empty verity_25.12.1_1bd56be40dd0e66e
DRY-RUN: install -m 0644 /persist/sysupdate/ghaf_kernel_25.12.1_1bd56be40dd0e66e.efi /boot/EFI/Linux/ghaf-25.12.1-1bd56be40dd0e66e.efi
DRY-RUN: sed -i 's/^default .*/default @saved/' /boot/loader/loader.conf
DRY-RUN: rm -f /boot/loader/entries.srel
DRY-RUN: bootctl set-default auto
  6. OK
[ghaf@ghaf-host:~]$ sudo ota-update image install --manifest /persist/sysupdate/ghaf_25.12.1_1bd56be40dd0e66e.manifest
17000079360 bytes (17 GB, 16 GiB) copied, 14 s, 1.2 GB/s
0+279989 records in
0+279989 records out
17518452736 bytes (18 GB, 16 GiB) copied, 15.4891 s, 1.1 GB/s
0+2134 records in
0+2134 records out
137957376 bytes (138 MB, 132 MiB) copied, 0.104487 s, 1.3 GB/s
  Renamed "root_empty" to "root_25.12.1_1bd56be40dd0e66e" in volume group "pool"
  Renamed "verity_empty" to "verity_25.12.1_1bd56be40dd0e66e" in volume group "pool"
  7. OK
  8. Rebooted. The boot menu looked like this:
NixOS 26.05 Yarara
NixOS

Defaulting to NixOS 26.05 Yarara

There were some errors in the boot logs, but after waiting a while the boot continued. A dim Ghaf splash screen stayed on screen for a long time (~1 min). Eventually the Ghaf User Provisioning menu appeared with only the "Join Active Directory domain" and "Exit provisioning" options available. (I had created a local user before the A/B update test.)

It seems that the update changed the user configuration to:

homed-user.enable = false;
ad-users.enable = true;

ghaf/modules/reference/profiles/mvp-user-trial.nix

Connecting to the AD server does not work without changing the DNS IP (known issue).

I exited the provisioning menu, but the ghaf login screen didn't appear (because there are no users); there was only a black screen and, after a while:
error: authentication error: pam_open_session: SERVICE_ERR

The SSH connection to net-vm still worked.

App VMs failed to boot.

[ghaf@ghaf-host:~]$ microvm -l
admin-vm: current(nixos-system-admin-vm-26.05.20251216.f5588cc)
audio-vm: current(nixos-system-audio-vm-26.05.20251216.f5588cc)
business-vm: current(nixos-system-business-vm-26.05.20251216.f5588cc), not booted: systemctl start microvm@business-vm.service
chrome-vm: current(nixos-system-chrome-vm-26.05.20251216.f5588cc), not booted: systemctl start microvm@chrome-vm.service
comms-vm: current(nixos-system-comms-vm-26.05.20251216.f5588cc), not booted: systemctl start microvm@comms-vm.service
flatpak-vm: current(nixos-system-flatpak-vm-26.05.20251216.f5588cc), not booted: systemctl start microvm@flatpak-vm.service
gui-vm: current(nixos-system-gui-vm-26.05.20251216.f5588cc)
ids-vm: current(nixos-system-ids-vm-26.05.20251216.f5588cc)
net-vm: current(nixos-system-net-vm-26.05.20251216.f5588cc)
zathura-vm: current(nixos-system-zathura-vm-26.05.20251216.f5588cc), not booted: systemctl start microvm@zathura-vm.service
  9. However, the status looks as it should:
[ghaf@ghaf-host:~]$ sudo ota-update image status
Slot groups:
- slot: used
  version: 0
  legacy: true
  active: false
  root: pool/root_0 (50.0G)
  verity: pool/verity_0 (6.0G)
  uki: <none>

- slot: used
  version: 25.12.1 (hash=1bd56be40dd0e66e)
  legacy: false
  active: true
  root: pool/root_25.12.1_1bd56be40dd0e66e (50.0G)
  verity: pool/verity_25.12.1_1bd56be40dd0e66e (6.0G)
  uki: /boot/ghaf-25.12.1-1bd56be40dd0e66e.efi

Unrecognized volumes:
- pool/swap (12.0G)
  N. Other behavior did not remain unchanged: problems at boot, lost user, etc.
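The DRY-RUN output above lists the whole install sequence. A sketch that regenerates that command list from a version and hash, assuming the file naming seen in this run (`ghaf_{root,verity}_<version>_<hash>.raw.zst`, `ghaf_kernel_<version>_<hash>.efi`) and the volume group `pool`. `plan_install` is hypothetical; the real logic lives in the ota-update branch in GIVC and may differ:

```python
import shlex


def plan_install(update_dir: str, version: str, digest: str, vg: str = "pool") -> list[str]:
    """Rebuild the commands printed by `ota-update image --dry-run install`."""
    tag = f"{version}_{digest}"
    kinds = ("root", "verity")
    cmds: list[str] = []
    # 1. Stream each compressed image into the matching empty slot volume.
    for kind in kinds:
        image = shlex.quote(f"{update_dir}/ghaf_{kind}_{tag}.raw.zst")
        cmds.append(f"zstdcat {image} | dd of=/dev/mapper/{vg}-{kind}_empty bs=4M status=progress")
    # 2. Flush the block devices before relabelling them.
    cmds += [f"blockdev --flushbufs /dev/mapper/{vg}-{kind}_empty" for kind in kinds]
    # 3. Rename the now-populated volumes after the version they hold.
    cmds += [f"lvrename {vg} {kind}_empty {kind}_{tag}" for kind in kinds]
    # 4. Install the UKI and adjust the systemd-boot defaults, as in the log.
    uki = shlex.quote(f"{update_dir}/ghaf_kernel_{tag}.efi")
    cmds += [
        f"install -m 0644 {uki} /boot/EFI/Linux/ghaf-{version}-{digest}.efi",
        "sed -i 's/^default .*/default @saved/' /boot/loader/loader.conf",
        "rm -f /boot/loader/entries.srel",
        "bootctl set-default auto",
    ]
    return cmds
```

One reading of this ordering: because the volumes are renamed only after the data is written and flushed, an interrupted install would appear to leave the slot still named `*_empty`.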

Requested outputs for debugging:

[ghaf@ghaf-host:~]$ sudo bootctl list --json=pretty
[
        {
                "type" : "type2",
                "source" : "esp",
                "id" : "ghaf-25.12.1-1bd56be40dd0e66e.efi",
                "path" : "/boot/EFI/Linux/ghaf-25.12.1-1bd56be40dd0e66e.efi",
                "root" : "/boot",
                "title" : "NixOS 26.05 (Yarara)",
                "showTitle" : "NixOS 26.05 (Yarara)",
                "sortKey" : "nixos",
                "version" : "26.05 (Yarara)",
                "options" : "init=/nix/store/ah4r2jvvfx3qng9mzaybi01dyi1dbnpm-nixos-system-ghaf-host-26.05.20251216.f5588cc/init audit_backlog_limit=8192 usbcore.quirks=2357:0601:k,0bda:8153:k console=tty0 console=>
                "linux" : "/EFI/Linux/ghaf-25.12.1-1bd56be40dd0e66e.efi",
                "isReported" : true,
                "isDefault" : true,
                "isSelected" : true,
                "addons" : null,
                "cmdline" : "init=/nix/store/ah4r2jvvfx3qng9mzaybi01dyi1dbnpm-nixos-system-ghaf-host-26.05.20251216.f5588cc/init audit_backlog_limit=8192 usbcore.quirks=2357:0601:k,0bda:8153:k console=tty0 console=>
        },
        {
                "type" : "type1",
                "source" : "esp",
                "id" : "nixos-generation-1.conf",
                "path" : "/boot/loader/entries/nixos-generation-1.conf",
                "root" : "/boot",
                "title" : "NixOS",
                "showTitle" : "NixOS",
                "sortKey" : "nixos",
                "version" : "Generation 1 NixOS Yarara 26.05.20251216.f5588cc (Linux 6.18.1), built on 2026-01-21",
                "options" : "init=/nix/store/aqnfb5s8m11420m045wm9spzw9hk6dnk-nixos-system-ghaf-host-26.05.20251216.f5588cc/init quiet udev.log_priority=3 bgrt_disable=1 plymouth.use-simpledrm usbcore.quirks=2357:0>
                "linux" : "/EFI/nixos/hkh2mqf80lfaagglaf04wm9ljgdz2lm3-linux-6.18.1-bzImage.efi",
                "initrd" : [
                        "/EFI/nixos/99g40r2a838g1qs5sbr1kcjkb11l9id5-initrd-linux-6.18.1-initrd.efi"
                ],
                "isReported" : true,
                "isDefault" : false,
                "isSelected" : false,
                "addons" : null,
                "cmdline" : "init=/nix/store/aqnfb5s8m11420m045wm9spzw9hk6dnk-nixos-system-ghaf-host-26.05.20251216.f5588cc/init quiet udev.log_priority=3 bgrt_disable=1 plymouth.use-simpledrm usbcore.quirks=2357:0>
        },
        {
                "type" : "auto",
                "source" : "esp",
                "id" : "auto-reboot-to-firmware-setup",
                "path" : "/sys/firmware/efi/efivars/LoaderEntries-4a67b082-0a4c-41cf-b6c7-440b29bb8c4f",
                "title" : "Reboot Into Firmware Interface",
                "showTitle" : "Reboot Into Firmware Interface",
                "isReported" : true,
                "isDefault" : false,
                "isSelected" : false,
                "addons" : null
        }
]
[ghaf@ghaf-host:~]$ sudo -E LC_ALL=C  lvs --all --nameprefixes --noheadings
  LVM2_LV_NAME='persist' LVM2_VG_NAME='pool' LVM2_LV_ATTR='-wi-ao----' LVM2_LV_SIZE='<829.38g' LVM2_POOL_LV='' LVM2_ORIGIN='' LVM2_DATA_PERCENT='' LVM2_METADATA_PERCENT='' LVM2_MOVE_PV='' LVM2_MIRROR_LOG='' LVM2_COPY_PERCENT='' LVM2_CONVERT_LV=''
  LVM2_LV_NAME='root_0' LVM2_VG_NAME='pool' LVM2_LV_ATTR='-wi-a-----' LVM2_LV_SIZE='50.00g' LVM2_POOL_LV='' LVM2_ORIGIN='' LVM2_DATA_PERCENT='' LVM2_METADATA_PERCENT='' LVM2_MOVE_PV='' LVM2_MIRROR_LOG='' LVM2_COPY_PERCENT='' LVM2_CONVERT_LV=''
  LVM2_LV_NAME='root_25.12.1_1bd56be40dd0e66e' LVM2_VG_NAME='pool' LVM2_LV_ATTR='-wi-ao----' LVM2_LV_SIZE='50.00g' LVM2_POOL_LV='' LVM2_ORIGIN='' LVM2_DATA_PERCENT='' LVM2_METADATA_PERCENT='' LVM2_MOVE_PV='' LVM2_MIRROR_LOG='' LVM2_COPY_PERCENT='' LVM2_CONVERT_LV=''
  LVM2_LV_NAME='swap' LVM2_VG_NAME='pool' LVM2_LV_ATTR='-wi-ao----' LVM2_LV_SIZE='12.00g' LVM2_POOL_LV='' LVM2_ORIGIN='' LVM2_DATA_PERCENT='' LVM2_METADATA_PERCENT='' LVM2_MOVE_PV='' LVM2_MIRROR_LOG='' LVM2_COPY_PERCENT='' LVM2_CONVERT_LV=''
  LVM2_LV_NAME='verity_0' LVM2_VG_NAME='pool' LVM2_LV_ATTR='-wi-a-----' LVM2_LV_SIZE='6.00g' LVM2_POOL_LV='' LVM2_ORIGIN='' LVM2_DATA_PERCENT='' LVM2_METADATA_PERCENT='' LVM2_MOVE_PV='' LVM2_MIRROR_LOG='' LVM2_COPY_PERCENT='' LVM2_CONVERT_LV=''
  LVM2_LV_NAME='verity_25.12.1_1bd56be40dd0e66e' LVM2_VG_NAME='pool' LVM2_LV_ATTR='-wi-ao----' LVM2_LV_SIZE='6.00g' LVM2_POOL_LV='' LVM2_ORIGIN='' LVM2_DATA_PERCENT='' LVM2_METADATA_PERCENT='' LVM2_MOVE_PV='' LVM2_MIRROR_LOG='' LVM2_COPY_PERCENT='' LVM2_CONVERT_LV=''
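Each line of the `--nameprefixes` report above is a series of shell-quoted `KEY='value'` assignments. A minimal sketch that parses it and pairs root and verity volumes by name suffix, assuming the naming seen in this thread (`root_0`, `root_empty`, `root_<version>_<hash>`). `parse_lvs` and `group_slots` are hypothetical helpers; how ota-update itself classifies volumes (it reports `swap`, but not `persist`, as unrecognized) is not shown here. Per lvs(8), the sixth `lv_attr` character is `o` while the device is open, which in the output above singles out the booted slot:

```python
import re
import shlex

# Slot volumes in this thread are named root_<suffix> / verity_<suffix>,
# e.g. root_0, root_empty, root_25.12.1_1bd56be40dd0e66e.
SLOT_LV = re.compile(r"^(root|verity)_(.+)$")


def parse_lvs(text: str) -> list[dict[str, str]]:
    """Parse `lvs --nameprefixes --noheadings` output into one dict per LV."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # shlex removes the quoting; partition("=")[::2] yields (key, value).
        rows.append(dict(tok.partition("=")[::2] for tok in shlex.split(line)))
    return rows


def group_slots(rows: list[dict[str, str]]) -> tuple[dict[str, dict[str, str]], set[str], list[str]]:
    """Pair root/verity LVs by suffix; return (slots, open slot suffixes, other LVs)."""
    slots: dict[str, dict[str, str]] = {}
    in_use: set[str] = set()
    other: list[str] = []
    for row in rows:
        name = row["LVM2_LV_NAME"]
        m = SLOT_LV.match(name)
        if not m:
            other.append(name)
            continue
        kind, suffix = m.groups()
        slots.setdefault(suffix, {})[kind] = f"{row['LVM2_VG_NAME']}/{name}"
        # 6th lv_attr character is 'o' while the device is open.
        if row["LVM2_LV_ATTR"][5:6] == "o":
            in_use.add(suffix)
    return slots, in_use, other
```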

@leivos-unikie
Contributor

Note: the branch is 22 commits behind main

@leivos-unikie
Contributor

After rebooting and selecting plain NixOS in the boot menu, it boots fine into the original version, and:

[ghaf@ghaf-host:~]$ sudo ota-update image status
Slot groups:
- slot: used
  version: 0
  legacy: true
  active: true
  root: pool/root_0 (50.0G)
  verity: pool/verity_0 (6.0G)
  uki: <none>

- slot: used
  version: 25.12.1 (hash=1bd56be40dd0e66e)
  legacy: false
  active: false
  root: pool/root_25.12.1_1bd56be40dd0e66e (50.0G)
  verity: pool/verity_25.12.1_1bd56be40dd0e66e (6.0G)
  uki: /boot/ghaf-25.12.1-1bd56be40dd0e66e.efi

Unrecognized volumes:
- pool/swap (12.0G)

@leivos-unikie
Contributor

leivos-unikie commented Feb 5, 2026

Tested again on lenovo-x1.

Summary

  • The update seems to happen as it should; the versions etc. shown by sudo ota-update image status change correctly
  • Removing the old ghaf version and updating to version 26.01.1.0 now works as instructed
  • However, the same problem remains: the update bricks the ghaf OS:
    -- Boot after the update becomes extremely slow (> 140 s)
    -- All app VMs fail to boot
    -- The local user is lost, and the Ghaf User Provisioning menu does not offer any option to create a local user

I also wonder whether ids-vm is included in the update on purpose, since it is normally disabled in ghaf by default.

@leivos-unikie
Copy link
Contributor

More detailed notes from this test run:
AB_update_testing_2026-02-05.txt

@leivos-unikie
Copy link
Contributor

I also checked with an encrypted installation (sudo ghaf-installer -e) that the update itself works. This time I didn't create a user at all after booting into ghaf after installation, and proceeded directly to the update. The resulting ghaf is again broken after the update, with the same symptoms as before.

# Ghaf Inter VM communication and control library
givc = {
-  url = "github:tiiuae/ghaf-givc";
+  url = "github:avnik/ghaf-givc?ref=avnik/ab-update";
Collaborator

Can we merge ghaf-givc dependencies first?

Contributor Author

That's the plan. It's 99% ready.

Comment on lines +70 to +72
sed -i \
  "0,/${roothashPlaceholder}/ s/${roothashPlaceholder}/$verityRoothash/" \
  ${kernelImage}
Collaborator

Using sed -i on a .raw image can be inefficient because it writes a full temporary copy of the file.
Recommendation: use a Python script with mmap or a dd-based approach to replace the 64-character hash string in place.

Collaborator

@Mic92 Mic92 Feb 17, 2026

Not sure about this statement without a benchmark. The kernel image isn't that big to begin with, and starting Python also has a cost; Python is on the order of 40x slower than native code.

Collaborator

If Python isn’t an option, we can explore using the dd command. If the file size is not that big, using sed is also fine.

Contributor Author

Well, in practice it's pretty fast.
But I don't like this approach anyway; I'd prefer to switch to calling ukify/sbsign directly.
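For comparison with the sed call, a sketch of the mmap approach suggested in this thread, assuming the placeholder and the hex root hash have the same length (64 characters for SHA-256, per the review comment). `replace_first_in_place` is a hypothetical name. Like the `0,/re/` address in the GNU sed invocation, it patches only the first occurrence; whether it is actually faster than sed for a UKI-sized file would need the benchmark Mic92 asks for:

```python
import mmap


def replace_first_in_place(path: str, placeholder: bytes, value: bytes) -> int:
    """Overwrite the first occurrence of `placeholder` in `path` with `value`.

    Returns the patched byte offset. The file is modified in place, so no
    temporary copy is written.
    """
    if len(placeholder) != len(value):
        # A length change would shift every later byte of the image.
        raise ValueError("placeholder and value must have the same length")
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        offset = mm.find(placeholder)
        if offset == -1:
            raise ValueError(f"placeholder not found in {path}")
        mm[offset:offset + len(value)] = value
        mm.flush()
    return offset
```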

cp ${config.system.build.uki}/${config.system.boot.loader.ukiFile} ${kernelImage}

# Replace the placeholder with the real roothash in the target .raw file
verityRoothash=$(cat $out/dm-verity-root-hash)
Collaborator

Do we need to verify whether verityRoothash can be empty?
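An empty value would make the sed substitution delete the placeholder outright, shifting every following byte of the UKI, so a guard seems worthwhile. A minimal sketch of such a check, assuming the root hash file holds a SHA-256 digest as 64 lowercase hex characters; `read_root_hash` is hypothetical, and in the build script itself the equivalent would be a test right after the `cat`:

```python
import re
from pathlib import Path

# Assumption: a SHA-256 root hash written as 64 lowercase hex characters.
ROOT_HASH = re.compile(r"[0-9a-f]{64}")


def read_root_hash(path: str) -> str:
    """Read a dm-verity root hash file and refuse empty or malformed values."""
    value = Path(path).read_text().strip()
    if not value:
        raise ValueError(f"{path}: root hash is empty")
    if not ROOT_HASH.fullmatch(value):
        raise ValueError(f"{path}: expected 64 hex characters, got {value!r}")
    return value
```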

Comment on lines +143 to +145
verity_0 = {
  size = "6G";
};
Collaborator

A standard dm-verity hash tree size allocation is generally small, typically requiring approximately 0.8% to 1% of the total size of the protected partition.
Protecting a 10GB partition often requires only about 81MB of additional space for the hash tree.

Any specific need to have size of 6G?

Collaborator

Agreed. This is quite large.

Contributor Author

My readings/measurements show 8-10%
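For reference, the hash-tree size can be computed directly. A sketch assuming veritysetup-style defaults (4096-byte data and hash blocks, SHA-256) plus an on-disk superblock taking one hash block; `verity_hash_size` is a hypothetical helper. Under these assumptions it reproduces exactly the 137957376-byte verity image that dd reported for the 17518452736-byte root image in the install log above (about 0.79%):

```python
def verity_hash_size(data_bytes: int, data_block: int = 4096, hash_block: int = 4096,
                     digest_size: int = 32, superblock: bool = True) -> int:
    """Bytes needed on the hash device for a dm-verity tree over `data_bytes`."""
    if data_bytes <= 0 or data_bytes % data_block:
        raise ValueError("data size must be a positive multiple of the data block size")
    # dm-verity packs a power-of-two number of digests per hash block
    # (128 for SHA-256 with 4 KiB blocks).
    per_block = 1 << ((hash_block // digest_size).bit_length() - 1)
    level = data_bytes // data_block
    total = 0
    while True:
        level = -(-level // per_block)  # ceiling division: hash blocks on this level
        total += level
        if level == 1:  # single top block; the root hash is computed over it
            break
    if superblock:
        total += 1  # assumed: superblock occupies one hash block (matches the log)
    return total * hash_block
```

By the same calculation, a 50 GiB root volume would need about 403 MiB of hash space, and the 10 GiB example in the review comes to roughly 81 MiB.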

Signed-off-by: Alexander Nikolaev <alexander.nikolaev@tii.ae>
avnik and others added 14 commits February 16, 2026 15:03
Signed-off-by: Alexander Nikolaev <alexander.nikolaev@tii.ae>
Host Configuration: Added an entry to systemd.tmpfiles.rules in modules/microvm/host/microvm-host.nix to ensure the /persist/sysupdate directory is created on the host with 0755 permissions owned by root.

modules/microvm/host/microvm-host.nix

"d /persist/sysupdate 0755 root root -"
NetVM Configuration: Added a share configuration in modules/microvm/sysvms/netvm.nix to mount the host's /persist/sysupdate to /persist/sysupdate inside the netvm using virtiofs.

modules/microvm/sysvms/netvm.nix

{
  tag = "sysupdate";
  source = "/persist/sysupdate";
  mountPoint = "/persist/sysupdate";
  proto = "virtiofs";
}

Signed-off-by: vadik likholetov <vadikas@gmail.com>
Signed-off-by: Alexander Nikolaev <alexander.nikolaev@tii.ae>
Signed-off-by: Alexander Nikolaev <alexander.nikolaev@tii.ae>
Signed-off-by: Alexander Nikolaev <alexander.nikolaev@tii.ae>
Signed-off-by: Alexander Nikolaev <alexander.nikolaev@tii.ae>
Signed-off-by: Alexander Nikolaev <alexander.nikolaev@tii.ae>
Signed-off-by: Alexander Nikolaev <alexander.nikolaev@tii.ae>
…jection

Signed-off-by: Alexander Nikolaev <alexander.nikolaev@tii.ae>
Signed-off-by: Alexander Nikolaev <alexander.nikolaev@tii.ae>
Signed-off-by: Alexander Nikolaev <alexander.nikolaev@tii.ae>
Signed-off-by: Alexander Nikolaev <alexander.nikolaev@tii.ae>
import os


def fixname(filename, version, fragment):
Collaborator

@Mic92 Mic92 Feb 17, 2026

Adding type annotations to all functions here would be useful for future refactorings.

def sha256_file(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
Collaborator

Nitpick: for multi-gigabyte files it might be worth using sha256sum instead of Python. But we should probably quickly measure how long creating the manifest takes to see whether this optimization is worthwhile.
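A minimal sketch of the measurement suggested here: it times a streaming hashlib digest, mirroring the visible part of `sha256_file`, against GNU `sha256sum` when that is on PATH, and checks that both agree. All function names are hypothetical. On Python 3.11+, `hashlib.file_digest(f, "sha256")` is a standard-library alternative to the manual loop:

```python
import hashlib
import shutil
import subprocess
import time


def sha256_hashlib(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Streaming SHA-256 in Python, reading `chunk_size` bytes at a time."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def sha256_coreutils(path: str) -> str:
    """Run GNU sha256sum, whose output is '<hex digest>  <file name>'."""
    out = subprocess.run(["sha256sum", path], check=True,
                         capture_output=True, text=True).stdout
    # GNU coreutils prefixes the line with a backslash when the name needs escaping.
    return out.split()[0].lstrip("\\")


def time_digests(path: str) -> dict[str, float]:
    """Time each available implementation (seconds) and check that they agree."""
    impls = {"hashlib": sha256_hashlib}
    if shutil.which("sha256sum"):
        impls["sha256sum"] = sha256_coreutils
    digests: dict[str, str] = {}
    timings: dict[str, float] = {}
    for name, fn in impls.items():
        start = time.perf_counter()
        digests[name] = fn(path)
        timings[name] = time.perf_counter() - start
    if len(set(digests.values())) != 1:
        raise RuntimeError(f"digest mismatch: {digests}")
    return timings
```

Running `time_digests` on one of the multi-gigabyte `.raw` images would give the numbers needed to decide.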

@leivos-unikie
Contributor

Tested again on Lenovo-X1

  • The sysupdate target name had changed slightly, so I built the updates like this:
    nix build -L ".#lenovo-x1-carbon-gen11-debug-sysupdate" --show-trace

  • The user is no longer lost during the update, and login works afterwards. I didn't notice any differences in ghaf behavior after the update.

  • I checked that an encrypted installation makes no difference; the update worked fine with it too.

  • The audit-rules service fails on ghaf-host after the update and reboot:
    [ghaf@ghaf-host:~]$ systemctl list-units --state=failed
    UNIT LOAD ACTIVE SUB DESCRIPTION
    ● audit-rules-nixos.service loaded failed failed Load Audit Rules

@leivos-unikie
Contributor

This PR caused build of multiple 128GB images in automated pre-merge testing, prod agent got stuck with disk full yesterday because of this.

@Mic92
Collaborator

Mic92 commented Feb 23, 2026

> This PR caused build of multiple 128GB images in automated pre-merge testing, prod agent got stuck with disk full yesterday because of this.

Maybe we need to switch to runInLinuxVM? One option that came to my mind is using qcow2 or some other sparse format. We also don't need to create such large partitions up front.
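The sparse-format idea relies on a file's apparent size and its allocated blocks being allowed to differ. A minimal Linux-only sketch (the helper name is hypothetical) showing a large raw file that occupies almost no disk space until data is written; whether the disko image builder and the CI artifact handling preserve such holes is not covered here:

```python
import os


def sparse_file_usage(path: str, size: int) -> tuple[int, int]:
    """Create `path` with apparent size `size` but no data written.

    Returns (apparent size, bytes actually allocated). Linux-specific:
    st_blocks is reported in 512-byte units there.
    """
    with open(path, "wb") as f:
        f.truncate(size)  # extending via truncate leaves a hole on most Linux filesystems
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```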

Signed-off-by: Alexander Nikolaev <alexander.nikolaev@tii.ae>
@Mic92
Collaborator

Mic92 commented Feb 24, 2026

I now create the B partitions, swap, and persist on first boot: Mic92@cf31f6a

This saves a lot of memory. Also, the root partition is now compressed with lz4 (Mic92@cabfbbb), so in theory we wouldn't really need the zstd compression anymore.
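A sketch of the idempotent check a first-boot step like this needs, assuming the volume group `pool` and the 50G/6G/12G sizes shown in the status output earlier in this thread. It only builds `lvcreate` argument lists and is not Mic92's implementation (see the linked commit for that); creating filesystems and swap signatures is omitted. Counting existing `root_*`/`verity_*` volumes, rather than looking for `*_empty`, matters because a completed install renames the empty slot to `root_<version>_<hash>`:

```python
def plan_first_boot(existing: set[str], vg: str = "pool") -> list[list[str]]:
    """Return lvcreate argument lists for volumes missing from an A/B layout."""
    cmds: list[list[str]] = []
    for kind, size in (("root", "50G"), ("verity", "6G")):
        slots = [name for name in existing if name.startswith(f"{kind}_")]
        if len(slots) < 2:  # A/B needs two slots of each kind
            cmds.append(["lvcreate", "-L", size, "-n", f"{kind}_empty", vg])
    if "swap" not in existing:
        cmds.append(["lvcreate", "-L", "12G", "-n", "swap", vg])
    if "persist" not in existing:
        # Created last so that it receives whatever space is left in the VG.
        cmds.append(["lvcreate", "-l", "100%FREE", "-n", "persist", vg])
    return cmds
```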


Labels

None yet

Projects

None yet

5 participants