Fix UEFI RootfsStatusSlot desync that blocks OTA updates after rollback#41

Draft
abularca wants to merge 1 commit into main from feature/fix-mender-update
abularca commented Apr 1, 2026

The desync happens when the Mender commit command is not run after the first reboot into an updated rootfs. Mender's rollback on Jetson relies on UEFI exhausting retry_count: each uncommitted reboot decrements it, and when it hits 0, UEFI writes 0xFF to RootfsStatusSlot, where it stays until something resets it. The new ArtifactCommit_Leave script breaks this cycle by clearing 0xFF after every successful update.

On Jetson UEFI (L4T R36.x), when a Mender update rolls back, the UEFI retry_count mechanism (2-bit, range 0-3) eventually marks the target rootfs slot as permanently unbootable (RootfsStatusSlot = 0xFF). Since nvbootctrl mark-boot-successful was removed in L4T 35.2.1, nothing resets this flag -- the slot stays permanently trapped and all future OTA updates targeting it silently fail.
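The exhaustion path described above can be illustrated with a toy model. This is only a sketch of the behavior as stated in this description (decrement on each uncommitted reboot, 0xFF once the counter reaches 0). The function name `simulate_uncommitted_boots` and the starting value are hypothetical, and the real UEFI logic may order these steps differently.

```shell
#!/bin/sh
# Toy model, not UEFI code: mirrors the "2-bit, range 0-3" retry_count
# description from this PR. All names here are illustrative.

# simulate_uncommitted_boots RETRY_COUNT BOOTS
# Prints "<retry_count> <status>" after BOOTS reboots without a commit.
simulate_uncommitted_boots() {
    retry_count=$1
    boots=$2
    status=0x00
    # Stop early once the slot has been marked unbootable.
    while [ "$boots" -gt 0 ] && [ "$status" = "0x00" ]; do
        if [ "$retry_count" -gt 0 ]; then
            retry_count=$((retry_count - 1))
        fi
        # Per the description: when the counter hits 0, 0xFF is written.
        if [ "$retry_count" -eq 0 ]; then
            status=0xFF
        fi
        boots=$((boots - 1))
    done
    echo "$retry_count $status"
}
```

Under this model, a slot that starts with a full counter survives two uncommitted reboots and is marked 0xFF on the third; nothing in the model ever moves the status back to 0x00, which is the gap the two state scripts fill.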

Two-layer fix:

  • Layer 1 (ArtifactInstall_Leave): reset target slot's RootfsStatusSlot to 0x00 before switching, so the current update can boot even if the slot was previously marked unbootable. UEFI retry_count still provides fallback protection for genuinely bad rootfs.
  • Layer 2 (ArtifactCommit_Leave): reset inactive slot's RootfsStatusSlot to 0x00 after commit, preparing it for the next update cycle.
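Both layers come down to the same operation: write 0x00 to the RootfsStatusSlot variable of the slot that is not currently running. The following is a minimal sketch of that operation, not the scripts in this PR. It assumes the variables are exposed through efivarfs as `RootfsStatusSlotA` / `RootfsStatusSlotB` under the NVIDIA public variable GUID, and that each file is a 4-byte attribute header followed by a 4-byte little-endian value. The helper names and the `EFIVARS_DIR` override are hypothetical.

```shell
#!/bin/sh
# Sketch only. The GUID, variable names and 8-byte layout are assumptions
# and should be checked against the target L4T release.
EFIVARS_DIR=${EFIVARS_DIR:-/sys/firmware/efi/efivars}
NVIDIA_GUID=781e084c-a330-417c-b678-38e696380cb9

# other_slot_letter CURRENT
# Map the running slot number (0 or 1) to the letter of the other slot.
other_slot_letter() {
    case "$1" in
        0) echo B ;;
        1) echo A ;;
        *) return 1 ;;
    esac
}

# reset_slot_status LETTER
# Rewrite RootfsStatusSlot<LETTER> as attributes 0x07 followed by value 0.
reset_slot_status() {
    var="$EFIVARS_DIR/RootfsStatusSlot$1-$NVIDIA_GUID"
    [ -e "$var" ] || return 1
    # efivarfs files are often immutable; ignore failure on ordinary files.
    chattr -i "$var" 2>/dev/null || true
    # The whole variable (header + data) is written in one call.
    printf '\007\000\000\000\000\000\000\000' > "$var"
}
```

A state script could call `reset_slot_status "$(other_slot_letter "$current")"`, where `$current` is the running rootfs slot number. How that number is obtained (for example from nvbootctrl) is left out here because it depends on the L4T release.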

Also includes:

  • Manual fix script (scripts/fix-rootfs-slot-status.sh) for diagnosing and repairing devices already in this state
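For the diagnosis half, a check along the following lines could report whether a slot is stuck. This is a hedged sketch rather than the contents of scripts/fix-rootfs-slot-status.sh. It assumes an efivarfs-style file with a 4-byte attribute header and the status in the first data byte, and `describe_slot_status` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: the file layout (4-byte header, status in the first data
# byte) is an assumption, not taken from the PR's script.

# describe_slot_status FILE
# Print a human-readable status for one RootfsStatusSlot variable file.
describe_slot_status() {
    if [ ! -r "$1" ]; then
        echo "missing"
        return 1
    fi
    # Skip the 4 attribute bytes and read one data byte as decimal.
    value=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$value" in
        0)   echo "bootable (0x00)" ;;
        255) echo "unbootable (0xFF)" ;;
        *)   echo "unknown ($value)" ;;
    esac
}
```

Running it against the variable files for both slots shows at a glance which one, if any, needs to be reset before the next OTA update.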

Signed-off-by: Adrian Bularca <adrian.bularca@gmail.com>