Fix UEFI RootfsStatusSlot desync that blocks OTA updates after rollback #41
Draft
If the Mender `commit` command is not run after the first reboot, that's exactly how this happens. Mender's rollback on Jetson relies on UEFI exhausting retry_count. Each uncommitted reboot decrements it. When it hits 0, UEFI writes 0xFF permanently. The new ArtifactCommit_Leave script breaks this cycle by cleaning up 0xFF after every successful update.

On Jetson UEFI (L4T R36.x), when a Mender update rolls back, the UEFI retry_count mechanism (2-bit, range 0-3) eventually marks the target rootfs slot as permanently unbootable (RootfsStatusSlot = 0xFF). Since nvbootctrl mark-boot-successful was removed in L4T 35.2.1, nothing resets this flag, so the slot stays trapped and all future OTA updates targeting it silently fail.

Two-layer fix:

- Layer 1 (ArtifactInstall_Leave): reset the target slot's RootfsStatusSlot to 0x00 before switching, so the current update can boot even if the slot was previously marked unbootable. UEFI retry_count still provides fallback protection for a genuinely bad rootfs.
- Layer 2 (ArtifactCommit_Leave): reset the inactive slot's RootfsStatusSlot to 0x00 after commit, preparing it for the next update cycle.

Also includes:

- Manual fix script (scripts/fix-rootfs-slot-status.sh) for diagnosing and repairing devices already in this state.

Signed-off-by: Adrian Bularca <adrian.bularca@gmail.com>
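For readers following along, the reset that both layers perform could look roughly like the sketch below. It is not the code from this PR. It assumes the status variable is exposed through Linux efivarfs (4 attribute bytes followed by the payload) and that the payload is 4 bytes wide with the status in the first byte. The helper names `slot_status` and `reset_slot_status` are hypothetical, and the variable's file path is left to the caller because the real name and GUID are not stated here.

```shell
#!/bin/sh
# Sketch only: clear a "permanently unbootable" (0xFF) slot status.
# Assumes an efivarfs-style file: 4 attribute bytes, then the payload.
# The 4-byte payload width is an assumption, not taken from the PR.

# Print the first payload byte (file offset 4) as two hex digits.
slot_status() {
    od -An -tx1 -j4 -N1 "$1" | tr -d ' \n'
}

# Zero the payload only when the slot is currently marked 0xFF,
# keeping the original attribute bytes untouched.
reset_slot_status() {
    var_file=$1
    [ -f "$var_file" ] || return 1
    if [ "$(slot_status "$var_file")" != "ff" ]; then
        return 0    # status is not 0xFF, nothing to repair
    fi
    tmp=$(mktemp) || return 1
    # Original attributes followed by a zeroed 4-byte payload.
    { head -c 4 "$var_file"; printf '\000\000\000\000'; } > "$tmp"
    # efivarfs files are usually immutable; ignore failure elsewhere.
    chattr -i "$var_file" 2>/dev/null || true
    cat "$tmp" > "$var_file"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

Under these assumptions, an ArtifactInstall_Leave script would call `reset_slot_status` with the target slot's variable file before switching slots, and an ArtifactCommit_Leave script would call it with the inactive slot's file after commit. Doing nothing when the status is not 0xFF keeps the helper safe to run on every update.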