Rollback Fixes #2213
Open
Johan-Liebert1 wants to merge 5 commits into bootc-dev:main from Johan-Liebert1:rollback-fix
Commits
adbe9ff cfs/rollback: Remove staged entry on rollback (Johan-Liebert1)
523464a tests: Update rollback test (Johan-Liebert1)
8c029e7 cfs/status: Implement bootloader-specific sorting (Johan-Liebert1)
585b3eb cfs/rollback: Update the way we get rollback (Johan-Liebert1)
830e90a composefs: Make operations resilient to corrupted state (Johan-Liebert1)
Robustness & Correctness Issue
- If `loader/entries.staged` (`TYPE1_ENT_PATH_STAGED`) already exists on disk (e.g., from a previously interrupted update or rollback) but `host.status.staged` is `None`, we do not clean it up before writing the new rollback BLS configs. This can cause stale entries to be carried over and swapped into the active `loader/entries` directory during `rename_exchange_bls_entries`.
- If `COMPOSEFS_STAGED_DEPLOYMENT_FNAME` is missing or already deleted, `remove_file` will fail with `NotFound`, unnecessarily failing the entire rollback operation.

Recommendation
Always clean up `TYPE1_ENT_PATH_STAGED` if it exists using `remove_all_optional`, and ignore `NotFound` errors when deleting the staged deployment file.

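The recommendation above can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: `remove_file_optional`, `remove_dir_all_optional`, and `clean_stale_staged_state` are hypothetical helpers standing in for the `remove_all_optional` call mentioned in the review, and the two path parameters are assumed stand-ins for `TYPE1_ENT_PATH_STAGED` and `COMPOSEFS_STAGED_DEPLOYMENT_FNAME`.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

/// Remove a file, treating "already gone" as success.
fn remove_file_optional(path: &Path) -> io::Result<()> {
    match fs::remove_file(path) {
        Ok(()) => Ok(()),
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(()),
        Err(e) => Err(e),
    }
}

/// Remove a directory tree, treating "already gone" as success.
fn remove_dir_all_optional(path: &Path) -> io::Result<()> {
    match fs::remove_dir_all(path) {
        Ok(()) => Ok(()),
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(()),
        Err(e) => Err(e),
    }
}

/// Hypothetical cleanup step to run before writing new rollback entries.
/// `staged_entries` stands in for TYPE1_ENT_PATH_STAGED and `staged_marker`
/// for COMPOSEFS_STAGED_DEPLOYMENT_FNAME; both are assumptions.
fn clean_stale_staged_state(staged_entries: &Path, staged_marker: &Path) -> io::Result<()> {
    // Always drop a leftover staged entries directory, whether or not the
    // status layer reported a staged deployment.
    remove_dir_all_optional(staged_entries)?;
    // A missing marker file is not an error; any other failure still is.
    remove_file_optional(staged_marker)
}
```

Calling `clean_stale_staged_state` twice in a row succeeds both times, which is the property that matters after an interrupted update or rollback. Errors other than `NotFound` (for example permission problems) are still returned to the caller.
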
I think we would definitely want to return an error if we have a staged deployment and this doesn't exist. Basically means the system is in a weird state. It is the same with `COMPOSEFS_STAGED_DEPLOYMENT_FNAME`: `host.staged` is only set if `COMPOSEFS_STAGED_DEPLOYMENT_FNAME` exists. Also, in the staged op, writing `COMPOSEFS_STAGED_DEPLOYMENT_FNAME` is the very last operation we do.

Hmm, a few principles here. First off, we should ideally be able to recover from most unexpected state.
It'd actually be a good exercise (LLM assisted even) to look at just deleting files in the storage or in `/run` state and ensure that at least `bootc switch` is able to recover.

I didn't dig into this, but on the staged state bits, ensuring we e.g. log to the journal and continue seems like a good idea instead of just using `?`.

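As a rough illustration of "log and continue" versus propagating with `?`, the sketch below reads a state file and falls back to "no staged deployment" when the read fails. The function names and the file layout are hypothetical and not taken from bootc, and `eprintln!` stands in for whatever journal logging the project actually uses.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical strict reader: a missing file means "nothing staged",
/// any other I/O failure is returned to the caller.
fn read_staged_state(path: &Path) -> io::Result<Option<String>> {
    match fs::read_to_string(path) {
        Ok(contents) => Ok(Some(contents)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}

/// Tolerant variant: instead of `read_staged_state(path)?`, log the
/// problem and continue as if nothing were staged.
fn staged_state_or_none(path: &Path) -> Option<String> {
    match read_staged_state(path) {
        Ok(state) => state,
        Err(e) => {
            // Assumption: stderr is used here; a real service would log
            // to the journal instead.
            eprintln!(
                "warning: ignoring unreadable staged state at {}: {e}",
                path.display()
            );
            None
        }
    }
}
```

The trade-off is the one debated in this thread: the tolerant variant lets operations such as a rollback proceed from corrupted state, at the cost of hiding a condition that the strict variant would surface as an error.
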
Looking at this again, I think Gemini's suggestion makes sense. We could've failed writing the file in `/run`, which would leave `host.staged` as `None`, but `entries.staged` will still exist, re-triggering this bug.

I believe this is true as of now even. I'll incorporate this scenario into one of the tests

sounds good. I'll update it

Thinking more about this, I've realized `COMPOSEFS_STAGED_DEPLOYMENT_FNAME` doesn't serve much purpose except convenience. We could figure out the staged deployment by checking the `entries.staged` dir and sorting the entries. I think we can remove this. @cgwalters what do you think?

We can't remove this, as we also use it to check if the deployment is in `download_only` mode.

Added in latest commit