-
Notifications
You must be signed in to change notification settings - Fork 153
Initial factory reset code #1588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
387f495 to
ab75caf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new experimental bootc install reset command, which appears to function as a factory reset. The implementation is well-structured, refactoring existing deployment logic to use a new MergeState enum, which is a clean way to handle different deployment sources. The changes also include new utilities for managing stateroots. I've found a couple of issues that need to be addressed before merging.
|
I think we'll want tests at least before merging, and supporting that with tmt is probably going to be an excellent driver for the feature. I haven't dug in but I think tmt is relying on injecting ssh keys w/cloud-init, so it might Just Work to do a reset there? |
b45e272 to
a0cd215
Compare
Updated with a basic test. |
a0cd215 to
2ad17c6
Compare
crates/lib/src/deploy.rs
Outdated
| let deployment_dir = ostree.deployment_dirpath(&staged); | ||
| let deployment_dir = std::path::Path::new(deployment_dir.as_str()); | ||
| if deployment_dir.exists() { | ||
| let marker_path = deployment_dir.join(".bootc-factory-reset"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have clear const for things like this with an explanation there.
But backing up a second, in the ostree side we have "deployment backing" directories already where we can put metadata.
Though of course we should also be thinking about how this would work with the composefs backend.
Hmmm hmm. Here's an idea, how about we use the deployment's etc directory for state like this instead? I think we could save e.g. system.bootc.merged_from xattr or so?
Basically I lean towards extended attributes over "stamp files" for metadata.
BTW though do note the intersection here with etc.transient - I wonder if we should make that more visible in status output too. If that's enabled each upgrade is already like a reset of /etc.
crates/lib/src/status.rs
Outdated
| if is_factory_reset { | ||
| // Factory reset deployments don't support soft reboots | ||
| // this is primarily because the kargs validation will fail when checking for soft reboot | ||
| // compatibility in the ostree code, which will cause bootc status and upgrade to fail. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why we need to special case this. Why is bootc status failing exactly?
If we're changing (usually dropping) kernel arguments won't the existing karg check work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a followup - we talked about this in a live chat and there seems to be a libostree bug here where the staged deployment is missing ostree= and that trips an assertion in the soft reboot checks.
Ideally we root cause and fix that.
But we may be able to work around this by detecting when there's no kargs in the staged deployment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will fix the factory resest issue: ostreedev/ostree#3532. Still investigating why the deployment is missing the ostree arg.
|
Just saw the demo for this. A few random thoughts. It'd be nice to be able to have different reset levels. One is just resetting Should this use stateroots at all? This would be another thing to implement in the composefs backend (or maybe that code and concept already exist there?). But also it's quite common for One alternative though is basically to just move everything in (There's still the case of submounts in |
2ad17c6 to
6003aa7
Compare
46e0668 to
def648d
Compare
cgwalters
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there's some CI failures, but otherwise we could try to get this in?
It'd be a bit ugly but maybe we could detect the ostree version and avoid crashing if it's too old?
def648d to
ea5896f
Compare
Just pushed up a commit to check the ostree version in the has_soft_reboot_capability function where the crash happens. I haven't had a chance to dig into the deeper root cause of why the ostree= karg is getting set but the version check should unblock this PR's CI. The result is on systems with an outdated ostree, soft reboot will be disabled for factory reset deployments. If it works for you, I'll just file an issue to track fixing the ostree= karg issue. |
cgwalters
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks overall sane to me!
One thing that would probably help is to add docs for this in our experimental section.
| assert not equal $reset_status.status.otherDeployments.0.ostree.stateroot "default" | ||
|
|
||
| # we need tmt in the new stateroot for second_boot | ||
| print "Copying tmt into new stateroot" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this one is really asking for enhancing reset to support copying data in a nicer way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
working on adding bootc mount
|
Ah, so the current CI failures are due to the outdated ostree in the tmt env which is causing the soft reboot tests to fail because softRebootCapable is defaulting to false. Not sure how we want to handle this. With the current code in this PR, soft reboot will be disabled for all systems with ostree < 2025.7. I can add back the code to the Dockerfile to install ostree from the continuous repo so the tests pass. Or, a more complicated workaround would be to add back the code to flag a deployment as a factory reset, then disable soft reboot only for ostree < 2025.7 and isFactoryReset. @cgwalters thoughts? |
|
Oh hmmmm. Yes a bit of a pickle. Can you see if (perhaps with agent assist) we can detect if the staged deployment is missing the or slightly more elaborated: |
7567a20 to
68aec61
Compare
Signed-off-by: Colin Walters <[email protected]>
This is a nondestructive variant of `to-existing-root`. Signed-off-by: Colin Walters <[email protected]>
Also cleans up some bugs from rebasing previous commits. Signed-off-by: ckyrouac <[email protected]>
9f1436d to
cc1b380
Compare
Signed-off-by: ckyrouac <[email protected]>
Add ostree version check to has_soft_reboot_capability() to ensure soft reboot is disabled when ostree < 2025.7 and ostree= karg is missing. This prevents attempting soft reboots on older ostree versions that have a bug when validating kargs during a factory reset. Signed-off-by: ckyrouac <[email protected]>
cc1b380 to
f6b1249
Compare
Add logic to copy the /boot mount specification from the existing /etc/fstab to the newly created stateroot during factory reset. This ensures that the /boot partition configuration is preserved across the reset operation. Includes helper function read_boot_fstab_entry() to parse /etc/fstab and locate the /boot mount entry, along with comprehensive unit tests. Assisted-by: Claude Code Signed-off-by: ckyrouac <[email protected]>
|
@cgwalters added another commit to automatically preserve the /boot mount |
|
Nice! Hopefully we get some good feedback from this one. |
This is a rebased version of #1389 with some fixes on top to make
bootc install reset --experimentalwork with the latest soft reboot code. I'd like to get this in hidden behind--experimental, then iterate on it. I plan to add some e2e tests as a followup but I can work on including them as part of this PR if you want.