Skip to content

Conversation

@cgwalters
Copy link
Collaborator

@cgwalters cgwalters commented Nov 19, 2025

See individual patches.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces two valuable cleanups. First, it refactors the tmt test execution to directly discover and run tests instead of plans, which simplifies the configuration and removes duplication. This is a great improvement for maintainability. Second, it adds a command to show the tmt report on failure, which will significantly aid in debugging failing tests. I've noted a few minor inconsistencies in the updated logging and error messages where 'plan' was not fully replaced with 'test'. Please see my specific comments for details.

@henrywang
Copy link
Collaborator

The failure comes from test-26-examples-build in aarch64. This test has not been included into tmt plan before, so it only runs on github runner, but not on packit. I'm working on it to fix the issue.

@henrywang henrywang force-pushed the tmt-discover branch 2 times, most recently from 9b18985 to 03ff8a8 Compare November 20, 2025 14:42
@jeckersb
Copy link
Collaborator

xtask: Run tmt report on failure

Why tmt doesn't do this by default, I have no idea. It's also baffling that it takes three levels of verbose i.e. -vvv to get the actual test error.

This tmt behavior also drives me crazy. Glad to see this change 👍

--json pretty \
--output "/boot/$kver.efi"
# To support multi arch
systemd_boot_file=$(ls /usr/lib/systemd/boot/efi/*.efi | xargs -n 1 basename)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right though we're going to need the same fix in Dockerfile.cfsuki. Which I think again is a driving use case for having a higher level wrapper for this.

@cgwalters
Copy link
Collaborator Author

Strange, I don't understand why the soft reboot test is failing with testing-farm here but works in previous runs.

@henrywang
Copy link
Collaborator

henrywang commented Nov 20, 2025

Strange, I don't understand why the soft reboot test is failing with testing-farm here but works in previous runs.

Already worked on it. Not figure out the root cause in TF. Will debug it in pure TF to see what happens.

@cgwalters
Copy link
Collaborator Author

Wait a minute...does testing-farm+tmt run all tests from a single plan on the same provisioned environment? It's sure looking like that's the case, see e.g. this log from the run here.

It's quite impressive actually that it's just that test that fails because some of these tests are going to leak state.

Yeah this seems to be the case. OK so this PR needs a rework.

@cgwalters
Copy link
Collaborator Author

because some of these tests are going to leak state.

(Of course, we could optimize this if we relied on a factory reset; I'm sure we could do that for most tests)

@cgwalters cgwalters assigned cgwalters and unassigned henrywang Nov 21, 2025
@cgwalters
Copy link
Collaborator Author

I've come to the reluctant conclusion there's no sane option to fix the "triple test definition" problem we have other than going to generating the tmt stuff. Working on that.

@cgwalters cgwalters marked this pull request as draft November 21, 2025 14:25
@github-actions github-actions bot added the area/documentation Updates to the documentation label Nov 21, 2025
@cgwalters
Copy link
Collaborator Author

One thing I will say is I am so glad that AI exists to learn things like fmf for me

Right now touching e.g. xtask.rs causes a build phase when
it shouldn't, this helps bypassing that.

Signed-off-by: Colin Walters <[email protected]>
We need to run most of our tests in a separate provisioned machine,
which means it needs an individual plan. And then we need a test
for that plan. And then we need the *actual test code*.

This "triplication" is a huge annoying pain.

TMT is soooo complicated, yet as far as I can tell it doesn't offer
us any tools to solve this. So we'll do it here, cut over to
generating the TMT stuff from metadata defined in the test file.

Hence adding a test is just:

- Write a new tests/booted/foo.nu
- `cargo xtask update-generated`

Signed-off-by: Colin Walters <[email protected]>
…tion

Move TMT test runner code from xtask.rs to tmt module:
- `run_tmt()` and `tmt_provision()` functions
- Helper functions for VM management and SSH connectivity
- Related constants

Also refactor `update_integration()` to use serde_yaml::Value for
building YAML structures instead of string concatenation.

Add detailed error reporting for failed TMT tests:
- Assign run IDs using `tmt run --id`
- Display verbose reports with `tmt run -i {id} report -vvv`

Assisted-by: Claude Code (Sonnet 4.5)
Signed-off-by: Colin Walters <[email protected]>
Otherwise we compile many dependencies twice unnecessarily.

Signed-off-by: Colin Walters <[email protected]>
To make it easier to do upgrade tests.

Signed-off-by: Colin Walters <[email protected]>
This ensures it all can work much more elegantly/naturally
with sealed UKI builds - we don't want to do the build-on-target
thing.

Signed-off-by: Colin Walters <[email protected]>
@cgwalters cgwalters changed the title Two tmt cleanups Various tmt + testing improvements Nov 21, 2025
@cgwalters cgwalters marked this pull request as ready for review November 21, 2025 20:43
@cgwalters
Copy link
Collaborator Author

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces significant improvements to the testing infrastructure, primarily by refactoring the tmt test execution into a dedicated xtask module. The changes make the test setup more robust and extensible, for example by adding support for synthetic upgrade tests.

My review focuses on a few key areas:

  • A critical issue in the newly added tmt.rs file where an outdated rand crate API is used, which will prevent compilation.
  • A couple of medium-severity suggestions to improve maintainability and clarity in the new test scripts and xtask code. One is about an inconsistency in a Dockerfile, and the other suggests refactoring a large loop for better readability and error handling.

Overall, the changes are a great step forward for the project's testing capabilities. Addressing the identified issues will make the new testing code even better.

Comment on lines +28 to +37
fn generate_random_suffix() -> String {
let mut rng = rand::rng();
const CHARSET: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789";
(0..8)
.map(|_| {
let idx = rng.random_range(0..CHARSET.len());
CHARSET[idx] as char
})
.collect()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The rand::rng() function and random_range method are from an older version of the rand crate. With rand version 0.9+, you should use rand::thread_rng() to get a thread-local random number generator, and the gen_range() method to generate a value in a given range. This change is necessary for the code to compile and aligns with current best practices for the rand crate.

Suggested change
fn generate_random_suffix() -> String {
let mut rng = rand::rng();
const CHARSET: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789";
(0..8)
.map(|_| {
let idx = rng.random_range(0..CHARSET.len());
CHARSET[idx] as char
})
.collect()
}
fn generate_random_suffix() -> String {
let mut rng = rand::thread_rng();
const CHARSET: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789";
(0..8)
.map(|_| {
let idx = rng.gen_range(0..CHARSET.len());
CHARSET[idx] as char
})
.collect()
}

Comment on lines +220 to +396
for plan in plans {
let plan_name = sanitize_plan_name(plan);
let vm_name = format!("bootc-tmt-{}-{}", random_suffix, plan_name);

println!("\n========================================");
println!("Running plan: {}", plan);
println!("VM name: {}", vm_name);
println!("========================================\n");

// Launch VM with bcvk

let launch_result = cmd!(
sh,
"bcvk libvirt run --name {vm_name} --detach {COMMON_INST_ARGS...} {image}"
)
.run()
.context("Launching VM with bcvk");

if let Err(e) = launch_result {
eprintln!("Failed to launch VM for plan {}: {:#}", plan, e);
all_passed = false;
test_results.push((plan.to_string(), false, None));
continue;
}

// Ensure VM cleanup happens even on error (unless --preserve-vm is set)
let cleanup_vm = || {
if preserve_vm {
return;
}
if let Err(e) = cmd!(sh, "bcvk libvirt rm --stop --force {vm_name}")
.ignore_stderr()
.ignore_status()
.run()
{
eprintln!("Warning: Failed to cleanup VM {}: {}", vm_name, e);
}
};

// Wait for VM to be ready and get SSH info
let vm_info = wait_for_vm_ready(sh, &vm_name);
let (ssh_port, ssh_key) = match vm_info {
Ok((port, key)) => (port, key),
Err(e) => {
eprintln!("Failed to get VM info for plan {}: {:#}", plan, e);
cleanup_vm();
all_passed = false;
test_results.push((plan.to_string(), false, None));
continue;
}
};

println!("VM ready, SSH port: {}", ssh_port);

// Save SSH private key to a temporary file
let key_file = tempfile::NamedTempFile::new().context("Creating temporary SSH key file");

let key_file = match key_file {
Ok(f) => f,
Err(e) => {
eprintln!("Failed to create SSH key file for plan {}: {:#}", plan, e);
cleanup_vm();
all_passed = false;
test_results.push((plan.to_string(), false, None));
continue;
}
};

let key_path = Utf8PathBuf::try_from(key_file.path().to_path_buf())
.context("Converting key path to UTF-8");

let key_path = match key_path {
Ok(p) => p,
Err(e) => {
eprintln!("Failed to convert key path for plan {}: {:#}", plan, e);
cleanup_vm();
all_passed = false;
test_results.push((plan.to_string(), false, None));
continue;
}
};

if let Err(e) = std::fs::write(&key_path, ssh_key) {
eprintln!("Failed to write SSH key for plan {}: {:#}", plan, e);
cleanup_vm();
all_passed = false;
test_results.push((plan.to_string(), false, None));
continue;
}

// Set proper permissions on the key file (SSH requires 0600)
{
use std::os::unix::fs::PermissionsExt;
let perms = std::fs::Permissions::from_mode(0o600);
if let Err(e) = std::fs::set_permissions(&key_path, perms) {
eprintln!("Failed to set key permissions for plan {}: {:#}", plan, e);
cleanup_vm();
all_passed = false;
test_results.push((plan.to_string(), false, None));
continue;
}
}

// Verify SSH connectivity
println!("Verifying SSH connectivity...");
if let Err(e) = verify_ssh_connectivity(sh, ssh_port, &key_path) {
eprintln!("SSH verification failed for plan {}: {:#}", plan, e);
cleanup_vm();
all_passed = false;
test_results.push((plan.to_string(), false, None));
continue;
}

println!("SSH connectivity verified");

let ssh_port_str = ssh_port.to_string();

// Run tmt for this specific plan using connect provisioner
println!("Running tmt tests for plan {}...", plan);

// Generate a unique run ID for this test
// Use the VM name which already contains a random suffix for uniqueness
let run_id = vm_name.clone();

// Run tmt for this specific plan
// Note: provision must come before plan for connect to work properly
let context = context.clone();
let how = ["--how=connect", "--guest=localhost", "--user=root"];
let env = ["TMT_SCRIPTS_DIR=/var/lib/tmt/scripts", "BCVK_EXPORT=1"]
.into_iter()
.chain(args.env.iter().map(|v| v.as_str()))
.flat_map(|v| ["--environment", v]);
let test_result = cmd!(
sh,
"tmt {context...} run --id {run_id} --all {env...} provision {how...} --port {ssh_port_str} --key {key_path} plan --name {plan}"
)
.run();

// Clean up VM regardless of test result (unless --preserve-vm is set)
cleanup_vm();

match test_result {
Ok(_) => {
println!("Plan {} completed successfully", plan);
test_results.push((plan.to_string(), true, Some(run_id)));
}
Err(e) => {
eprintln!("Plan {} failed: {:#}", plan, e);
all_passed = false;
test_results.push((plan.to_string(), false, Some(run_id)));
}
}

// Print VM connection details if preserving
if preserve_vm {
// Copy SSH key to a persistent location
let persistent_key_path = Utf8Path::new("target").join(format!("{}.ssh-key", vm_name));
if let Err(e) = std::fs::copy(&key_path, &persistent_key_path) {
eprintln!("Warning: Failed to save persistent SSH key: {}", e);
} else {
println!("\n========================================");
println!("VM preserved for debugging:");
println!("========================================");
println!("VM name: {}", vm_name);
println!("SSH port: {}", ssh_port_str);
println!("SSH key: {}", persistent_key_path);
println!("\nTo connect via SSH:");
println!(
" ssh -i {} -p {} -o IdentitiesOnly=yes root@localhost",
persistent_key_path, ssh_port_str
);
println!("\nTo cleanup:");
println!(" bcvk libvirt rm --stop --force {}", vm_name);
println!("========================================\n");
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic inside this for loop is quite long and contains repetitive error handling boilerplate. This makes it hard to follow the main flow of setting up and running a test plan.

Consider refactoring the body of this loop into a separate function, for example run_plan_in_vm(sh: &Shell, plan: &str, ...) -> Result<String>.

This would allow you to:

  1. Use the ? operator for cleaner error propagation within the new function.
  2. Manage VM cleanup automatically using a guard struct that calls cleanup_vm() on Drop. This is a common pattern in Rust for resource management (RAII).

An example of a guard struct:

struct VmGuard<'a, 'b> {
    sh: &'a Shell,
    vm_name: &'b str,
    preserve: bool,
}

impl<'a, 'b> Drop for VmGuard<'a, 'b> {
    fn drop(&mut self) {
        if self.preserve {
            return;
        }
        if let Err(e) = cmd!(self.sh, "bcvk libvirt rm --stop --force {vm_name}", vm_name = self.vm_name)
            .ignore_stderr()
            .ignore_status()
            .run()
        {
            eprintln!("Warning: Failed to cleanup VM {}: {}", self.vm_name, e);
        }
    }
}

The main loop would then become much simpler, focusing on iterating through plans and reporting results.

@@ -0,0 +1,3 @@
# Just creates a file as a new layer for a synthetic upgrade test
FROM localhost/bootc-integration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The Justfile's _build-upgrade-image rule overrides this FROM instruction with localhost/bootc-integration-bin using podman build --from. For clarity and to avoid confusion, this FROM instruction should match what is used in the Justfile, or use a generic placeholder like scratch to indicate it's always overridden.

FROM localhost/bootc-integration-bin

@cgwalters
Copy link
Collaborator Author

    content: /bin/sh: error while loading shared libraries: /lib64/libc.so.6: cannot apply additional memory protection after relocation: Permission denied

Hmm, I saw that once locally but it went away and I didn't debug. Dang it...I think this is a complicated side effect of us exporting the host container storage via virtiofs - basically the host selinux policy is tripping over the virtiofs_t type. Hum...I have an idea

@cgwalters
Copy link
Collaborator Author

Hum...I have an idea

➡️ bootc-dev/bcvk#159

Haven't reproduced the bug yet locally though but it at least doesn't hurt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/documentation Updates to the documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants