-
Notifications
You must be signed in to change notification settings - Fork 150
Various tmt + testing improvements #1790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
a5d9732 to
df71cea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces two valuable cleanups. First, it refactors the tmt test execution to directly discover and run tests instead of plans, which simplifies the configuration and removes duplication. This is a great improvement for maintainability. Second, it adds a command to show the tmt report on failure, which will significantly aid in debugging failing tests. I've noted a few minor inconsistencies in the updated logging and error messages where 'plan' was not fully replaced with 'test'. Please see my specific comments for details.
|
The failure comes from |
9b18985 to
03ff8a8
Compare
This tmt behavior also drives me crazy. Glad to see this change 👍 |
| --json pretty \ | ||
| --output "/boot/$kver.efi" | ||
| # To support multi arch | ||
| systemd_boot_file=$(ls /usr/lib/systemd/boot/efi/*.efi | xargs -n 1 basename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right though we're going to need the same fix in Dockerfile.cfsuki. Which I think again is a driving use case for having a higher level wrapper for this.
|
Strange, I don't understand why the soft reboot test is failing with testing-farm here but works in previous runs. |
Already worked on it. Not figure out the root cause in TF. Will debug it in pure TF to see what happens. |
|
Wait a minute...does testing-farm+tmt run all tests from a single plan on the same provisioned environment? It's sure looking like that's the case, see e.g. this log from the run here. It's quite impressive actually that it's just that test that fails because some of these tests are going to leak state. Yeah this seems to be the case. OK so this PR needs a rework. |
(Of course, we could optimize this if we relied on a factory reset; I'm sure we could do that for most tests) |
|
I've come to the reluctant conclusion there's no sane option to fix the "triple test definition" problem we have other than going to generating the tmt stuff. Working on that. |
03ff8a8 to
9652e72
Compare
9652e72 to
5aab4f5
Compare
|
One thing I will say is I am so glad that AI exists to learn things like fmf for me |
baf1550 to
95c8140
Compare
Right now touching e.g. xtask.rs causes a build phase when it shouldn't, this helps bypassing that. Signed-off-by: Colin Walters <[email protected]>
We need to run most of our tests in a separate provisioned machine, which means it needs an individual plan. And then we need a test for that plan. And then we need the *actual test code*. This "triplication" is a huge annoying pain. TMT is soooo complicated, yet as far as I can tell it doesn't offer us any tools to solve this. So we'll do it here, cut over to generating the TMT stuff from metadata defined in the test file. Hence adding a test is just: - Write a new tests/booted/foo.nu - `cargo xtask update-generated` Signed-off-by: Colin Walters <[email protected]>
…tion
Move TMT test runner code from xtask.rs to tmt module:
- `run_tmt()` and `tmt_provision()` functions
- Helper functions for VM management and SSH connectivity
- Related constants
Also refactor `update_integration()` to use serde_yaml::Value for
building YAML structures instead of string concatenation.
Add detailed error reporting for failed TMT tests:
- Assign run IDs using `tmt run --id`
- Display verbose reports with `tmt run -i {id} report -vvv`
Assisted-by: Claude Code (Sonnet 4.5)
Signed-off-by: Colin Walters <[email protected]>
Otherwise we compile many dependencies twice unnecessarily. Signed-off-by: Colin Walters <[email protected]>
To make it easier to do upgrade tests. Signed-off-by: Colin Walters <[email protected]>
This ensures it all can work much more elegantly/naturally with sealed UKI builds - we don't want to do the build-on-target thing. Signed-off-by: Colin Walters <[email protected]>
Signed-off-by: Colin Walters <[email protected]>
95c8140 to
ec438ac
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces significant improvements to the testing infrastructure, primarily by refactoring the tmt test execution into a dedicated xtask module. The changes make the test setup more robust and extensible, for example by adding support for synthetic upgrade tests.
My review focuses on a few key areas:
- A critical issue in the newly added
tmt.rsfile where an outdatedrandcrate API is used, which will prevent compilation. - A couple of medium-severity suggestions to improve maintainability and clarity in the new test scripts and
xtaskcode. One is about an inconsistency in a Dockerfile, and the other suggests refactoring a large loop for better readability and error handling.
Overall, the changes are a great step forward for the project's testing capabilities. Addressing the identified issues will make the new testing code even better.
| fn generate_random_suffix() -> String { | ||
| let mut rng = rand::rng(); | ||
| const CHARSET: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789"; | ||
| (0..8) | ||
| .map(|_| { | ||
| let idx = rng.random_range(0..CHARSET.len()); | ||
| CHARSET[idx] as char | ||
| }) | ||
| .collect() | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rand::rng() function and random_range method are from an older version of the rand crate. With rand version 0.9+, you should use rand::thread_rng() to get a thread-local random number generator, and the gen_range() method to generate a value in a given range. This change is necessary for the code to compile and aligns with current best practices for the rand crate.
| fn generate_random_suffix() -> String { | |
| let mut rng = rand::rng(); | |
| const CHARSET: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789"; | |
| (0..8) | |
| .map(|_| { | |
| let idx = rng.random_range(0..CHARSET.len()); | |
| CHARSET[idx] as char | |
| }) | |
| .collect() | |
| } | |
| fn generate_random_suffix() -> String { | |
| let mut rng = rand::thread_rng(); | |
| const CHARSET: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789"; | |
| (0..8) | |
| .map(|_| { | |
| let idx = rng.gen_range(0..CHARSET.len()); | |
| CHARSET[idx] as char | |
| }) | |
| .collect() | |
| } |
| for plan in plans { | ||
| let plan_name = sanitize_plan_name(plan); | ||
| let vm_name = format!("bootc-tmt-{}-{}", random_suffix, plan_name); | ||
|
|
||
| println!("\n========================================"); | ||
| println!("Running plan: {}", plan); | ||
| println!("VM name: {}", vm_name); | ||
| println!("========================================\n"); | ||
|
|
||
| // Launch VM with bcvk | ||
|
|
||
| let launch_result = cmd!( | ||
| sh, | ||
| "bcvk libvirt run --name {vm_name} --detach {COMMON_INST_ARGS...} {image}" | ||
| ) | ||
| .run() | ||
| .context("Launching VM with bcvk"); | ||
|
|
||
| if let Err(e) = launch_result { | ||
| eprintln!("Failed to launch VM for plan {}: {:#}", plan, e); | ||
| all_passed = false; | ||
| test_results.push((plan.to_string(), false, None)); | ||
| continue; | ||
| } | ||
|
|
||
| // Ensure VM cleanup happens even on error (unless --preserve-vm is set) | ||
| let cleanup_vm = || { | ||
| if preserve_vm { | ||
| return; | ||
| } | ||
| if let Err(e) = cmd!(sh, "bcvk libvirt rm --stop --force {vm_name}") | ||
| .ignore_stderr() | ||
| .ignore_status() | ||
| .run() | ||
| { | ||
| eprintln!("Warning: Failed to cleanup VM {}: {}", vm_name, e); | ||
| } | ||
| }; | ||
|
|
||
| // Wait for VM to be ready and get SSH info | ||
| let vm_info = wait_for_vm_ready(sh, &vm_name); | ||
| let (ssh_port, ssh_key) = match vm_info { | ||
| Ok((port, key)) => (port, key), | ||
| Err(e) => { | ||
| eprintln!("Failed to get VM info for plan {}: {:#}", plan, e); | ||
| cleanup_vm(); | ||
| all_passed = false; | ||
| test_results.push((plan.to_string(), false, None)); | ||
| continue; | ||
| } | ||
| }; | ||
|
|
||
| println!("VM ready, SSH port: {}", ssh_port); | ||
|
|
||
| // Save SSH private key to a temporary file | ||
| let key_file = tempfile::NamedTempFile::new().context("Creating temporary SSH key file"); | ||
|
|
||
| let key_file = match key_file { | ||
| Ok(f) => f, | ||
| Err(e) => { | ||
| eprintln!("Failed to create SSH key file for plan {}: {:#}", plan, e); | ||
| cleanup_vm(); | ||
| all_passed = false; | ||
| test_results.push((plan.to_string(), false, None)); | ||
| continue; | ||
| } | ||
| }; | ||
|
|
||
| let key_path = Utf8PathBuf::try_from(key_file.path().to_path_buf()) | ||
| .context("Converting key path to UTF-8"); | ||
|
|
||
| let key_path = match key_path { | ||
| Ok(p) => p, | ||
| Err(e) => { | ||
| eprintln!("Failed to convert key path for plan {}: {:#}", plan, e); | ||
| cleanup_vm(); | ||
| all_passed = false; | ||
| test_results.push((plan.to_string(), false, None)); | ||
| continue; | ||
| } | ||
| }; | ||
|
|
||
| if let Err(e) = std::fs::write(&key_path, ssh_key) { | ||
| eprintln!("Failed to write SSH key for plan {}: {:#}", plan, e); | ||
| cleanup_vm(); | ||
| all_passed = false; | ||
| test_results.push((plan.to_string(), false, None)); | ||
| continue; | ||
| } | ||
|
|
||
| // Set proper permissions on the key file (SSH requires 0600) | ||
| { | ||
| use std::os::unix::fs::PermissionsExt; | ||
| let perms = std::fs::Permissions::from_mode(0o600); | ||
| if let Err(e) = std::fs::set_permissions(&key_path, perms) { | ||
| eprintln!("Failed to set key permissions for plan {}: {:#}", plan, e); | ||
| cleanup_vm(); | ||
| all_passed = false; | ||
| test_results.push((plan.to_string(), false, None)); | ||
| continue; | ||
| } | ||
| } | ||
|
|
||
| // Verify SSH connectivity | ||
| println!("Verifying SSH connectivity..."); | ||
| if let Err(e) = verify_ssh_connectivity(sh, ssh_port, &key_path) { | ||
| eprintln!("SSH verification failed for plan {}: {:#}", plan, e); | ||
| cleanup_vm(); | ||
| all_passed = false; | ||
| test_results.push((plan.to_string(), false, None)); | ||
| continue; | ||
| } | ||
|
|
||
| println!("SSH connectivity verified"); | ||
|
|
||
| let ssh_port_str = ssh_port.to_string(); | ||
|
|
||
| // Run tmt for this specific plan using connect provisioner | ||
| println!("Running tmt tests for plan {}...", plan); | ||
|
|
||
| // Generate a unique run ID for this test | ||
| // Use the VM name which already contains a random suffix for uniqueness | ||
| let run_id = vm_name.clone(); | ||
|
|
||
| // Run tmt for this specific plan | ||
| // Note: provision must come before plan for connect to work properly | ||
| let context = context.clone(); | ||
| let how = ["--how=connect", "--guest=localhost", "--user=root"]; | ||
| let env = ["TMT_SCRIPTS_DIR=/var/lib/tmt/scripts", "BCVK_EXPORT=1"] | ||
| .into_iter() | ||
| .chain(args.env.iter().map(|v| v.as_str())) | ||
| .flat_map(|v| ["--environment", v]); | ||
| let test_result = cmd!( | ||
| sh, | ||
| "tmt {context...} run --id {run_id} --all {env...} provision {how...} --port {ssh_port_str} --key {key_path} plan --name {plan}" | ||
| ) | ||
| .run(); | ||
|
|
||
| // Clean up VM regardless of test result (unless --preserve-vm is set) | ||
| cleanup_vm(); | ||
|
|
||
| match test_result { | ||
| Ok(_) => { | ||
| println!("Plan {} completed successfully", plan); | ||
| test_results.push((plan.to_string(), true, Some(run_id))); | ||
| } | ||
| Err(e) => { | ||
| eprintln!("Plan {} failed: {:#}", plan, e); | ||
| all_passed = false; | ||
| test_results.push((plan.to_string(), false, Some(run_id))); | ||
| } | ||
| } | ||
|
|
||
| // Print VM connection details if preserving | ||
| if preserve_vm { | ||
| // Copy SSH key to a persistent location | ||
| let persistent_key_path = Utf8Path::new("target").join(format!("{}.ssh-key", vm_name)); | ||
| if let Err(e) = std::fs::copy(&key_path, &persistent_key_path) { | ||
| eprintln!("Warning: Failed to save persistent SSH key: {}", e); | ||
| } else { | ||
| println!("\n========================================"); | ||
| println!("VM preserved for debugging:"); | ||
| println!("========================================"); | ||
| println!("VM name: {}", vm_name); | ||
| println!("SSH port: {}", ssh_port_str); | ||
| println!("SSH key: {}", persistent_key_path); | ||
| println!("\nTo connect via SSH:"); | ||
| println!( | ||
| " ssh -i {} -p {} -o IdentitiesOnly=yes root@localhost", | ||
| persistent_key_path, ssh_port_str | ||
| ); | ||
| println!("\nTo cleanup:"); | ||
| println!(" bcvk libvirt rm --stop --force {}", vm_name); | ||
| println!("========================================\n"); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic inside this for loop is quite long and contains repetitive error handling boilerplate. This makes it hard to follow the main flow of setting up and running a test plan.
Consider refactoring the body of this loop into a separate function, for example run_plan_in_vm(sh: &Shell, plan: &str, ...) -> Result<String>.
This would allow you to:
- Use the
?operator for cleaner error propagation within the new function. - Manage VM cleanup automatically using a guard struct that calls
cleanup_vm()onDrop. This is a common pattern in Rust for resource management (RAII).
An example of a guard struct:
struct VmGuard<'a, 'b> {
sh: &'a Shell,
vm_name: &'b str,
preserve: bool,
}
impl<'a, 'b> Drop for VmGuard<'a, 'b> {
fn drop(&mut self) {
if self.preserve {
return;
}
if let Err(e) = cmd!(self.sh, "bcvk libvirt rm --stop --force {vm_name}", vm_name = self.vm_name)
.ignore_stderr()
.ignore_status()
.run()
{
eprintln!("Warning: Failed to cleanup VM {}: {}", self.vm_name, e);
}
}
}The main loop would then become much simpler, focusing on iterating through plans and reporting results.
| @@ -0,0 +1,3 @@ | |||
| # Just creates a file as a new layer for a synthetic upgrade test | |||
| FROM localhost/bootc-integration | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Justfile's _build-upgrade-image rule overrides this FROM instruction with localhost/bootc-integration-bin using podman build --from. For clarity and to avoid confusion, this FROM instruction should match what is used in the Justfile, or use a generic placeholder like scratch to indicate it's always overridden.
FROM localhost/bootc-integration-bin
Hmm, I saw that once locally but it went away and I didn't debug. Dang it...I think this is a complicated side effect of us exporting the host container storage via virtiofs - basically the host selinux policy is tripping over the |
Haven't reproduced the bug yet locally though but it at least doesn't hurt |
See individual patches.