Skip to content

Conversation

@henrywang
Copy link
Collaborator

@henrywang henrywang commented Aug 7, 2025

New integration test workflow:

  • Run integration test in Github runner or run locally with TEST_OS=fedora-42 make test

  • build job:

  1. build bootc binary in container and replace the bootc inside the bootc image
  2. bootc install to-disk to generate disk image
  3. archive disk file (skip this step if run locally), one image can be used by the test jobs in parallel
  • test job:
  1. download disk image from archive
  2. boot disk image with qemu-system
  3. run tmt test against this VM

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new integration test running on GitHub runners, which is a great addition. The implementation involves new scripts for building a test image and running it in QEMU. My review focuses on improving the robustness and security of these new test scripts. I've identified a few critical issues that would prevent the tests from running as intended, such as an undefined command and incorrect error handling in the presence of set -e. I've also pointed out security concerns regarding disabled TLS verification and opportunities to make the scripts more robust and maintainable. Please see the detailed comments for suggestions.


# This script runs disk image with qemu-system and run tmt against this vm.

BOOTC_TEMPDIR="/tmp/tmp-bootc-build"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The temporary directory path /tmp/tmp-bootc-build is hardcoded here and in build.sh. This creates a tight, implicit coupling between the two scripts. It would be more robust to define this path in one place (e.g., in the Makefile), export it as an environment variable, and have both scripts use it. This would also make it easier to manage the lifecycle of the temporary directory (creation and cleanup).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we could just require this script to be invoked in the context of the temporary directory.

That said also, the only thing that actually binds the two things together right now is the ssh key.

But we can avoid that by not using --root-ssh-authorized-keys at bootc install to-disk time but instead injecting the SSH key via systemd credentials - that's how podman-bootc does it.

(Of course this whole topic instantly gets into the whole https://gitlab.com/fedora/bootc/tracker/-/issues/2 and whether/how podman-bootc and other virt provisioning tools should be shared underneath different testing frameworks)

Anyways for now though let's just change the GHA to allocate a temporary directory, see below

@henrywang henrywang changed the title test: Add integration test running on github runner [WIP] test: Add integration test running on github runner Aug 7, 2025
@henrywang henrywang force-pushed the qemu_tmt branch 7 times, most recently from 1ce535b to a2def9a Compare August 8, 2025 07:13
@henrywang
Copy link
Collaborator Author

@cgwalters , The two failures still need investigate. It's tmt issue, not our code issue.
Do you have any comment on this PR? Thanks.

runs-on: ubuntu-latest

steps:
- name: Install podman for heredoc support
Copy link
Collaborator

@cgwalters cgwalters Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR specifically but we should probably factor this stuff out into a shared action helper

build:
strategy:
matrix:
test_os: [fedora-42]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this be quay.io/fedora/fedora:42 e.g. - see below

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test_os will be [fedora-41, fedora-42, fedora-43, centos-9, centos-10].


# This script runs disk image with qemu-system and run tmt against this vm.

BOOTC_TEMPDIR="/tmp/tmp-bootc-build"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we could just require this script to be invoked in the context of the temporary directory.

That said also, the only thing that actually binds the two things together right now is the ssh key.

But we can avoid that by not using --root-ssh-authorized-keys at bootc install to-disk time but instead injecting the SSH key via systemd credentials - that's how podman-bootc does it.

(Of course this whole topic instantly gets into the whole https://gitlab.com/fedora/bootc/tracker/-/issues/2 and whether/how podman-bootc and other virt provisioning tools should be shared underneath different testing frameworks)

Anyways for now though let's just change the GHA to allocate a temporary directory, see below

cargo xtask test-tmt

test:
tests/build.sh && tests/test.sh
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would then become:

tmpd=$(mktemp -d) && cd $tmpd && $srcdir/tests/build.sh && $srcdir/tests/test.sh

or so

@henrywang henrywang force-pushed the qemu_tmt branch 4 times, most recently from 33c2934 to 3067f33 Compare August 11, 2025 10:28
Copy link
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with this as is if you prefer, we can land further cleanups as followups

@cgwalters
Copy link
Collaborator

     package: building container image with dependencies

We had a live chat about this and I think the fact that tmt is dynamically rebuilding the OS image on boot is a problem. In the long term ideally the testing framework doesn't require installing any binaries at all on the target (this would let us test minimal systems without python etc) but:

  • In the short term how about tmt print-target-dependencies or so and then we can pre-install those in our container image, and then tmt shouldn't require rebuilding inside the target system? i.e. in the Dockerfile we do RUN dnf -y install $(tmt print-target-dependencies)

@henrywang henrywang changed the title [WIP] test: Add integration test running on github runner test: Add integration test running on github runner Sep 10, 2025
@henrywang
Copy link
Collaborator Author

henrywang commented Sep 10, 2025

tmt-1.56 fixed the blocker issue. @cgwalters I think we can land this PR. And I'll add https://github.com/cgwalters/bootc-kit into CI later in another PR. Thanks.

Copy link
Collaborator

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! Nothing truly blocking here so if you want to just merge as is go ahead and as always there's followups.

ARCH=$(uname -m)
case "$ARCH" in
"aarch64")
qemu-system-aarch64 \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually this is going to need to use a shared library of some form. Can't we just point tmt at the disk image and let testcloud run it or so?

Copy link
Collaborator Author

@henrywang henrywang Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tmt will call libvirt, not qemu. bootc-kit can replace those part of code later. I'm working on it now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep and what I'm hitting now is that these qemu instances just leak after the test script runs, so trying to run multiple tests locally just fails.

I fixed this with a quick strategic use of setpriv --deathsig SIGTERM around qemu.

needs: build
strategy:
matrix:
test_os: [fedora-41, fedora-42, fedora-43, centos-9]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is required to get different base image.

build:
strategy:
matrix:
test_os: [fedora-41, fedora-42, fedora-43, centos-9]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can stop testing fedora-41 overall and we MUST test centos-10

# Base image
case "$OS_ID" in
"centos")
TIER1_IMAGE_URL="quay.io/centos-bootc/centos-bootc:stream${OS_VERSION_ID}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not new and not directly related to this but we have this legacy of "TIER1" naming - there's only one default image and it's likely to stay that way. I'd probably just say BASE_IMAGE_URL going forward for things like this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Need update the name.

strategy:
matrix:
test_os: [fedora-41, fedora-42, fedora-43, centos-9]
tmt_plan: [test-01-readonly, test-20-local-upgrade, test-21-logically-bound-switch, test-22-logically-bound-install, test-23-install-outside-container, test-24-local-upgrade-reboot]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's going to be a maintenance trap if we have to list all tests here and in the tmt directory.

One thing that also seems like it will become a bit painful is having a CI context check per test. (It's not wrong, but it will get really long)

A good middle ground might be "sharding"; basically like https://nexte.st/docs/ci-features/partitioning/?h=shard#hashed-sharding where we run a group of tests per GHA runner. That would give us a good mix of parallelism and efficiency.

And crucially by doing the sharding we would avoid needing to list each test; the allocated runner discovers which tests it should run by just taking an integer from 0-N as input, and looking if the hash of the test modulo the input key is zero.

;;
esac

CONTAINERFILE="${BOOTC_TEMPDIR}/Containerfile"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would create a third copy of our Dockerfile, I think we can avoid that by just having the github action matrix be directly passed down into e.g.

base=$1
shift
podman build --build-arg=base=$base

And that's all right?

@@ -0,0 +1,100 @@
name: bootc integration test
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Biggest question: Will we also continue to run the tests via packit+testing farm? Probably not, right?

But still though, we should bear in mind that we still need to have good bridges/integration with at least tmt/testing-farm since other groups will use that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, test can be run on github action runner and with make test locally.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought more about this and I think there's an obvious thing to keep doing with the packit+testing-farm flow, which is testing of system-reinstall-bootc.

For that we very precisely want an RPM and a stock cloud image. It's actually almost a good thing if we're not testing having a new bootc in the target container image just to shake out any cross-version issues.

@cgwalters
Copy link
Collaborator

OK let's get this in and then continue iteration!

@cgwalters cgwalters merged commit 029ed34 into bootc-dev:main Sep 11, 2025
60 checks passed
@henrywang henrywang deleted the qemu_tmt branch September 24, 2025 07:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants