-
Notifications
You must be signed in to change notification settings - Fork 154
test: Add integration test running on github runner #1496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new integration test running on GitHub runners, which is a great addition. The implementation involves new scripts for building a test image and running it in QEMU. My review focuses on improving the robustness and security of these new test scripts. I've identified a few critical issues that would prevent the tests from running as intended, such as an undefined command and incorrect error handling in the presence of set -e. I've also pointed out security concerns regarding disabled TLS verification and opportunities to make the scripts more robust and maintainable. Please see the detailed comments for suggestions.
|
|
||
| # This script runs disk image with qemu-system and run tmt against this vm. | ||
|
|
||
| BOOTC_TEMPDIR="/tmp/tmp-bootc-build" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The temporary directory path /tmp/tmp-bootc-build is hardcoded here and in build.sh. This creates a tight, implicit coupling between the two scripts. It would be more robust to define this path in one place (e.g., in the Makefile), export it as an environment variable, and have both scripts use it. This would also make it easier to manage the lifecycle of the temporary directory (creation and cleanup).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we could just require this script to be invoked in the context of the temporary directory.
That said also, the only thing that actually binds the two things together right now is the ssh key.
But we can avoid that by not using --root-ssh-authorized-keys at bootc install to-disk time but instead injecting the SSH key via systemd credentials - that's how podman-bootc does it.
(Of course this whole topic instantly gets into the whole https://gitlab.com/fedora/bootc/tracker/-/issues/2 and whether/how podman-bootc and other virt provisioning tools should be shared underneath different testing frameworks)
Anyways for now though let's just change the GHA to allocate a temporary directory, see below
1ce535b to
a2def9a
Compare
|
@cgwalters , The two failures still need investigate. It's tmt issue, not our code issue. |
| runs-on: ubuntu-latest | ||
|
|
||
| steps: | ||
| - name: Install podman for heredoc support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to this PR specifically but we should probably factor this stuff out into a shared action helper
.github/workflows/integration.yml
Outdated
| build: | ||
| strategy: | ||
| matrix: | ||
| test_os: [fedora-42] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this be quay.io/fedora/fedora:42 e.g. - see below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test_os will be [fedora-41, fedora-42, fedora-43, centos-9, centos-10].
|
|
||
| # This script runs disk image with qemu-system and run tmt against this vm. | ||
|
|
||
| BOOTC_TEMPDIR="/tmp/tmp-bootc-build" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we could just require this script to be invoked in the context of the temporary directory.
That said also, the only thing that actually binds the two things together right now is the ssh key.
But we can avoid that by not using --root-ssh-authorized-keys at bootc install to-disk time but instead injecting the SSH key via systemd credentials - that's how podman-bootc does it.
(Of course this whole topic instantly gets into the whole https://gitlab.com/fedora/bootc/tracker/-/issues/2 and whether/how podman-bootc and other virt provisioning tools should be shared underneath different testing frameworks)
Anyways for now though let's just change the GHA to allocate a temporary directory, see below
| cargo xtask test-tmt | ||
|
|
||
| test: | ||
| tests/build.sh && tests/test.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would then become:
tmpd=$(mktemp -d) && cd $tmpd && $srcdir/tests/build.sh && $srcdir/tests/test.sh
or so
33c2934 to
3067f33
Compare
cgwalters
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with this as is if you prefer, we can land further cleanups as followups
We had a live chat about this and I think the fact that tmt is dynamically rebuilding the OS image on boot is a problem. In the long term ideally the testing framework doesn't require installing any binaries at all on the target (this would let us test minimal systems without python etc) but:
|
eac749f to
07ef754
Compare
9347e32 to
9f5e783
Compare
Signed-off-by: Xiaofeng Wang <[email protected]>
Signed-off-by: Xiaofeng Wang <[email protected]>
Signed-off-by: Xiaofeng Wang <[email protected]>
Signed-off-by: Xiaofeng Wang <[email protected]>
|
|
cgwalters
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! Nothing truly blocking here so if you want to just merge as is go ahead and as always there's followups.
| ARCH=$(uname -m) | ||
| case "$ARCH" in | ||
| "aarch64") | ||
| qemu-system-aarch64 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually this is going to need to use a shared library of some form. Can't we just point tmt at the disk image and let testcloud run it or so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tmt will call libvirt, not qemu. bootc-kit can replace those part of code later. I'm working on it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep and what I'm hitting now is that these qemu instances just leak after the test script runs, so trying to run multiple tests locally just fails.
I fixed this with a quick strategic use of setpriv --deathsig SIGTERM around qemu.
| needs: build | ||
| strategy: | ||
| matrix: | ||
| test_os: [fedora-41, fedora-42, fedora-43, centos-9] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is required to get different base image.
| build: | ||
| strategy: | ||
| matrix: | ||
| test_os: [fedora-41, fedora-42, fedora-43, centos-9] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can stop testing fedora-41 overall and we MUST test centos-10
| # Base image | ||
| case "$OS_ID" in | ||
| "centos") | ||
| TIER1_IMAGE_URL="quay.io/centos-bootc/centos-bootc:stream${OS_VERSION_ID}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not new and not directly related to this but we have this legacy of "TIER1" naming - there's only one default image and it's likely to stay that way. I'd probably just say BASE_IMAGE_URL going forward for things like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Need update the name.
| strategy: | ||
| matrix: | ||
| test_os: [fedora-41, fedora-42, fedora-43, centos-9] | ||
| tmt_plan: [test-01-readonly, test-20-local-upgrade, test-21-logically-bound-switch, test-22-logically-bound-install, test-23-install-outside-container, test-24-local-upgrade-reboot] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's going to be a maintenance trap if we have to list all tests here and in the tmt directory.
One thing that also seems like it will become a bit painful is having a CI context check per test. (It's not wrong, but it will get really long)
A good middle ground might be "sharding"; basically like https://nexte.st/docs/ci-features/partitioning/?h=shard#hashed-sharding where we run a group of tests per GHA runner. That would give us a good mix of parallelism and efficiency.
And crucially by doing the sharding we would avoid needing to list each test; the allocated runner discovers which tests it should run by just taking an integer from 0-N as input, and looking if the hash of the test modulo the input key is zero.
| ;; | ||
| esac | ||
|
|
||
| CONTAINERFILE="${BOOTC_TEMPDIR}/Containerfile" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would create a third copy of our Dockerfile, I think we can avoid that by just having the github action matrix be directly passed down into e.g.
base=$1
shift
podman build --build-arg=base=$base
And that's all right?
| @@ -0,0 +1,100 @@ | |||
| name: bootc integration test | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Biggest question: Will we also continue to run the tests via packit+testing farm? Probably not, right?
But still though, we should bear in mind that we still need to have good bridges/integration with at least tmt/testing-farm since other groups will use that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, test can be run on github action runner and with make test locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought more about this and I think there's an obvious thing to keep doing with the packit+testing-farm flow, which is testing of system-reinstall-bootc.
For that we very precisely want an RPM and a stock cloud image. It's actually almost a good thing if we're not testing having a new bootc in the target container image just to shake out any cross-version issues.
|
OK let's get this in and then continue iteration! |
New integration test workflow:
Run integration test in Github runner or run locally with
TEST_OS=fedora-42 make testbuild job: