-
Notifications
You must be signed in to change notification settings - Fork 183
kola/multipath: wait for systemd firstboot target #4349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kola/multipath: wait for systemd firstboot target #4349
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request aims to fix a race condition in multipath tests by waiting for systemd's first-boot-complete.target before rebooting. The approach is sound, but the implementation of the waiting logic has a bug that would prevent it from working as intended. I've provided a detailed comment with a suggested fix for the waiting function. Once that is addressed, the change should effectively resolve the flakiness issue.
87b1989 to
6c14eb0
Compare
| verifyMultipath(c, m, "/var/lib/containers") | ||
| } | ||
|
|
||
| func waitForCompleteFirstboot(c cluster.TestCluster) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional:
rather than running a retry here you could possibly run systemd-run --wait and then run that inside a util.WaitUntilReady like is done here.
i.e. rather than worrying about how many times to retry you can just focus on how much time you want to spend waiting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank for the suggestion. Applied it.
c12a98d to
02bcc86
Compare
Although `mpath-var-lib-containers.service` is set to only run on first boot, it sometime runs twice when the system reboots too early. Sometimes, in low load CI environement, the reboot in this test happens before systemd's `first-boot-complete.target` is reached. This make `ConditionFirstBoot` to still be true at the next boot, causing the mpath service to fail, because it already ran during the actual first boot. A previous attempt[1] at fixing this improved the flake but this happened again and i noticed that systemd didn't reach this target before the reboot: `Reached target first-boot-complete.target - First Boot Complete` is only shown after the second boot in the logs. Likely fixes coreos/rhel-coreos-config#66 [1] coreos@abd0c18
02bcc86 to
50e5cd6
Compare
dustymabe
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Although
mpath-var-lib-containers.serviceis set to only run on first boot, it sometime runs twice when the system reboots too early. Sometimes, in low load CI environement, the reboot in this test happens before systemd'sfirst-boot-complete.targetis reached. This makeConditionFirstBootto still be true at the next boot, causing the mpath service to fail, because it already ran during the actual first boot.A previous attempt[1] at fixing this improved the flake but this happened again and i noticed that systemd didn't reach this target before the reboot:
Reached target first-boot-complete.target - First Boot Completeis only shown after the second boot in the logs.Likely fixes coreos/rhel-coreos-config#66
[1] abd0c18