-
Notifications
You must be signed in to change notification settings - Fork 2.9k
podman system migrate fixes when pause process and conmon got killed #27604
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pkg/domain/infra/abi/system_linux.go
Outdated
| if became { | ||
| os.Exit(ret) | ||
| } | ||
| // FIXME: should this be an error? If we never re-exec that seems like a big issue here as we |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we at least drop a logrus.Errorf here? See if people actually report seeing this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like this on of the return to satisfy the compiler and not something that should ever be reached by my reading of the code so I think even a hard error should be safe, I just added the fixme to get your and others attention to it in case I am was missing something.
I am fine with both a logrus.Error or even hard error. I am somewhat confident in the 550-pause-process.bats test cases that should check all the cases. The one thing the tests cannot do of course is check what happens if we reach here in parallel.
I wonder if taking the alive lock here would be a good idea.
|
Overall LGTM minus one comment, but we really need a Giuseppe review to merge, my grasp on this code isn't as good as I'd like |
giuseppe
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: giuseppe, Luap99 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There is no good reason to use logrus and os.Exit() here, other parts of this function already return the error so do the same. The main podman process will exit then with the normal formatted error message. And also log an error about the last return which should never happen as we should have exited above if the re-exec worked or errored out. Signed-off-by: Paul Holzinger <[email protected]>
Based on the description in commit 63ef557 this was added so that the migrate command does not move the pause process into a separate cgroup. It should however not disable the rejoining of the userns when the pause process join failed. BEcause of this we end up calling migrate without a userns and that then can fail if there are actual contianer it tries to cleanup. Fixes: 63ef557 ("command: migrate doesn't move process to cgroup") Signed-off-by: Paul Holzinger <[email protected]>
Just a minor improvement as we know the size needed for the slice we can allocate it only once instead of the append having to resize it. Signed-off-by: Paul Holzinger <[email protected]>
When trying to join the conmon pid to recreate the pause process based on the namespace it can be that the pid is no longer valid, i.e. when conmon crashed or was killed. Currently we have a big issue that can be reproduced using: $ podman run -d quay.io/libpod/testimage:20241011 sleep 100 $ killall -9 conmon $ killall catatonit All commands would fail as we keep trying to rejoin the namespace of the non existing conmon process. So to address that fall back to creating a new namespace if we fail to join the conmon pids. Signed-off-by: Paul Holzinger <[email protected]>
There doesn't seem any reason why the system commands should not join the userns. In particular the main commands use ParentNSRequired and UnshareNSRequired when they don't want to be joined to the main userns. Since the system command don't set these the go code does the join and re-exec anyway so might as well use the shortcut to speed that up. Signed-off-by: Paul Holzinger <[email protected]>
|
Sure |
|
/cherry-pick v5.7 |
|
@Luap99: new pull request created: #27611 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
see individual commits
Does this PR introduce a user-facing change?