Skip to content

Conversation

@everzakov
Copy link

Currently there is an golang issue golang/go#76746 .
If the cmd.Start is called twice (the first call was unsuccessful) then
Cmd.Wait can return an error (because goroutines' pipes will be closed in the first try).
However, the second cmd.Start can return nil.
After https://go-review.googlesource.com/c/go/+/728642 is merged, then
the second call will return an error. So we need to recreate the Cmd object.
We can not simply copy it
https://go-review.googlesource.com/c/go/+/728642/comments/a837fe92_a5fc5f06 .

Fixes #5060

@kolyshkin
Copy link
Contributor

Hmm, the code to create a copy of the cmd is kinda fragile, and I guess what Alan suggested is a better way, i.e.

use some other structure to hold all the information needed to populate a Cmd, and create the Cmd just before you call Start.

One more alternative is to check whether CLONE_INTO_CGROUP is available, but I'm afraid the check is going to be too expensive, and caching its result is not very reliable either.

@everzakov everzakov force-pushed the 5060-cmd-start-fix branch 2 times, most recently from 070b6ed to 4cdc0d6 Compare December 16, 2025 13:14
@everzakov
Copy link
Author

everzakov commented Dec 16, 2025

I guess what Alan suggested is a better way

Hello! Something like this 4cdc0d6 ? I have made this as a Container method.

@lifubang
Copy link
Member

lifubang commented Dec 18, 2025

@everzakov Have you verified whether this issue actually affects runc? If so, why aren’t we seeing any errors on almalinux-8?

In fact, runc doesn’t call cmd.Wait() directly—it uses cmd.Process.Wait() instead to wait for the child process.

Regarding the Go PR (728642), I think the change might be overly strict for users and could potentially be a breaking change.

@lifubang
Copy link
Member

In fact, I changed the CgroupFd to a random num in runc code, and I can't see any errors for runc, for example:

p.cmd.SysProcAttr.CgroupFD = 100 //int(fd.Fd())
lifubang@acmcoder /opt/ubuntu $ sudo ./runc exec test echo hello
hello
lifubang@acmcoder /opt/ubuntu $ sudo ./runc --debug exec test echo hello
...
DEBU[0000]libcontainer/process_linux.go:359 libcontainer.(*setnsProcess).prepareCgroupFD() using CLONE_INTO_CGROUP "/sys/fs/cgroup/user.slice/user-1000.slice/test" 
... libcontainer.(*setnsProcess).startWithCgroupFD() exec with CLONE_INTO_CGROUP failed: fork/exec /proc/self/fd/6: bad file descriptor; retrying without** 
...                       
hello

@everzakov
Copy link
Author

everzakov commented Dec 18, 2025

In fact, runc doesn’t call cmd.Wait() directly—it uses cmd.Process.Wait() instead to wait for the child process.

Actually there are some calls cmd.Wait() - https://github.com/opencontainers/runc/blob/v1.4.0/libcontainer/process_linux.go#L152 and https://github.com/opencontainers/runc/blob/v1.4.0/libcontainer/process_linux.go#L524 .

I can't see any errors for runc

Yeah, the main problem to find this fault in runtime is that all cmd.Wait calls' errors are ignored. Also, it is often preceded by process KILL. So, this error can be found only in tests. You can try to run TestEnter test with your change. The error should be the same as from issue.

Regarding the Go PR (728642), I think the change might be overly strict for users and could potentially be a breaking change.

I have proposed another go pr 728660 . In my PR it will be possible to call the same os/exec Cmd object if the first try was unsuccessful. Also, Cmd.Wait does not return the error. However, i think 728642 will be merged.

@lifubang
Copy link
Member

You're right — we shouldn’t reuse the exec.Command object to restart the process.

Just a discuss, a simple fix would be to move the retry logic into container_linux.go? Maybe in here:

	if err := parent.start(); err != nil {
		return fmt.Errorf("unable to start container process: %w", err)
	}

@lifubang
Copy link
Member

Additionally, adding a test case to ensure the patch works correctly would be appreciated.

@everzakov
Copy link
Author

Additionally, adding a test case to ensure the patch works correctly would be appreciated.

Hello! Well, i'm not very familiar with cgroups library. I think the test should be where we pass the non cgroup path and the check fails in the kernel. However, this path should exist . There are a lot funcs to clean and check prefixes . In Create we pass nil paths in cgroup manager so the default /sys/fs/cgroup pass will be used. I think it will be impossible to create non cgroup child directory in /sys/fs/cgroup tree. But we can use factory Load method to get a container . In this method cgroup paths are passed that's why we can set our non cgroup path.

So, the idea is to patch the dumped container state, load it with changed cgroup paths and turn off the cgroup function to check directory cgroup magic. The first call will fail because we will pass non cgroup dir. The second call doesn't return an error, and the set ns pid will be added successfuly into the "cgroup".
I think it can be very syntetic test but w/o changes it will fail in the new kernels :/

The test logs:
time="2025-12-22T16:28:50Z" level=debug msg="using CLONE_INTO_CGROUP "/tmp/TestCmdRetry2213481631/001""
time="2025-12-22T16:28:50Z" level=debug msg="exec with CLONE_INTO_CGROUP failed: fork/exec /proc/self/fd/6: bad file descriptor; retrying without"

@rata
Copy link
Member

rata commented Jan 13, 2026

@everzakov thanks a lot for catching this and for the PR!

There is one thing I'm not so sure about, why is this not failing on CI? The wait commands you linked seem to either return the error as a param or just make the function return an error. What am I missing?

Also, how did you find the error?

I was reviewing at it for a long time and it seems correct. But this is quite security-sensitive code. Do you mind splitting it in more commits, so it is obvious what is done at each step? I mean, check that the way we create the command is mostly c&p from another place in one commit, then maybe remove the cmd of the functions signature in another commit, etc. This way, we can view in detail that each step is correct.

It's tricky code, I'd like to reduce the chances of doing a mistake.

@everzakov
Copy link
Author

Also, how did you find the error?

Hello, i have tried to run the new release tests with the old Linux kernels (before 5.7 where CLONE_INTO_CGROUP is proposed). In my case, it was the Ubuntu 20.04. TestEnter tests are failed with the error in the issue (exec_test.go:216: unexpected error: read |0: file already closed). I have found that this is related to the golang issue (we can not reuse the same cmd object to retry cmd start) golang/go#76746 . The syntetic test TestCmdRetry is about to force retry of cmd in the new Linux kernels and fails with the same error. Or you can set the wrong cgroup fd and run tests ( #5066 (comment) ).

There is one thing I'm not so sure about, why is this not failing on CI?

I can split these changes into three commits. The first commit will have only the test. I think the CI will fail. The second commit will have the changes to recreate Cmd object. The third commit will be with functions' signature changes.

PS. Now this issue is only about test fails. However, https://go-review.googlesource.com/c/go/+/728642 is merged that's why it will be runtime error with the new golang version on old Linux kernels. If golang/go#77075 will be solved then we can again change the code to clone Cmd object :)

@rata
Copy link
Member

rata commented Jan 13, 2026

SGTM, thanks!

… cmd.Start method if clone3 do not support CLONE_INTO_CGROUP flag

TestCmdRetry is a syntetic test to force retry of cmd.Start method on the new Linux kernels (after 5.7). It can be done if cgroup fd points to the file from non-cgroup tree. In runc there a lot of prefix checks after joining pathes. Also, when container is created from factory, fs2 cgroup manager will choose the path from cgroup tree by default. However, when container is loaded from state.json its path is not checked that's why it is possible to pass the wrong path. Also, we should turn on TestMode in opencontainers/cgroups library.

Signed-off-by: Efim Verzakov <efimverzakov@gmail.com>
In Golang the same exec.Cmd object can not be used for retry Start method. Wait method will return the error even if the second call is successful. More info: golang/go#76746 . This commit adds additional method to recreate cmd Object after the first fail.

Signed-off-by: Efim Verzakov <efimverzakov@gmail.com>
Signed-off-by: Efim Verzakov <efimverzakov@gmail.com>
@everzakov
Copy link
Author

everzakov commented Jan 22, 2026

SGTM, thanks!

Hello!
I have split the code into the three commits. After the first commit, the CI are falied with the same error as in the issue and Golang issue. https://cirrus-ci.com/task/6632526975860736 https://cirrus-ci.com/task/5506627069018112

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are thinking about introducing cmd.Clone() method; in the meantime we can do something like this:

diff --git a/libcontainer/process_linux.go b/libcontainer/process_linux.go
index b3487fd0..68474087 100644
--- a/libcontainer/process_linux.go
+++ b/libcontainer/process_linux.go
@@ -380,11 +380,14 @@ func (p *setnsProcess) startWithCgroupFD() error {
                defer fd.Close()
        }
 
+       cmdCopy := *p.cmd
        err = p.startWithCPUAffinity()
        if err != nil && p.cmd.SysProcAttr.UseCgroupFD {
                logrus.Debugf("exec with CLONE_INTO_CGROUP failed: %v; retrying without", err)
                // SysProcAttr.CgroupFD is never used when UseCgroupFD is unset.
-               p.cmd.SysProcAttr.UseCgroupFD = false
+               cmdCopy.SysProcAttr.UseCgroupFD = false
+               // Must not reuse exec.Cmd (https://go-review.googlesource.com/c/go/+/728642).
+               p.cmd = &cmdCopy
                err = p.startWithCPUAffinity()
        }
 

Tested to work fine with the test case from this PR, go HEAD (1.26-to-be). It works.

It's a bit hacky because we're also copying private fields which might not work in the future. The more correct way is to copy all public fields explicitly (which is less hacky but still not future proof as there may be more public fields added).

As for this patch, it's correct in theory but practically

  • it's quite a big refactoring
  • it makes things more complex than they were

These are the two reasons why I hesitate to LGTM it.

ok(t, err)

// Dump our "pseudo" cgroup path as dirPath
state.CgroupPaths[""] = pseudoPath
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes (and will work only with) cgroup v2

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked the cgroup version in the call of cgroups.IsCgroup2UnifiedMode. As for cgroup v1 it is deprecated :) #4955 .

@kolyshkin
Copy link
Contributor

Opened #5091 as a much simpler alternative to this. As to testing, I tried Go 1.26 with current CI and a few tests fail before the fix so I guess this is sufficient. PTAL @everzakov

// once.
p.closeClonedExes()
var safeExe *os.File
if exeseal.IsSelfExeCloned() {
Copy link
Member

@rata rata Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect this commit (" libct: add Container method to recreate cmd Object in setnsProcess") to only move cmd.* assignments to a function that we then just call. And keep all the sealing of the binary untouched.

Is not possible to do something like that, that is super straight-forward?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I have kept the sealing untouched. I have deleted only exePath (it is not used in this function). In createCmdObject function I use the created safeExe if runc binary is cloned. I can move createCmdObject after newParentProcess function that's why there won't be any changes in newParentProcess function header.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I can keep the function signatures and should create cmd without cgroup fd. In Setns Process Cmd retry it will be ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is we create a function that does the cmd and the commit is just moving cmd.X = asd lines from one place to the other. That will be quite clear we are not missing anything.

@rata
Copy link
Member

rata commented Jan 27, 2026

@kolyshkin does that work with this PR applied too: https://go-review.googlesource.com/c/go/+/728642 ?

I'm skeptical about copying the private fields (specially pointers like that). If we can just move the cmd.X assignments we do to a function and call it, at least in my head, sounds very simple and obviously correct to not do it.

@everzakov
Copy link
Author

@kolyshkin does that work with this PR applied too: https://go-review.googlesource.com/c/go/+/728642 ?

I'm skeptical about copying the private fields (specially pointers like that). If we can just move the cmd.X assignments we do to a function and call it, at least in my head, sounds very simple and obviously correct to not do it.

Yeah, go version go1.26rc2 (in tests matrix) has this commit. Also, it has https://go-review.googlesource.com/c/go/+/734200 where atomic.Bool is changed to bool.

@kolyshkin
Copy link
Contributor

@kolyshkin does that work with this PR applied too: https://go-review.googlesource.com/c/go/+/728642 ?

I'm skeptical about copying the private fields (specially pointers like that). If we can just move the cmd.X assignments we do to a function and call it, at least in my head, sounds very simple and obviously correct to not do it.

In short, it works and the method is kind of approved by the upstream. See #5091 for more details, and let's move the further discussion about the copy approach to #5091.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

runc 1.4 breaks the rule that cmd.Start is called only once that's why process Wait can return an error.

4 participants