The container root directory should be removed only after poststop hooks run successfully #4363

@abel-von

Description

Hi, we recently added a runc hook to manage our special devices and ran into a problem.
If our poststop hook fails (for example, because the resource is not yet ready to be recycled), we expect containerd or the kubelet to retry runc delete. Once the hook returns success, runc delete should succeed and all resources should be recycled, with nothing left behind.

However, we found that when containerd calls runc delete a second time, the container root directory has already been removed and a NotFound error is returned, so containerd considers the container removed normally, even though the poststop hook never ran successfully. This happens because destroy() removes the state directory (c.stateDir) before it calls runPoststopHooks(), so a failed hook leaves nothing for a retry to act on.

I am wondering whether we could move the removal of the root directory to after the hooks are called, as in the following change:

```diff
--- a/libcontainer/state_linux.go
+++ b/libcontainer/state_linux.go
@@ -54,13 +54,16 @@ func destroy(c *Container) error {
                        return fmt.Errorf("unable to remove container's IntelRDT group: %w", err)
                }
        }
+       c.initProcess = nil
+       if err := runPoststopHooks(c); err != nil {
+               return fmt.Errorf("unable to run post stop hooks: %w", err)
+       }
+       c.state = &stoppedState{c: c}
+
        if err := os.RemoveAll(c.stateDir); err != nil {
                return fmt.Errorf("unable to remove container state dir: %w", err)
        }
-       c.initProcess = nil
-       err := runPoststopHooks(c)
-       c.state = &stoppedState{c: c}
-       return err
+       return nil
 }
```

Steps to reproduce the issue

  1. Write a runc poststop hook that returns an error, and configure it in containerd's runtime.
  2. Start a container.
  3. Delete the container.

Describe the results you received and expected

expected:
Deleting the container should fail until the hook executes successfully.
received:
The container is removed, but the hook never completes successfully, so resources are left behind.

What version of runc are you using?

The newest version

Host OS information

I believe all operating systems are affected.

Host kernel information

All kernel versions.

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
