Race condition in subprocess handling #2049

@mtharp

Describe the bug
v0.16.0 introduced a subprocess reaper via #2043. The reaper appears to race with os/exec: it can wait() on a process spawned via os/exec.Cmd.Run before Run itself waits on it, so Run never gets the return code.

The symptoms are waitid: no child processes errors and spurious 404 Not Found responses, as seen in #2048. The fix attempted there only suppresses the symptom: the return code of the child process is still lost, so the proxy cannot tell whether the command succeeded.

athens' Dockerfile specifies tini as the entrypoint to the container. This should adequately take care of wayward grandchildren: they are re-parented to PID 1, i.e. tini, which waits on them and is there for just that purpose. I ran v0.11.0 for several years and in that time I don't think I saw any zombie processes with tini in place. I suspect that users who are having zombie trouble are inadvertently running athens without tini.

In theory athens could internalize some means of reaping grandchild processes, but this would be redundant with tini, and it seems challenging to implement in-process without interfering with os/exec. In my opinion, it makes the most sense to revert #2048 and #2043 and let tini handle reaping duties, as it was before v0.16.0.

Error Message
Typical case in which the reaper gets to a process before exec.Cmd can:

Jun 03 02:38:36 goproxy.example.com athens-proxy[2289053]: INFO[6:38AM]: reaped child process 1449344, exit status: 1
Jun 03 02:38:36 goproxy.example.com athens-proxy[2289053]: INFO[6:38AM]: wait: no child processes: go: module google.golang.org/protobuf/runtime/protoiface: reading https://proxy.golang.org/google.golang.org/protobuf/runtime/protoiface/@v/list: 404 Not Found
Jun 03 02:38:36 goproxy.example.com athens-proxy[2289053]:         server response: not found: module google.golang.org/protobuf/runtime/protoiface: no matching versions for query "latest"

To Reproduce
Normal traffic under moderate load, especially @v/list requests and/or requests for nonexistent modules.

Expected behavior
No spuriously reaped processes, no "wait" errors, and legitimate error cases (e.g. a nonexistent repository) are handled normally.

Environment:

  • OS: linux/amd64 in podman
  • Go version: 1.24.x
  • Proxy version: v0.16.0
  • Storage: disk
