Skip to content

Conversation

@Sjors
Copy link
Member

@Sjors Sjors commented Jan 8, 2026

Bitcoin Core functional tests for the echoipc RPC method were occasionally timing out, see bitcoin/bitcoin#34187. A helpful LLM quickly figured out it was due to doing unsafe things after fork().

https://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html

Fix it by having SpawnProcess evaluate fd_to_args and build argv in the parent before fork().

SpawnProcess now avoids ExecProcess. I intentionally did not fix the latter, not just to reduce churn, but also because it's now only used by mpgen and it doesn't seem worth losing the debug errors it (unsafely) throws.

The last commit adds a test which reproduces the issue.

The next commit will use it to create an alternative to ExecProcess that's safe to call post-fork.

Added a note that ExecProcess is not safe to call post-fork. Since it's only used by mpgen at build time, this is fine, and lets of print useful errors.
@DrahtBot
Copy link

DrahtBot commented Jan 8, 2026

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK ryanofsky

If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #231 (Add windows support by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@Sjors Sjors force-pushed the 2026/01/fork-safu branch from 97aeeec to 9909f1a Compare January 8, 2026 12:02
Evaluate fd_to_args and build argv in the parent before fork().

In the child between fork() and execvp(), avoid throwing/allocating
and use _exit(1) on error.

This likely fixes the occasional deadlock in Bitcoin Core functional
test, noticed in bitcoin/bitcoin#34187.
@Sjors Sjors force-pushed the 2026/01/fork-safu branch from 9909f1a to 4e84072 Compare January 8, 2026 12:13
@Sjors
Copy link
Member Author

Sjors commented Jan 8, 2026

Fixed and pre-empted a few other style nits.

Hopefully fixed both CI build failures too (include-what-you-use).

@Sjors Sjors force-pushed the 2026/01/fork-safu branch from 4e84072 to dbbe02e Compare January 8, 2026 12:30
@DrahtBot DrahtBot mentioned this pull request Jan 8, 2026
@Sjors
Copy link
Member Author

Sjors commented Jan 9, 2026

@DrahtBot code suggestion is stale, was fixed in the last push.

@Sjors Sjors force-pushed the 2026/01/fork-safu branch from dbbe02e to 78e50fb Compare January 9, 2026 07:11
@Sjors
Copy link
Member Author

Sjors commented Jan 9, 2026

Added code comments to explain how the test works, and to point out it could trigger false negatives (missing a regression), but won't have spurious failures.

This test fails without the previous commit.
@Sjors Sjors force-pushed the 2026/01/fork-safu branch from 78e50fb to 0baf75c Compare January 9, 2026 07:18
@maflcko
Copy link
Contributor

maflcko commented Jan 9, 2026

@DrahtBot code suggestion is stale, was fixed in the last push.

It wasn't stale. The LLM with this input had a 10% chance of producing this slop response, and you hit it. 🎉

@Sjors
Copy link
Member Author

Sjors commented Jan 9, 2026

@maflcko I'm prepared for next time :-) https://gist.github.com/Sjors/dbe7e7f0159c7fe00f7dc914d1db4242

Copy link
Collaborator

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 0baf75c. I wouldn't have guessed that this could cause bitcoin/bitcoin#34187, but it makes sense and could easily explain the echoipc child process randomly hanging.

I'm a little doubtful about how valuable the test in third commit is since it is long and only catches fd_to_args being called incorrectly in the child process, not MakeArgv, or future bad code that could be added there. But the test is pretty clean, and could be dropped if it ever becomes a maintenance burden.

Left some suggestions below, but happy to merge this as-is if preferred

if (close(fds[pid ? 0 : 1]) != 0) {
throw std::system_error(errno, std::system_category(), "close");
if (pid) throw std::system_error(errno, std::system_category(), "close");
_exit(1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Precompute argv before fork in SpawnProcess" (9d3c2b6)

Getting rid of the throw here makes sense but it'd also be nice to have diagnostic and not just return 1. Would suggest something like:

    static constexpr char msg[] = "SpawnProcess(child): close(fds[1]) failed\n";
    (void)::write(STDERR_FILENO, msg, sizeof(msg) - 1);
    _exit(126);

ExecProcess(fd_to_args(fds[0]));

execvp(argv[0], argv.data());
perror("execvp failed");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Precompute argv before fork in SpawnProcess" (9d3c2b6)

Keeping perror is probaby the best choice for now even though strictly speaking it might not be safe and could hang. Would suggest adding a comment like "// NOTE: perror() is not async-signal-safe; calling it here in a post-fork child may still deadlock in multithreaded parents."

Note for future todo: It would be nice to improve this later by adding an error-reporting pipe, writing errno to it so parent process can provide a better diagnostic to code calling it. This idea came up previously in #213 (comment) / 286fe46

// must _exit(1) (post-fork child must not throw).
if (close(fds[pid ? 0 : 1]) != 0) {
throw std::system_error(errno, std::system_category(), "close");
if (pid) throw std::system_error(errno, std::system_category(), "close");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Precompute argv before fork in SpawnProcess" (9d3c2b6)

Note: It looks like there is a pre-existing bug here where if close(fds[0]) fails in the parent fd[1] will not be closed either and could be leaked. Maybe doesn't matter since if one close fails, not sure what chances of other one succeeding could be, but this might be good to address in followup.


execvp(argv[0], argv.data());
perror("execvp failed");
_exit(1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Precompute argv before fork in SpawnProcess" (9d3c2b6)

Would suggest replacing 1 with 127, since this is more distinctive and is also what bash returns in $? when a command fails to execute

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants