Skip to content

Commit a0213e4

Browse files
peremyachfacebook-github-bot
authored andcommitted
Fix crashes in gsymutil resource accounting
Summary: poll uses wnohang option, so ocasionally we got empty stats and the service crashed because we never actually waited for the process to complete and there is a safeguard in folly for that https://fburl.com/code/ujrzktm5 We mostly succeeded probably because communicate call above returned shortly before the process death, but there were ocasional races. We could poll in a cycle in the service until the state is running, but that seemed like a cludge. So I decided to make changes to folly. I thought about adding options field to poll. But semantically seemed wrong that we can wait in a function named poll. Also there is no retry loop for properly handling waiting erros. Wait already uses waitpid under the hood, changing it to wait4 seemed risky. So I decided to add a new function which is kinda of a mix of a two. Reviewed By: jwiepert, hodgesds Differential Revision: D73506301 fbshipit-source-id: 344833608409a5bdd5873e0fe50b863b4347c1a5
1 parent b149a83 commit a0213e4

File tree

2 files changed

+27
-0
lines changed

2 files changed

+27
-0
lines changed

third-party/folly/src/folly/Subprocess.cpp

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1085,6 +1085,25 @@ ProcessReturnCode Subprocess::wait() {
10851085
return returnCode_;
10861086
}
10871087

1088+
ProcessReturnCode Subprocess::waitAndGetRusage(struct rusage* ru) {
1089+
returnCode_.enforce(ProcessReturnCode::RUNNING);
1090+
DCHECK_GT(pid_, 0);
1091+
int status;
1092+
pid_t found;
1093+
do {
1094+
found = ::wait4(pid_, &status, 0, ru);
1095+
} while (found == -1 && errno == EINTR);
1096+
// The only two remaining errors are ECHILD (other code reaped the
1097+
// child?), or EINVAL (cosmic rays?), and both merit an abort:
1098+
PCHECK(found != -1) << "wait4(" << pid_ << ", &status, 0, resourceUsage)";
1099+
// Though the child process had quit, this call does not close the pipes
1100+
// since its descendants may still be using them.
1101+
DCHECK_EQ(found, pid_);
1102+
returnCode_ = ProcessReturnCode::make(status);
1103+
pid_ = -1;
1104+
return returnCode_;
1105+
}
1106+
10881107
void Subprocess::waitChecked() {
10891108
wait();
10901109
checkStatus(returnCode_);

third-party/folly/src/folly/Subprocess.h

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -670,6 +670,14 @@ class Subprocess {
670670
*/
671671
ProcessReturnCode wait();
672672

673+
/**
674+
* Wait for the process to terminate and return its status and rusage. Like
675+
* poll(), the only exception this can throw is std::logic_error if you call
676+
* this on a Subprocess whose status is not RUNNING. Aborts on egregious
677+
* violations of contract, like an out-of-band wait4(p.pid(), 0, 0, nullptr).
678+
*/
679+
ProcessReturnCode waitAndGetRusage(struct rusage* ru);
680+
673681
/**
674682
* Wait for the process to terminate, throw if unsuccessful.
675683
*/

0 commit comments

Comments
 (0)