Skip to content

Commit 49d8702

Browse files
committed
Fix crash logs: verify crash data was received
The poll() check from #12783 relied on POLLIN to detect crash data, but this is unreliable - on some systems poll() sets POLLIN for EOF (socket closed without data). Instead, we now verify crash data by actually reading from stdin: if read() returns 0 bytes in wait_mode, traffic_server exited normally and crash_logger_invoke was never called, so we exit without logging. This actually (as opposed to the previous patch) distinguishes a real crash (data written then pipe closed) from a normal exit (pipe just closed).
1 parent 11e99cb commit 49d8702

File tree

1 file changed

+14
-5
lines changed

1 file changed

+14
-5
lines changed

src/traffic_crashlog/traffic_crashlog.cc

Lines changed: 14 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -176,17 +176,16 @@ main(int /* argc ATS_UNUSED */, const char **argv)
176176
// sent us signal info via the pipe. If traffic_server just exited normally, the pipe will be
177177
// closed with no data, and we should exit without logging a false "crash".
178178
if (wait_mode) {
179-
// Use poll to check if there's data available on stdin without blocking indefinitely.
179+
// Use poll to wait briefly for potential crash data. If crash_logger_invoke was called,
180+
// data should already be there. We only use poll() as a timeout guard here - the actual
181+
// verification that crash data was received happens when we try to read() below.
180182
struct pollfd pfd;
181183
pfd.fd = STDIN_FILENO;
182184
pfd.events = POLLIN;
183185

184-
// Wait briefly for data. If crash_logger_invoke was called, data should already be there.
185186
int poll_result = poll(&pfd, 1, 100); // 100ms timeout
186187

187-
// POLLHUP means the write end of the pipe was closed - normal exit, not crash.
188-
// No data or error also means no crash occurred.
189-
if (poll_result <= 0 || (pfd.revents & POLLHUP) || !(pfd.revents & POLLIN)) {
188+
if (poll_result < 0) {
190189
return 0;
191190
}
192191
}
@@ -229,6 +228,13 @@ main(int /* argc ATS_UNUSED */, const char **argv)
229228
target.flags |= CRASHLOG_HAVE_THREADINFO;
230229

231230
nbytes = read(STDIN_FILENO, &target.siginfo, sizeof(target.siginfo));
231+
if (nbytes <= 0) {
232+
// No data received. In wait mode, this means traffic_server exited normally without
233+
// crashing - crash_logger_invoke was never called, so no signal info was written.
234+
if (wait_mode) {
235+
return 0;
236+
}
237+
}
232238
if (nbytes < static_cast<ssize_t>(sizeof(target.siginfo))) {
233239
Warning("received %zd of %zu expected signal info bytes", nbytes, sizeof(target.siginfo));
234240
target.flags &= ~CRASHLOG_HAVE_THREADINFO;
@@ -240,6 +246,9 @@ main(int /* argc ATS_UNUSED */, const char **argv)
240246
target.flags &= ~CRASHLOG_HAVE_THREADINFO;
241247
}
242248
}
249+
// Note: On non-Linux platforms, crash_logger_invoke doesn't write signal info to the pipe,
250+
// so we cannot distinguish a crash from a normal exit. This may result in false crash logs
251+
// being generated when traffic_server exits normally on those platforms.
243252

244253
logname = crashlog_name();
245254

0 commit comments

Comments
 (0)