fix(doctor): bound subprocess probes #764
Add a shared timeout runner for local doctor probes, with 10s defaults and 15s limits for freshness subprocesses. Kill timed-out process groups on Unix and return specific doctor timeout rows or freshness details instead of hanging the report.
Kill and reap timed-out probes without joining reader threads, so escaped descendants cannot block doctor callers. Share timeout check construction and aggregate freshness timeout output while keeping public report fields unchanged.
Remove duplicate descendant cleanup coverage and keep the timeout runner tests focused on caller bounds, large output capture, and basic timeout behavior.
Collapse redundant runner assertions into the remaining regression tests and shorten timeout diagnostic setup while preserving deadlock and aggregation coverage.
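For readers unfamiliar with the pattern, the wait/kill/reap loop that the commits above describe can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the PR's actual runner: `ProbeOutcome` and `run_bounded` are hypothetical names, output capture is disabled to keep the focus on the deadline, and only the direct child is killed (the process-group kill on Unix mentioned above needs a platform call that is omitted here).

```rust
use std::process::{Command, ExitStatus, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Result of a bounded probe. Hypothetical type, not taken from the PR.
#[derive(Debug)]
enum ProbeOutcome {
    Exited(ExitStatus),
    TimedOut,
}

/// Run `cmd` and wait at most `timeout` for it to exit.
/// Hypothetical helper: polls `try_wait` instead of blocking in `wait`.
fn run_bounded(cmd: &mut Command, timeout: Duration) -> std::io::Result<ProbeOutcome> {
    let mut child = cmd
        .stdin(Stdio::null())
        .stdout(Stdio::null()) // no pipes here, so there are no reader threads to join
        .stderr(Stdio::null())
        .spawn()?;
    let deadline = Instant::now() + timeout;
    loop {
        // Non-blocking check: Some(status) once the direct child has exited.
        if let Some(status) = child.try_wait()? {
            return Ok(ProbeOutcome::Exited(status));
        }
        if Instant::now() >= deadline {
            // Kill only the direct child in this sketch; errors are ignored
            // because the child may have exited between the two checks.
            let _ = child.kill();
            // Reap the child so it does not linger as a zombie.
            let _ = child.wait();
            return Ok(ProbeOutcome::TimedOut);
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

A caller would pass something like `Duration::from_secs(10)` for an ordinary probe and map `ProbeOutcome::TimedOut` to a timeout row in the report rather than propagating an error.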
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 35494ab6ef
    let stdout = stdout_thread.map(join_reader).unwrap_or_default();
    let stderr = stderr_thread.map(join_reader).unwrap_or_default();
Keep successful probes bounded while collecting output
When the direct child exits before timeout but leaves a descendant running with inherited stdout/stderr (for example a CLI probe that daemonizes or starts a background helper), wait_timeout returns Ok(Some(_)) and this code immediately joins the reader threads. Those threads are blocked in read_to_end until every inherited pipe handle closes, so the doctor can still hang indefinitely even though the probe child already finished and no timeout cleanup runs. This undermines the new bounded-probe behavior for any probed command with lingering children; the success path needs an output collection strategy that is also bounded or cleans up the process group before joining.
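One possible shape for a bounded collection strategy on the success path is sketched below. It is an assumption about how the fix could look, not the change the PR adopted: the reader thread streams chunks over a channel instead of calling `read_to_end`, and the caller drains the channel with a grace deadline, so a descendant holding the pipe open costs at most the grace period. `spawn_reader`, `drain_bounded`, and `probe_with_lingering_child` are hypothetical names, and the demo command assumes a POSIX `sh`.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Stream a pipe into a channel from a detached thread, one chunk at a time.
/// The thread is never joined, so a pipe that stays open cannot block the caller.
fn spawn_reader<R: Read + Send + 'static>(mut pipe: R) -> Receiver<Vec<u8>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut buf = [0u8; 4096];
        loop {
            match pipe.read(&mut buf) {
                Ok(0) | Err(_) => break, // EOF or read error: stop streaming
                Ok(n) => {
                    if tx.send(buf[..n].to_vec()).is_err() {
                        break; // receiver dropped, nobody is listening
                    }
                }
            }
        }
    });
    rx
}

/// Collect chunks until the pipe closes or `grace` elapses.
/// Returns the bytes seen so far and whether the pipe reached EOF.
fn drain_bounded(rx: &Receiver<Vec<u8>>, grace: Duration) -> (Vec<u8>, bool) {
    let deadline = Instant::now() + grace;
    let mut out = Vec::new();
    loop {
        let left = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(left) {
            Ok(chunk) => out.extend_from_slice(&chunk),
            Err(RecvTimeoutError::Disconnected) => return (out, true),
            Err(RecvTimeoutError::Timeout) => return (out, false),
        }
    }
}

/// Demo probe: the shell exits at once, but a background `sleep` inherits
/// stdout and keeps the pipe open, which is the case the review describes.
fn probe_with_lingering_child(grace: Duration) -> std::io::Result<(String, bool)> {
    let mut child = Command::new("sh")
        .args(["-c", "sleep 3 & echo ready"])
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()?;
    let rx = spawn_reader(child.stdout.take().expect("stdout was piped"));
    child.wait()?; // the direct child finishes quickly
    let (bytes, complete) = drain_bounded(&rx, grace);
    Ok((String::from_utf8_lossy(&bytes).into_owned(), complete))
}
```

The trade-off in this sketch is that output written by a descendant after the grace period is dropped, and the detached reader thread lives until the pipe finally closes. Cleaning up the process group before joining, as the review also suggests, avoids both at the cost of killing helpers the probe may have started deliberately.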