You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Why
- The previous approach treated many non‑zero exits as sandbox denials, making normal command failures look like sandbox errors.
- SIGSYS mapping was inconsistent (hardcoded to 12 while Linux uses 31), and magic integers made maintenance hard.
What
- Introduced a conservative SandboxVerdict classifier with three outcomes:
- LikelySandbox | LikelyNotSandbox | Unknown
- Rewired process_exec_tool_call to:
- Preserve timeouts as Timeout and non‑SIGSYS signals as Signal
- Defer SIGSYS classification until stderr is available
- Only return Denied on strong sandbox evidence
- Removed legacy is_likely_sandbox_denied (no remaining callers).
- Replaced magic numbers with named constants for clarity and correctness.
- Improved debug logging context.
Details
- New classifier: sandbox_verdict(sandbox_type: SandboxType, status: ExitStatus, stderr: &str) -> SandboxVerdict
- Signals:
- SIGSYS → LikelySandbox (via status.signal() or exit code 128+SIGSYS)
- Common app/user signals → LikelyNotSandbox
- SIGINT_CODE=2, SIGABRT_CODE=6, SIGBUS_CODE=7, SIGFPE_CODE=8, SIGKILL_CODE=9, SIGSEGV_CODE=11, SIGPIPE_CODE=13, SIGTERM_CODE=15
- Exit codes:
- LikelyNotSandbox: 0, 2, 124 (timeout wrapper), 127 (not found), 64..=78 (BSD sysexits), and 128+{the signals above}
- EXIT_NOT_EXECUTABLE_CODE=126 → Unknown unless stderr has explicit sandbox hints
- Otherwise Unknown unless hints indicate sandbox
- Stderr hints (case-insensitive; conservative):
- “sandbox: deny”, “not permitted by sandbox”, “bad system call”, “seccomp”
- Platform constants:
- SIGSYS_CODE: macOS=12, Linux=31 (no libc deps).
- Exec flow changes:
- Defer SIGSYS to inspect stderr; otherwise return Signal immediately.
- After building ExecToolCallOutput, compute verdict; on LikelySandbox → Denied { output }; else if deferred signal → Signal; else return Ok(exec_output).
- Logging:
- tracing::debug!(target: "codex_core::exec", exit_code, ?pending_signal, verdict=?verdict, "exec SandboxClassification")
Removed
- is_likely_sandbox_denied shim removed; note left explaining how to synthesize an ExitStatus if ever needed again.
Refactors / Clippy
- sandbox_verdict now takes ExitStatus by value (small Copy type) to satisfy clippy::trivially-copy-pass-by-ref.
- Replaced raw lists of integers with named constants and a computed array for 128+signal codes.
Tests
- Unit tests in core/src/exec.rs:
- not_sandbox_obvious_codes: 127/124/2 → LikelyNotSandbox
- sandbox_sigsys_codepath: 128+SIGSYS → LikelySandbox
- sandbox_stderr_hint: 126 + “Sandbox: deny …” → LikelySandbox
- unknown_generic_failure: 1 → Unknown
- Existing integration/suite tests already cover timeout/signal/Denied and OK paths.
Manual Test Plan
- macOS seatbelt:
- Run a command that exits 126 and prints “Sandbox: deny …” to stderr → Denied.
- Run a command that exits 127 or 2 → not Denied.
- Linux seccomp:
- Simulate bad syscall exit 128+31 → Denied.
- 130/141/143 (SIGINT/SIGPIPE/SIGTERM) → not Denied; surfaced as normal failure or Signal.
- Confirm debug line appears with target codex_core::exec and message “exec SandboxClassification”.
Behavioral Changes
- Normal non‑zero exits (e.g., exit 1) are no longer misclassified as sandbox errors.
- Timeouts and non‑SIGSYS signals retain previous semantics.
- Denials are now attributed only on strong evidence.
Risk / Rollout
- Scope limited to core/src/exec.rs; no new dependencies.
- macOS and Linux only; no *BSD branches.
- Low risk: adds conservative classification and clearer logging; removes unused shim.
Follow‑ups (optional)
- Add a macOS‑specific stderr hint “sandbox-exec:” if we want to treat seatbelt launcher failures as LikelySandbox.
- Expand tests to include a couple more 128+signal codes to lock in “not sandbox” classification.
- Run just fix -p codex-core and optionally a full workspace test pass once reviewers approve.
0 commit comments