Skip to content

Commit e75e82f

Browse files
committed
feat(core[exec]): replace sandbox verifier bool with tri‑state classifier; correct SIGSYS; remove misattribution
Why - The previous approach treated many non‑zero exits as sandbox denials, making normal command failures look like sandbox errors. - SIGSYS mapping was inconsistent (hardcoded to 12 while Linux uses 31), and magic integers made maintenance hard. What - Introduced a conservative SandboxVerdict classifier with three outcomes: - LikelySandbox | LikelyNotSandbox | Unknown - Rewired process_exec_tool_call to: - Preserve timeouts as Timeout and non‑SIGSYS signals as Signal - Defer SIGSYS classification until stderr is available - Only return Denied on strong sandbox evidence - Removed legacy is_likely_sandbox_denied (no remaining callers). - Replaced magic numbers with named constants for clarity and correctness. - Improved debug logging context. Details - New classifier: sandbox_verdict(sandbox_type: SandboxType, status: ExitStatus, stderr: &str) -> SandboxVerdict - Signals: - SIGSYS → LikelySandbox (via status.signal() or exit code 128+SIGSYS) - Common app/user signals → LikelyNotSandbox - SIGINT_CODE=2, SIGABRT_CODE=6, SIGBUS_CODE=7, SIGFPE_CODE=8, SIGKILL_CODE=9, SIGSEGV_CODE=11, SIGPIPE_CODE=13, SIGTERM_CODE=15 - Exit codes: - LikelyNotSandbox: 0, 2, 124 (timeout wrapper), 127 (not found), 64..=78 (BSD sysexits), and 128+{the signals above} - EXIT_NOT_EXECUTABLE_CODE=126 → Unknown unless stderr has explicit sandbox hints - Otherwise Unknown unless hints indicate sandbox - Stderr hints (case-insensitive; conservative): - “sandbox: deny”, “not permitted by sandbox”, “bad system call”, “seccomp” - Platform constants: - SIGSYS_CODE: macOS=12, Linux=31 (no libc deps). - Exec flow changes: - Defer SIGSYS to inspect stderr; otherwise return Signal immediately. - After building ExecToolCallOutput, compute verdict; on LikelySandbox → Denied { output }; else if deferred signal → Signal; else return Ok(exec_output). - Logging: - tracing::debug!(target: "codex_core::exec", exit_code, ?pending_signal, verdict=?verdict, "exec SandboxClassification") Removed - is_likely_sandbox_denied shim removed; note left explaining how to synthesize an ExitStatus if ever needed again. Refactors / Clippy - sandbox_verdict now takes ExitStatus by value (small Copy type) to satisfy clippy::trivially-copy-pass-by-ref. - Replaced raw lists of integers with named constants and a computed array for 128+signal codes. Tests - Unit tests in core/src/exec.rs: - not_sandbox_obvious_codes: 127/124/2 → LikelyNotSandbox - sandbox_sigsys_codepath: 128+SIGSYS → LikelySandbox - sandbox_stderr_hint: 126 + “Sandbox: deny …” → LikelySandbox - unknown_generic_failure: 1 → Unknown - Existing integration/suite tests already cover timeout/signal/Denied and OK paths. Manual Test Plan - macOS seatbelt: - Run a command that exits 126 and prints “Sandbox: deny …” to stderr → Denied. - Run a command that exits 127 or 2 → not Denied. - Linux seccomp: - Simulate bad syscall exit 128+31 → Denied. - 130/141/143 (SIGINT/SIGPIPE/SIGTERM) → not Denied; surfaced as normal failure or Signal. - Confirm debug line appears with target codex_core::exec and message “exec SandboxClassification”. Behavioral Changes - Normal non‑zero exits (e.g., exit 1) are no longer misclassified as sandbox errors. - Timeouts and non‑SIGSYS signals retain previous semantics. - Denials are now attributed only on strong evidence. Risk / Rollout - Scope limited to core/src/exec.rs; no new dependencies. - macOS and Linux only; no *BSD branches. - Low risk: adds conservative classification and clearer logging; removes unused shim. Follow‑ups (optional) - Add a macOS‑specific stderr hint “sandbox-exec:” if we want to treat seatbelt launcher failures as LikelySandbox. - Expand tests to include a couple more 128+signal codes to lock in “not sandbox” classification. - Run just fix -p codex-core and optionally a full workspace test pass once reviewers approve.
1 parent d73055c commit e75e82f

File tree

1 file changed

+192
-21
lines changed

1 file changed

+192
-21
lines changed

codex-rs/core/src/exec.rs

Lines changed: 192 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,24 @@ const SIGKILL_CODE: i32 = 9;
3636
const TIMEOUT_CODE: i32 = 64;
3737
const EXIT_CODE_SIGNAL_BASE: i32 = 128; // conventional shell: 128 + signal
3838
const EXEC_TIMEOUT_EXIT_CODE: i32 = 124; // conventional timeout exit code
39+
const EXIT_NOT_EXECUTABLE_CODE: i32 = 126; // "found but not executable"/shebang/perms
40+
41+
// Common Unix signal numbers used below (document for maintainers)
42+
// Keep these aligned with platform conventions; these are standard on macOS/Linux.
43+
const SIGINT_CODE: i32 = 2; // Ctrl-C / interrupt
44+
const SIGABRT_CODE: i32 = 6; // abort
45+
const SIGBUS_CODE: i32 = 7; // bus error
46+
const SIGFPE_CODE: i32 = 8; // floating point exception
47+
const SIGSEGV_CODE: i32 = 11; // segmentation fault
48+
const SIGPIPE_CODE: i32 = 13; // broken pipe
49+
const SIGTERM_CODE: i32 = 15; // termination
50+
51+
// Platform-gated SIGSYS signal numbers with no libc dep (Linux & macOS only).
52+
// Linux SIGSYS is 31; macOS SIGSYS is 12.
53+
#[cfg(target_os = "linux")]
54+
const SIGSYS_CODE: i32 = 31;
55+
#[cfg(target_os = "macos")]
56+
const SIGSYS_CODE: i32 = 12;
3957

4058
// I/O buffer sizing
4159
const READ_CHUNK_SIZE: usize = 8192; // bytes per read
@@ -72,6 +90,97 @@ pub enum SandboxType {
7290
LinuxSeccomp,
7391
}
7492

93+
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
94+
enum SandboxVerdict {
95+
LikelySandbox,
96+
LikelyNotSandbox,
97+
Unknown,
98+
}
99+
100+
fn stderr_hints_sandbox(stderr: &str) -> bool {
101+
// Conservative: require explicit sandbox/seccomp phrasing likely to come from
102+
// a kernel/seatbelt/seccomp denial; generic perms errors are too noisy.
103+
let s = stderr.to_ascii_lowercase();
104+
s.contains("sandbox: deny")
105+
|| s.contains("not permitted by sandbox")
106+
|| s.contains("bad system call")
107+
|| s.contains("seccomp")
108+
}
109+
110+
fn sandbox_verdict(sandbox_type: SandboxType, status: ExitStatus, stderr: &str) -> SandboxVerdict {
111+
if matches!(sandbox_type, SandboxType::None) {
112+
return SandboxVerdict::LikelyNotSandbox;
113+
}
114+
115+
// Prefer the signal path when available.
116+
#[cfg(unix)]
117+
{
118+
if let Some(sig) = status.signal() {
119+
if sig == SIGSYS_CODE {
120+
// SIGSYS under seccomp (Linux) or bad syscall elsewhere is a strong tell.
121+
return SandboxVerdict::LikelySandbox;
122+
}
123+
// Common user/app signals -> not sandbox.
124+
match sig {
125+
SIGINT_CODE | SIGABRT_CODE | SIGBUS_CODE | SIGFPE_CODE | SIGKILL_CODE
126+
| SIGSEGV_CODE | SIGPIPE_CODE | SIGTERM_CODE => {
127+
return SandboxVerdict::LikelyNotSandbox;
128+
}
129+
_ => { /* fall through */ }
130+
}
131+
}
132+
}
133+
134+
let code = status.code().unwrap_or(-1);
135+
136+
// Immediate NotSandbox codes
137+
// - 0: success
138+
// - 2: common "misuse of shell builtins"
139+
// - 124: conventional timeout wrapper exit code
140+
// - 127: command not found
141+
// - 64..=78: BSD sysexits range
142+
// - 128 + {SIGINT, SIGABRT, SIGBUS, SIGFPE, SIGKILL, SIGSEGV, SIGPIPE, SIGTERM}
143+
let sig_derived_not_sandbox = [
144+
EXIT_CODE_SIGNAL_BASE + SIGINT_CODE,
145+
EXIT_CODE_SIGNAL_BASE + SIGABRT_CODE,
146+
EXIT_CODE_SIGNAL_BASE + SIGBUS_CODE,
147+
EXIT_CODE_SIGNAL_BASE + SIGFPE_CODE,
148+
EXIT_CODE_SIGNAL_BASE + SIGKILL_CODE,
149+
EXIT_CODE_SIGNAL_BASE + SIGSEGV_CODE,
150+
EXIT_CODE_SIGNAL_BASE + SIGPIPE_CODE,
151+
EXIT_CODE_SIGNAL_BASE + SIGTERM_CODE,
152+
];
153+
if code == 0
154+
|| code == 2
155+
|| code == EXEC_TIMEOUT_EXIT_CODE
156+
|| code == 127
157+
|| (64..=78).contains(&code)
158+
|| sig_derived_not_sandbox.contains(&code)
159+
{
160+
return SandboxVerdict::LikelyNotSandbox;
161+
}
162+
163+
// Shell-style signal encoding for SIGSYS.
164+
if code == EXIT_CODE_SIGNAL_BASE + SIGSYS_CODE {
165+
return SandboxVerdict::LikelySandbox;
166+
}
167+
168+
// 126 is often perms/shebang; upgrade only with explicit hints.
169+
if code == EXIT_NOT_EXECUTABLE_CODE {
170+
return if stderr_hints_sandbox(stderr) {
171+
SandboxVerdict::LikelySandbox
172+
} else {
173+
SandboxVerdict::Unknown
174+
};
175+
}
176+
177+
if stderr_hints_sandbox(stderr) {
178+
return SandboxVerdict::LikelySandbox;
179+
}
180+
181+
SandboxVerdict::Unknown
182+
}
183+
75184
#[derive(Clone)]
76185
pub struct StdoutStream {
77186
pub sub_id: String,
@@ -142,12 +251,17 @@ pub async fn process_exec_tool_call(
142251
Ok(raw_output) => {
143252
#[allow(unused_mut)]
144253
let mut timed_out = raw_output.timed_out;
254+
// If the process was killed by a signal, handle timeouts distinctly and
255+
// defer SIGSYS (possible sandbox) until after we can examine stderr.
256+
let mut pending_signal: Option<i32> = None;
145257

146258
#[cfg(target_family = "unix")]
147259
{
148260
if let Some(signal) = raw_output.exit_status.signal() {
149261
if signal == TIMEOUT_CODE {
150262
timed_out = true;
263+
} else if signal == SIGSYS_CODE && !matches!(sandbox_type, SandboxType::None) {
264+
pending_signal = Some(signal);
151265
} else {
152266
return Err(CodexErr::Sandbox(SandboxErr::Signal(signal)));
153267
}
@@ -177,11 +291,26 @@ pub async fn process_exec_tool_call(
177291
}));
178292
}
179293

180-
if exit_code != 0 && is_likely_sandbox_denied(sandbox_type, exit_code) {
294+
let verdict = sandbox_verdict(
295+
sandbox_type,
296+
raw_output.exit_status,
297+
&exec_output.stderr.text,
298+
);
299+
tracing::debug!(
300+
target: "codex_core::exec",
301+
exit_code = exec_output.exit_code,
302+
?pending_signal,
303+
verdict = ?verdict,
304+
"exec SandboxClassification"
305+
);
306+
if matches!(verdict, SandboxVerdict::LikelySandbox) {
181307
return Err(CodexErr::Sandbox(SandboxErr::Denied {
182308
output: Box::new(exec_output),
183309
}));
184310
}
311+
if let Some(sig) = pending_signal {
312+
return Err(CodexErr::Sandbox(SandboxErr::Signal(sig)));
313+
}
185314

186315
Ok(exec_output)
187316
}
@@ -192,26 +321,6 @@ pub async fn process_exec_tool_call(
192321
}
193322
}
194323

195-
/// We don't have a fully deterministic way to tell if our command failed
196-
/// because of the sandbox - a command in the user's zshrc file might hit an
197-
/// error, but the command itself might fail or succeed for other reasons.
198-
/// For now, we conservatively check for 'command not found' (exit code 127),
199-
/// and can add additional cases as necessary.
200-
fn is_likely_sandbox_denied(sandbox_type: SandboxType, exit_code: i32) -> bool {
201-
if sandbox_type == SandboxType::None {
202-
return false;
203-
}
204-
205-
// Quick rejects: well-known non-sandbox shell exit codes
206-
// 127: command not found, 2: misuse of shell builtins
207-
if exit_code == 127 {
208-
return false;
209-
}
210-
211-
// For all other cases, we assume the sandbox is the cause
212-
true
213-
}
214-
215324
#[derive(Debug)]
216325
pub struct StreamOutput<T> {
217326
pub text: T,
@@ -436,3 +545,65 @@ fn synthetic_exit_status(code: i32) -> ExitStatus {
436545
#[expect(clippy::unwrap_used)]
437546
std::process::ExitStatus::from_raw(code.try_into().unwrap())
438547
}
548+
549+
#[cfg(test)]
550+
mod tests {
551+
use super::*;
552+
#[cfg(unix)]
553+
use std::os::unix::process::ExitStatusExt;
554+
555+
#[cfg(unix)]
556+
fn status_from_code(c: i32) -> ExitStatus {
557+
// Synthesize a normal exit status with the given exit code.
558+
ExitStatus::from_raw(c << 8)
559+
}
560+
561+
#[test]
562+
#[cfg(unix)]
563+
fn not_sandbox_obvious_codes() {
564+
assert!(matches!(
565+
sandbox_verdict(SandboxType::MacosSeatbelt, status_from_code(127), ""),
566+
SandboxVerdict::LikelyNotSandbox
567+
));
568+
assert!(matches!(
569+
sandbox_verdict(SandboxType::LinuxSeccomp, status_from_code(124), ""),
570+
SandboxVerdict::LikelyNotSandbox
571+
));
572+
assert!(matches!(
573+
sandbox_verdict(SandboxType::LinuxSeccomp, status_from_code(2), ""),
574+
SandboxVerdict::LikelyNotSandbox
575+
));
576+
}
577+
578+
#[test]
579+
#[cfg(unix)]
580+
fn sandbox_sigsys_codepath() {
581+
let code = EXIT_CODE_SIGNAL_BASE + SIGSYS_CODE;
582+
assert!(matches!(
583+
sandbox_verdict(SandboxType::LinuxSeccomp, status_from_code(code), ""),
584+
SandboxVerdict::LikelySandbox
585+
));
586+
}
587+
588+
#[test]
589+
#[cfg(unix)]
590+
fn sandbox_stderr_hint() {
591+
assert!(matches!(
592+
sandbox_verdict(
593+
SandboxType::MacosSeatbelt,
594+
status_from_code(126),
595+
"Sandbox: deny file-read-data"
596+
),
597+
SandboxVerdict::LikelySandbox
598+
));
599+
}
600+
601+
#[test]
602+
#[cfg(unix)]
603+
fn unknown_generic_failure() {
604+
assert!(matches!(
605+
sandbox_verdict(SandboxType::MacosSeatbelt, status_from_code(1), ""),
606+
SandboxVerdict::Unknown
607+
));
608+
}
609+
}

0 commit comments

Comments
 (0)