README.md: 1 addition & 1 deletion
@@ -217,7 +217,7 @@ See [docs/CONFIGURATION.md#agent-configuration](docs/CONFIGURATION.md#agent-conf
> **Workspace isolation:** During `sanity eval`, each agent runs in an isolated temporary workspace under `/tmp` rather than inside `eval-results/`. This prevents agents from reading other eval results, sibling task solutions, or their own `agent.log`. After the agent finishes, files are copied back to `eval-results/` for validation. Combined with the bubblewrap sandbox (which uses `--tmpfs /tmp`), agents have zero visibility into other evaluations.
- > **Sandbox note:** `sanity eval` runs agents inside a [bubblewrap](https://github.com/containers/bubblewrap) sandbox where `$HOME` is read-only. All dot-directories under `$HOME` (e.g. `~/.my-agent/`) are automatically writable, so most agents work out of the box. For non-dot directories, add them to `sanity.toml` under `[sandbox] writable_dirs`. Use `--no-sandbox` to disable.
+ > **Sandbox note:** `sanity eval` runs agents inside a [bubblewrap](https://github.com/containers/bubblewrap) sandbox where `$HOME` is read-only. All dot-directories under `$HOME` (e.g. `~/.my-agent/`) are automatically writable, so most agents work out of the box. For non-dot writable paths, use `[sandbox] writable_dirs`; for sensitive readable paths to mask, use `[sandbox] readable_denylist`. Use `--no-sandbox` to disable.
> **Legacy mode:** Prior to v1.6.0, a bug caused hidden tests to be included in the workspace during `sanity eval`, making them visible to agents. The `--legacy` flag reproduces this behavior so that older evaluation runs can be fairly compared or resumed. When `--legacy` is active, hidden test files are written to the workspace at init time (instead of being overlaid just before validation), and the hidden-test overlay step is skipped. Use this flag when resuming runs that were originally executed with the buggy behavior.
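
For reference, the two `[sandbox]` keys named in the Sandbox note might be set in `sanity.toml` roughly as follows. This is a minimal sketch, not taken from the project's documentation: the key names come from the note, but the array-of-strings value type, the use of absolute paths, and the paths themselves are assumptions made for illustration.

```toml
# sanity.toml (illustrative only; both paths below are hypothetical)

[sandbox]
# Non-dot directories the agent may write to, in addition to the
# dot-directories under $HOME that are writable by default.
# Assumption: the value is a list of path strings.
writable_dirs = ["/home/user/agent-cache"]

# Sensitive paths to mask so the agent cannot read them.
# Assumption: same list-of-strings shape as writable_dirs.
readable_denylist = ["/home/user/secrets"]
```

Check docs/CONFIGURATION.md for the accepted value types and for whether `~` or `$HOME` is expanded in these paths before relying on this layout.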