Add execution environments abstraction and toolset #4393
Open
Introduces ExecutionEnvironment ABC and three implementations (LocalEnvironment, DockerEnvironment, MemoryEnvironment) along with ExecutionEnvironmentToolset for exposing coding-agent-style tools (ls, shell, read_file, write_file, replace_str, glob, grep). This is the foundation for building coding agents and other agents that need shell and filesystem access, split out from the broader code-mode work for independent review and merge.
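To make the shape of the abstraction concrete, here is a minimal sketch, assuming an environment is an async context manager that advertises a set of capabilities and exposes file operations. `SketchEnvironment` and `DictEnvironment` are hypothetical names for illustration; they are not the PR's `ExecutionEnvironment` or `MemoryEnvironment` signatures.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod


class SketchEnvironment(ABC):
    """Hypothetical minimal environment interface (not the PR's actual API)."""

    # Names of the operations this environment can back; a toolset can filter on this.
    capabilities: frozenset[str] = frozenset()

    async def __aenter__(self) -> SketchEnvironment:
        # Real environments would start a container, create a temp dir, etc. here.
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        # ...and release those resources here.
        return None

    @abstractmethod
    async def read_file(self, path: str) -> str: ...

    @abstractmethod
    async def write_file(self, path: str, content: str) -> None: ...

    @abstractmethod
    async def ls(self, path: str = '.') -> list[str]: ...


class DictEnvironment(SketchEnvironment):
    """In-memory filesystem backed by a dict (hypothetical, MemoryEnvironment-like)."""

    capabilities = frozenset({'ls', 'read_file', 'write_file'})

    def __init__(self) -> None:
        self._files: dict[str, str] = {}

    async def read_file(self, path: str) -> str:
        return self._files[path]

    async def write_file(self, path: str, content: str) -> None:
        self._files[path] = content

    async def ls(self, path: str = '.') -> list[str]:
        # Flat namespace: return every stored path under `path`, recursively.
        prefix = '' if path in ('', '.') else path.rstrip('/') + '/'
        return sorted(p for p in self._files if p.startswith(prefix))


async def demo() -> list[str]:
    async with DictEnvironment() as environment:
        await environment.write_file('src/main.py', 'print("hi")\n')
        await environment.write_file('README.md', '# demo\n')
        return await environment.ls()


print(asyncio.run(demo()))  # ['README.md', 'src/main.py']
```

In this sketch, concrete environments differ mainly in what `__aenter__`/`__aexit__` manage (a container, a working directory) and in which capabilities they advertise, which is what lets one toolset sit on top of all of them.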
dmontagu commented Feb 21, 2026
When multiple `agent.run()` calls execute concurrently, a shared environment means they all operate on the same filesystem and processes. The new `environment_factory` parameter creates a fresh, isolated environment per `async with` entry using `ContextVar`-scoped state. Also renames `environment` → `shared_environment` to make the concurrency semantics explicit (it is a positional argument, so existing callers still work).
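The isolation relies on how `contextvars` interacts with asyncio: each task runs in a copy of its parent's context, so a value set inside one run is invisible to runs executing concurrently. A minimal sketch of the pattern, assuming the toolset keeps a per-context stack of environments; `FactoryToolset` and `ScratchEnv` are hypothetical stand-ins, not the PR's classes.

```python
from __future__ import annotations

import asyncio
from contextvars import ContextVar
from typing import Callable


class ScratchEnv:
    """Stand-in environment: each instance owns its own 'filesystem'."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}


class FactoryToolset:
    """Hypothetical toolset that creates a fresh environment per `async with` entry."""

    def __init__(self, environment_factory: Callable[[], ScratchEnv]) -> None:
        self._factory = environment_factory
        # A tuple used as an immutable stack, so nested entries restore the outer one.
        # (The stdlib docs recommend module-level ContextVars; per-instance keeps this short.)
        self._stack: ContextVar[tuple[ScratchEnv, ...]] = ContextVar('environment_stack', default=())

    async def __aenter__(self) -> FactoryToolset:
        self._stack.set(self._stack.get() + (self._factory(),))
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        self._stack.set(self._stack.get()[:-1])

    @property
    def environment(self) -> ScratchEnv:
        stack = self._stack.get()
        if not stack:
            raise RuntimeError('enter the toolset with `async with` first')
        return stack[-1]


async def agent_run(toolset: FactoryToolset, name: str) -> list[str]:
    async with toolset:
        toolset.environment.files[f'{name}.txt'] = name
        await asyncio.sleep(0)  # yield so the two runs interleave
        return sorted(toolset.environment.files)


async def main() -> list[list[str]]:
    toolset = FactoryToolset(ScratchEnv)  # one toolset shared by both runs
    return list(await asyncio.gather(agent_run(toolset, 'a'), agent_run(toolset, 'b')))


print(asyncio.run(main()))  # [['a.txt'], ['b.txt']]
```

If both runs instead shared a single `ScratchEnv`, each would see `['a.txt', 'b.txt']`, which is the cross-talk the factory avoids.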
Mark the `huggingface` and `outlines-vllm-offline` extras as conflicting in uv, and exclude `outlines-vllm-offline` from `--all-extras` in CI and the Makefile.
- Fix `_recv_stream` EOF check to distinguish zero-size frames from actual EOF
- Make `MemoryEnvironment.capabilities` dynamic: include `'shell'` when `command_handler` is set
- Fix `LocalEnvironment.grep` to use `rglob` for recursive file search with `glob_pattern`
- Fix `glob_match` to use regex for all patterns (`fnmatch` incorrectly matches `'/'` with `'*'`)
- Fix `build_glob_cmd`: add parentheses for correct `find` operator precedence, fix `./` prefix for `-path`
- Add double-enter guard in `DockerEnvironment._setup` to prevent container leak
- Add `DockerEnvironment.hardened()` convenience constructor for security best practices
- Rename `docker-sandbox` optional dependency to `docker-environment`
- Rename `env` variable to `environment` in docs to avoid confusion with env vars
- Add lifecycle tip about pre-starting the toolset in docs
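The `fnmatch` problem is easy to reproduce: `fnmatch` translates `*` to `.*`, so `*.py` also matches `src/pkg/mod.py`. Below is a simplified sketch of the regex approach, assuming shell-style semantics where `*` and `?` stay within one path segment and `**` spans directories. `glob_to_regex` and this `glob_match` are illustrative, not the PR's implementation; character classes and braces are not handled.

```python
from __future__ import annotations

import fnmatch
import re


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a glob so that `*` and `?` never match '/'.

    Simplified sketch: supports `**`, `*`, `?`; every other character is literal
    (no `[...]` classes or `{a,b}` braces).
    """
    parts: list[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith('**/', i):
            parts.append('(?:.*/)?')  # zero or more whole directories
            i += 3
        elif pattern.startswith('**', i):
            parts.append('.*')  # `**` not followed by '/': anything, including '/'
            i += 2
        elif pattern[i] == '*':
            parts.append('[^/]*')  # stays within a single path segment
            i += 1
        elif pattern[i] == '?':
            parts.append('[^/]')
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile(''.join(parts) + r'\Z')


def glob_match(path: str, pattern: str) -> bool:
    return glob_to_regex(pattern).match(path) is not None


print(fnmatch.fnmatchcase('src/pkg/mod.py', '*.py'))  # True: fnmatch's `*` crosses '/'
print(glob_match('src/pkg/mod.py', '*.py'))           # False
print(glob_match('src/pkg/mod.py', '**/*.py'))        # True
```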
Tools are now registered unconditionally at init time and filtered in get_tools() based on the current environment's capabilities. This fixes the issue where environment_factory or use_environment() could expose tools unsupported by the runtime environment. Also unifies the Capability type — removes the toolset-level Capability (with edit_file) and EditStrategy types, using the environment-level Capability (with replace_str/apply_patch) everywhere.
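A minimal sketch of "register everything, filter per call", assuming the toolset can look up the capabilities of whichever environment is active when `get_tools()` runs. `FilteringToolset` is hypothetical, and the plain callable it takes stands in for however the real toolset finds the current environment.

```python
from __future__ import annotations

from typing import Callable

# Every tool the toolset knows how to expose, registered once at init time.
ALL_TOOLS = ('ls', 'shell', 'read_file', 'write_file', 'glob', 'grep')


class FilteringToolset:
    """Hypothetical sketch: register every tool up front, filter when tools are listed."""

    def __init__(self, current_capabilities: Callable[[], frozenset[str]]) -> None:
        self._registered = ALL_TOOLS
        # Called on every get_tools(), so an environment swapped in later is picked up.
        self._current_capabilities = current_capabilities

    def get_tools(self) -> list[str]:
        capabilities = self._current_capabilities()
        return [name for name in self._registered if name in capabilities]


active = {'capabilities': frozenset({'ls', 'read_file', 'write_file', 'glob', 'grep'})}
toolset = FilteringToolset(lambda: active['capabilities'])

print(toolset.get_tools())  # ['ls', 'read_file', 'write_file', 'glob', 'grep']
active['capabilities'] = active['capabilities'] | {'shell'}  # e.g. an environment with a shell
print(toolset.get_tools())  # ['ls', 'shell', 'read_file', 'write_file', 'glob', 'grep']
```

Because filtering happens at call time, an environment supplied later (via a factory or an override) can never surface a tool it does not support.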
- Add `ToolName` literal type for tool-level names exposed to the model (`edit_file` instead of `edit_file:replace_str`/`edit_file:apply_patch`)
- `include`/`exclude` now accept `ToolName` values (e.g. `edit_file`) instead of env-level `Capability` values
- Rename `_resolve_capabilities` → `_resolve_tool_names`, which maps env capabilities to tool names then applies include/exclude filtering
- Rename `replace_str` tool → `edit_file` (the function exposed to models)
- Update `Capability` values: `replace_str` → `edit_file:replace_str`, `apply_patch` → `edit_file:apply_patch` in all environments
- Update docs and tests
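A sketch of the two-step resolution described in this commit: map environment capabilities to model-facing tool names, then apply `include`/`exclude`. The mapping table and `resolve_tool_names` are hypothetical illustrations of `_resolve_tool_names` using the capability strings from this commit, not its actual code.

```python
from __future__ import annotations

from typing import Iterable, Optional

# Env-level capability -> tool name exposed to the model (both edit strategies collapse to one tool).
CAPABILITY_TO_TOOL = {
    'ls': 'ls',
    'shell': 'shell',
    'read_file': 'read_file',
    'write_file': 'write_file',
    'edit_file:replace_str': 'edit_file',
    'edit_file:apply_patch': 'edit_file',
    'glob': 'glob',
    'grep': 'grep',
}


def resolve_tool_names(
    env_capabilities: Iterable[str],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> list[str]:
    # Step 1: translate capabilities to tool names, ignoring unknown capabilities.
    names = {CAPABILITY_TO_TOOL[c] for c in env_capabilities if c in CAPABILITY_TO_TOOL}
    # Step 2: include/exclude are expressed in tool names, e.g. 'edit_file'.
    if include is not None:
        names &= set(include)
    if exclude is not None:
        names -= set(exclude)
    return sorted(names)


caps = {'read_file', 'edit_file:replace_str', 'edit_file:apply_patch', 'shell'}
print(resolve_tool_names(caps))                     # ['edit_file', 'read_file', 'shell']
print(resolve_tool_names(caps, exclude=['shell']))  # ['edit_file', 'read_file']
```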
dmontagu (Contributor, Author) left a comment:
Fixed the glob_pattern filtering for exact file matches in MemoryEnvironment.grep, matching LocalEnvironment behavior.
…rep glob filtering
- Rename `Capability` to `EnvCapability` for clarity
- Remove unused `instructions()` method from base class
- Fix `_resolve_edit_tool` to fall back to auto-detection when env doesn't support the explicit strategy
- Fix `MemoryEnvironment.grep` to skip glob filtering for exact file paths, matching `LocalEnvironment` behavior
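The exact-path rule can be sketched as follows, assuming an in-memory store of `{path: text}` and a `glob_pattern` matched against file basenames. `grep_files` is a hypothetical helper, not `MemoryEnvironment.grep` itself; the point is only that a `path` naming a single file bypasses the glob filter.

```python
from __future__ import annotations

import re
from fnmatch import fnmatchcase
from typing import Optional


def grep_files(
    files: dict[str, str],
    pattern: str,
    path: str = '',
    glob_pattern: Optional[str] = None,
) -> list[str]:
    """Hypothetical grep over an in-memory {path: text} store."""
    regex = re.compile(pattern)
    if path in files:
        # `path` names one file exactly: search it and skip glob filtering.
        candidates = [path]
    else:
        # `path` is a directory prefix ('' means everything).
        prefix = path.rstrip('/') + '/' if path else ''
        candidates = sorted(p for p in files if p.startswith(prefix))
        if glob_pattern is not None:
            # Match on the basename only, so '*' never has to cross a '/'.
            candidates = [p for p in candidates if fnmatchcase(p.rsplit('/', 1)[-1], glob_pattern)]
    hits = []
    for p in candidates:
        for lineno, line in enumerate(files[p].splitlines(), start=1):
            if regex.search(line):
                hits.append(f'{p}:{lineno}:{line}')
    return hits


files = {'src/a.py': 'import os\n', 'notes.txt': 'import this\n'}
print(grep_files(files, 'import', glob_pattern='*.py'))                    # ['src/a.py:1:import os']
print(grep_files(files, 'import', path='notes.txt', glob_pattern='*.py'))  # ['notes.txt:1:import this']
```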
- Rename `Capability` → `EnvCapability` to free up the name for other use
- `_resolve_edit_tool` now falls back to auto-detection when the explicit `edit_strategy` isn't supported by the environment
- Remove `instructions` method from base class and `DockerEnvironment`, along with associated tests
- Update all imports and type annotations across environments and tests
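A sketch of the fallback, assuming auto-detection means "the first strategy the environment supports, in a fixed preference order". `resolve_edit_strategy` and the `replace_str`-first order are assumptions for illustration; the capability strings follow the `edit_file:<strategy>` naming introduced in an earlier commit of this PR.

```python
from __future__ import annotations

from typing import Literal, Optional

EditStrategy = Literal['replace_str', 'apply_patch']

# Assumed auto-detection order; the real preference may differ.
PREFERENCE: tuple[EditStrategy, ...] = ('replace_str', 'apply_patch')


def resolve_edit_strategy(
    env_capabilities: frozenset[str],
    requested: Optional[EditStrategy] = None,
) -> Optional[EditStrategy]:
    supported = [s for s in PREFERENCE if f'edit_file:{s}' in env_capabilities]
    if requested in supported:
        return requested
    # Explicit strategy unsupported (or none given): fall back to auto-detection
    # instead of exposing an edit tool the environment can't run.
    return supported[0] if supported else None


both = frozenset({'edit_file:replace_str', 'edit_file:apply_patch'})
print(resolve_edit_strategy(both, 'apply_patch'))                                  # apply_patch
print(resolve_edit_strategy(frozenset({'edit_file:replace_str'}), 'apply_patch'))  # replace_str
print(resolve_edit_strategy(frozenset({'read_file'})))                             # None
```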
adtyavrdhn reviewed Feb 26, 2026
network_disabled: Whether to disable network access.
read_only: Whether to mount the root filesystem as read-only.
    Use with `tmpfs` to provide writable scratch space.
cap_drop: Linux capabilities to drop (e.g. `['ALL']`).
No idea what this would do.
user: User to run as inside the container (e.g. `'nobody'`).
tmpfs: tmpfs mounts as `{path: options}`
    (e.g. `{'/tmp': 'noexec,nosuid,size=64m'}`).
init: Whether to use `--init` to run an init process as PID 1.
    cpu_limit: float = 1.0,
    pids_limit: int = 256,
) -> DockerEnvironment:
    """Create a hardened Docker environment with security best practices.
Shouldn't this be the default when a Docker environment is created anyway?
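For reference on what a hardened configuration involves, here is a sketch that builds keyword arguments for docker-py's `client.containers.run()` from the options documented in this diff. The function name and the `512m` memory default are assumptions. `cap_drop=['ALL']` removes Docker's default capability set (for example `CAP_CHOWN`, `CAP_NET_RAW`, `CAP_SETUID`), so even root inside the container cannot change file ownership, open raw sockets, or switch users. Making all of this the default trades compatibility for safety: with no network, a read-only root, and the `nobody` user, images that expect to install packages at runtime would likely break.

```python
from __future__ import annotations

from typing import Any


def hardened_run_kwargs(
    *,
    memory_limit: str = '512m',  # assumed default, not from the PR
    cpu_limit: float = 1.0,
    pids_limit: int = 256,
) -> dict[str, Any]:
    """Hypothetical helper: docker-py `containers.run()` kwargs for a locked-down container."""
    return {
        'network_disabled': True,  # no network access
        'read_only': True,  # root filesystem mounted read-only...
        'tmpfs': {'/tmp': 'noexec,nosuid,size=64m'},  # ...with writable scratch space in /tmp
        'cap_drop': ['ALL'],  # drop every Linux capability Docker grants by default
        'user': 'nobody',  # don't run as root inside the container
        'init': True,  # init process as PID 1 forwards signals and reaps zombies
        'pids_limit': pids_limit,  # cap the process count (e.g. against fork bombs)
        'mem_limit': memory_limit,
        'nano_cpus': int(cpu_limit * 1e9),  # Docker expresses CPU limits in 1e-9 CPU units
    }


kwargs = hardened_run_kwargs()
print(kwargs['cap_drop'], kwargs['nano_cpus'])  # ['ALL'] 1000000000
```

With docker-py installed, this would be passed as `client.containers.run(image, command, detach=True, **hardened_run_kwargs())`.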
}
)

async def __aenter__(self) -> Self:  # pragma: lax no cover
if self._memory_limit:
    kwargs['mem_limit'] = self._memory_limit
if self._cpu_limit:
    kwargs['nano_cpus'] = int(self._cpu_limit * 1e9)
Okay? Need to check why
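On the `1e9` factor: docker-py has no `cpus=` keyword matching `docker run --cpus`; the equivalent is `nano_cpus`, Docker's `NanoCpus` setting, measured in billionths of a CPU. Docker's documentation also describes `--cpus=1.5` as equivalent to `--cpu-period=100000 --cpu-quota=150000`. A small sketch of both conversions (the helper names are hypothetical):

```python
def cpus_to_nano_cpus(cpu_limit: float) -> int:
    """CPU count (as in `docker run --cpus`) -> Docker NanoCpus (1e-9 CPU units)."""
    if cpu_limit <= 0:
        raise ValueError('cpu_limit must be positive')
    # round() rather than int() so float representation error can't truncate the result.
    return round(cpu_limit * 1_000_000_000)


def cpus_to_quota(cpu_limit: float, period_us: int = 100_000) -> int:
    """Same limit as a CFS quota: microseconds of CPU time allowed per `period_us`."""
    if cpu_limit <= 0:
        raise ValueError('cpu_limit must be positive')
    return round(cpu_limit * period_us)


print(cpus_to_nano_cpus(1.5))  # 1500000000
print(cpus_to_quota(1.5))      # 150000
```

Note that the reviewed snippet guards with `if self._cpu_limit:`, so `cpu_limit=0` means "no limit" there rather than an error.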
# Ensure work_dir exists
self._container.exec_run(['mkdir', '-p', self._work_dir])

async def __aexit__(self, *_args: Any) -> None:  # pragma: lax no cover
This should be covered by tests rather than excluded with `# pragma: lax no cover`. Also, what is up with `*_args`?
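On `*_args`: `__aexit__` is always called with three arguments (exception type, exception, traceback), and `*_args` is a compact way to accept and ignore them. The sketch below spells out the full signature and also shows the kind of double-enter guard mentioned in the commit list, in a form that can be unit-tested without Docker. `GuardedEnvironment` is hypothetical and stands in for `DockerEnvironment`.

```python
from __future__ import annotations

import asyncio
from types import TracebackType
from typing import Optional


class GuardedEnvironment:
    """Hypothetical stand-in for an environment that owns one external resource."""

    def __init__(self) -> None:
        self.started = 0  # how many resources have been created
        self._resource: Optional[object] = None

    async def __aenter__(self) -> GuardedEnvironment:
        if self._resource is not None:
            # A second entry would otherwise overwrite, and leak, the first resource.
            raise RuntimeError('environment is already entered')
        self._resource = object()  # stand-in for e.g. a started container
        self.started += 1
        return self

    async def __aexit__(
        self,
        exc_type: Optional[type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> None:
        # Equivalent to `async def __aexit__(self, *_args: Any)` when the details are unused.
        # Returning None (falsy) means exceptions from the block still propagate.
        self._resource = None


async def demo() -> str:
    environment = GuardedEnvironment()
    async with environment:
        try:
            async with environment:
                pass
        except RuntimeError as e:
            return f'{e}; started={environment.started}'
    return 'no error'


print(asyncio.run(demo()))  # environment is already entered; started=1
```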
Douwe: It would be kind of nice to have hooks/envs to set up the repo, or just actions. This was discussed for David's agent.
def _check() -> bool:
    assert self._container is not None
    try:
        self._container.reload()
Why reload it when we only needed to check if it is running?
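A likely answer, assuming `self._container` is a docker-py `Container` (the `exec_run`/`reload` calls suggest so): `Container.status` is read from the `attrs` snapshot taken when the container object was fetched, so without `reload()` the check could keep reporting `'running'` after the container has exited. The sketch below uses a hypothetical `FakeContainer` test double to show the difference without a Docker daemon.

```python
from __future__ import annotations

from typing import Any, Protocol


class ContainerLike(Protocol):
    """The two members of docker-py's `Container` this check relies on."""

    @property
    def status(self) -> str: ...

    def reload(self) -> None: ...


def is_running(container: ContainerLike) -> bool:
    container.reload()  # re-inspect; otherwise `status` is the snapshot from fetch time
    return container.status == 'running'


class FakeContainer:
    """Test double mimicking docker-py: `status` reads cached attrs that reload() refreshes."""

    def __init__(self) -> None:
        self.daemon_state = 'running'  # what the Docker daemon would report right now
        self.attrs: dict[str, Any] = {'State': {'Status': 'running'}}

    @property
    def status(self) -> str:
        return self.attrs['State']['Status']

    def reload(self) -> None:
        self.attrs = {'State': {'Status': self.daemon_state}}


container = FakeContainer()
container.daemon_state = 'exited'  # the container died after it was fetched
print(container.status)       # running  (stale snapshot)
print(is_running(container))  # False
```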
Summary
- Introduces the `ExecutionEnvironment` ABC and three implementations (`LocalEnvironment`, `DockerEnvironment`, `MemoryEnvironment`) along with `ExecutionEnvironmentToolset` for exposing coding-agent-style tools (`ls`, `shell`, `read_file`, `write_file`, `replace_str`, `glob`, `grep`)
- The code-mode pieces (`run_python`, `run_python_with_functions`, Monty environment, NDJSON driver protocol) are intentionally excluded and will come in a follow-up PR

Closes #XXXX
Test plan
- `tests/test_environments.py` covering all three environment implementations and the toolset

Checklist