Add execution environments abstraction and toolset #4393

Open
dmontagu wants to merge 51 commits into main from execution-environments

@dmontagu

Summary

  • Introduces ExecutionEnvironment ABC and three implementations (LocalEnvironment, DockerEnvironment, MemoryEnvironment) along with ExecutionEnvironmentToolset for exposing coding-agent-style tools (ls, shell, read_file, write_file, replace_str, glob, grep)
  • This is the foundation for building coding agents and other agents that need shell and filesystem access, split out from the broader code-mode work for independent review and merge
  • Code execution capabilities (run_python, run_python_with_functions, Monty environment, NDJSON driver protocol) are intentionally excluded and will come in a follow-up PR
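
To make the shape of the abstraction concrete, here is a minimal sketch of an abstract environment with one in-memory implementation. It is not the PR's code: `ExecutionEnvironmentSketch`, `InMemorySketch`, and the synchronous method signatures are hypothetical simplifications chosen for illustration, and the real `ExecutionEnvironment` interface may differ (for example, it may be async).

```python
from abc import ABC, abstractmethod


class ExecutionEnvironmentSketch(ABC):
    """Hypothetical stand-in for the PR's ExecutionEnvironment ABC."""

    @abstractmethod
    def read_file(self, path: str) -> str: ...

    @abstractmethod
    def write_file(self, path: str, content: str) -> None: ...

    @abstractmethod
    def ls(self, prefix: str = '') -> list[str]: ...


class InMemorySketch(ExecutionEnvironmentSketch):
    """Keeps files in a dict, loosely analogous to MemoryEnvironment."""

    def __init__(self) -> None:
        self._files: dict[str, str] = {}

    def read_file(self, path: str) -> str:
        try:
            return self._files[path]
        except KeyError:
            raise FileNotFoundError(path) from None

    def write_file(self, path: str, content: str) -> None:
        self._files[path] = content

    def ls(self, prefix: str = '') -> list[str]:
        # Return stored paths that start with the prefix, in sorted order.
        return sorted(p for p in self._files if p.startswith(prefix))


env = InMemorySketch()
env.write_file('src/app.py', 'print("hi")\n')
env.write_file('README.md', '# demo\n')
print(env.ls())  # ['README.md', 'src/app.py']
```

A toolset built on such an interface only needs the abstract methods, so the same tools could run against a local directory, a container, or an in-memory store used in tests.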

Closes #XXXX

Test plan

  • 250 tests in tests/test_environments.py covering all three environment implementations and the toolset
  • Lint, typecheck, and format verified locally
  • CI passes
  • Documentation renders correctly

Checklist

  • Selected the correct base branch
  • AI generated code

github-actions bot added the labels size: XL (extra large PR, >1500 weighted lines) and feature (new feature request, or PR implementing a feature) on Feb 21, 2026

github-actions bot commented Feb 21, 2026

Docs Preview

commit: 0d2b2e9
Preview URL: https://09f81118-pydantic-ai-previews.pydantic.workers.dev

devin-ai-integration[bot]

This comment was marked as resolved.

When multiple agent.run() calls execute concurrently, a shared environment
means they all operate on the same filesystem and processes. The new
environment_factory parameter creates a fresh, isolated environment per
async-with entry using ContextVar-scoped state.

Also renames environment → shared_environment to make concurrency semantics
explicit (positional arg, so existing callers still work).
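
A minimal sketch of the per-entry isolation described above, assuming each concurrent run executes in its own asyncio task. `FakeEnvironment` and `FactoryToolsetSketch` are hypothetical names used only for illustration; the PR's toolset may manage its `ContextVar` state differently.

```python
import asyncio
from contextvars import ContextVar
from typing import Any, Callable


class FakeEnvironment:
    """Hypothetical stand-in for an environment: just a dict of files."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}


class FactoryToolsetSketch:
    """Creates one environment per `async with` entry, scoped by ContextVar."""

    def __init__(self, environment_factory: Callable[[], FakeEnvironment]) -> None:
        self._factory = environment_factory
        self._env: ContextVar[FakeEnvironment] = ContextVar('env')
        # Reset tokens are also context-local so nested entries unwind correctly.
        self._tokens: ContextVar[tuple] = ContextVar('tokens', default=())

    @property
    def environment(self) -> FakeEnvironment:
        return self._env.get()

    async def __aenter__(self) -> 'FactoryToolsetSketch':
        token = self._env.set(self._factory())
        self._tokens.set(self._tokens.get() + (token,))
        return self

    async def __aexit__(self, *_args: Any) -> None:
        *outer, token = self._tokens.get()
        self._tokens.set(tuple(outer))
        self._env.reset(token)


async def worker(toolset: FactoryToolsetSketch, name: str) -> list[str]:
    async with toolset:
        toolset.environment.files[name] = 'data'
        await asyncio.sleep(0)  # let the other task run in between
        return sorted(toolset.environment.files)


async def main() -> list[list[str]]:
    toolset = FactoryToolsetSketch(FakeEnvironment)
    # gather() wraps each coroutine in a task with its own context copy.
    return await asyncio.gather(worker(toolset, 'a.txt'), worker(toolset, 'b.txt'))


results = asyncio.run(main())
print(results)  # [['a.txt'], ['b.txt']]
```

Each task sees only the file it wrote, even though both share one toolset object. With a single shared environment, both lists would contain both files.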

Mark huggingface and outlines-vllm-offline extras as conflicting in uv,
and exclude outlines-vllm-offline from --all-extras in CI and Makefile.
- Fix _recv_stream EOF check to distinguish zero-size frames from actual EOF
- Make MemoryEnvironment.capabilities dynamic: include 'shell' when command_handler is set
- Fix LocalEnvironment.grep to use rglob for recursive file search with glob_pattern
- Fix glob_match to use regex for all patterns (fnmatch incorrectly matches '/' with '*')
- Fix build_glob_cmd: add parentheses for correct find operator precedence, fix ./ prefix for -path
- Add double-enter guard in DockerEnvironment._setup to prevent container leak
- Add DockerEnvironment.hardened() convenience constructor for security best practices
- Rename docker-sandbox optional dependency to docker-environment
- Rename 'env' variable to 'environment' in docs to avoid confusion with env vars
- Add lifecycle tip about pre-starting the toolset in docs
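
The `glob_match` item above rests on the fact that `fnmatch` lets `*` match across `/`. The sketch below illustrates that problem and one regex-based alternative. `glob_to_regex` and this `glob_match` are hypothetical helpers written for illustration, not the PR's implementation, and they handle only `*`, `?`, and `**` (no character classes).

```python
import fnmatch
import re


def glob_to_regex(pattern: str) -> 're.Pattern[str]':
    """Translate a simple glob into a regex where `*` stops at `/`."""
    parts: list[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith('**/', i):
            parts.append('(?:.*/)?')  # zero or more whole directories
            i += 3
        elif pattern.startswith('**', i):
            parts.append('.*')  # anything, including separators
            i += 2
        elif pattern[i] == '*':
            parts.append('[^/]*')  # anything except a separator
            i += 1
        elif pattern[i] == '?':
            parts.append('[^/]')  # one non-separator character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile(''.join(parts))


def glob_match(path: str, pattern: str) -> bool:
    return glob_to_regex(pattern).fullmatch(path) is not None


# fnmatch treats the path as a flat string, so `*` crosses directories.
print(fnmatch.fnmatch('src/app.py', '*.py'))  # True
print(glob_match('src/app.py', '*.py'))  # False
print(glob_match('app.py', '*.py'))  # True
print(glob_match('src/pkg/app.py', '**/*.py'))  # True
```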

Tools are now registered unconditionally at init time and filtered in
get_tools() based on the current environment's capabilities. This fixes
the issue where environment_factory or use_environment() could expose
tools unsupported by the runtime environment.

Also unifies the Capability type — removes the toolset-level Capability
(with edit_file) and EditStrategy types, using the environment-level
Capability (with replace_str/apply_patch) everywhere.
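
A minimal sketch of the register-everything, filter-at-lookup approach described above. `CapabilityEnvSketch`, `FilteringToolsetSketch`, and the tool-to-capability table are hypothetical; the real `get_tools()` returns tool objects rather than names.

```python
class CapabilityEnvSketch:
    """Hypothetical environment that only advertises a set of capabilities."""

    def __init__(self, capabilities: set[str]) -> None:
        self.capabilities = frozenset(capabilities)


class FilteringToolsetSketch:
    # Every tool is registered once, up front: tool name -> required capability.
    _REGISTERED = {
        'ls': 'ls',
        'shell': 'shell',
        'read_file': 'read_file',
        'write_file': 'write_file',
    }

    def __init__(self, environment: CapabilityEnvSketch) -> None:
        self.environment = environment

    def get_tools(self) -> list[str]:
        # Filtering happens at lookup time, against whichever environment
        # is current, rather than once at construction time.
        caps = self.environment.capabilities
        return [name for name, needed in self._REGISTERED.items() if needed in caps]


toolset = FilteringToolsetSketch(CapabilityEnvSketch({'ls', 'read_file', 'write_file'}))
print(toolset.get_tools())  # ['ls', 'read_file', 'write_file']

# Swapping in an environment that supports shell changes the exposed tools.
toolset.environment = CapabilityEnvSketch({'ls', 'shell'})
print(toolset.get_tools())  # ['ls', 'shell']
```

Because the filter runs on each lookup, an environment supplied later (for instance by a factory) cannot expose a tool it does not support.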

- Add `ToolName` literal type for tool-level names exposed to the model
  (`edit_file` instead of `edit_file:replace_str`/`edit_file:apply_patch`)
- `include`/`exclude` now accept `ToolName` values (e.g. `edit_file`)
  instead of env-level `Capability` values
- Rename `_resolve_capabilities` → `_resolve_tool_names`, which maps env
  capabilities to tool names then applies include/exclude filtering
- Rename `replace_str` tool → `edit_file` (the function exposed to models)
- Update `Capability` values: `replace_str` → `edit_file:replace_str`,
  `apply_patch` → `edit_file:apply_patch` in all environments
- Update docs and tests
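
The capability-to-tool-name mapping in the list above could look roughly like the following sketch. The table contents and the exact include/exclude semantics are assumptions made for illustration; only the capability and tool names come from the commit message.

```python
from typing import Iterable, Optional

# Assumed mapping: both edit capabilities surface as a single `edit_file` tool.
CAPABILITY_TO_TOOL = {
    'ls': 'ls',
    'shell': 'shell',
    'read_file': 'read_file',
    'write_file': 'write_file',
    'edit_file:replace_str': 'edit_file',
    'edit_file:apply_patch': 'edit_file',
    'glob': 'glob',
    'grep': 'grep',
}


def resolve_tool_names(
    capabilities: Iterable[str],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> set[str]:
    """Map env-level capabilities to tool names, then apply include/exclude."""
    names = {CAPABILITY_TO_TOOL[c] for c in capabilities if c in CAPABILITY_TO_TOOL}
    if include is not None:
        names &= set(include)  # keep only explicitly included tools
    if exclude is not None:
        names -= set(exclude)  # then drop excluded tools
    return names


caps = {'read_file', 'edit_file:replace_str', 'shell'}
print(sorted(resolve_tool_names(caps, exclude={'shell'})))  # ['edit_file', 'read_file']
```

Filtering on tool names means a caller writes `exclude={'edit_file'}` without needing to know which edit strategy the environment happens to support.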

@dmontagu left a comment

Fixed the glob_pattern filtering for exact file matches in MemoryEnvironment.grep, matching LocalEnvironment behavior.

…rep glob filtering

- Rename `Capability` to `EnvCapability` for clarity
- Remove unused `instructions()` method from base class
- Fix `_resolve_edit_tool` to fall back to auto-detection when env doesn't support the explicit strategy
- Fix `MemoryEnvironment.grep` to skip glob filtering for exact file paths, matching `LocalEnvironment` behavior
- Rename `Capability` → `EnvCapability` to free up the name for other use
- `_resolve_edit_tool` now falls back to auto-detection when the explicit
  `edit_strategy` isn't supported by the environment
- Remove `instructions` method from base class and DockerEnvironment,
  along with associated tests
- Update all imports and type annotations across environments and tests
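
The fallback behaviour described for `_resolve_edit_tool` can be sketched as below. `resolve_edit_strategy`, the strategy strings, and the auto-detection preference order are assumptions for illustration, not the PR's actual logic.

```python
from typing import Optional

# Assumed auto-detection order: prefer replace_str, then apply_patch.
EDIT_STRATEGIES = ('edit_file:replace_str', 'edit_file:apply_patch')


def resolve_edit_strategy(
    capabilities: set[str], edit_strategy: Optional[str] = None
) -> Optional[str]:
    """Pick the edit capability to use, or None if editing is unsupported."""
    # Honour the explicit choice only when the environment supports it.
    if edit_strategy is not None and edit_strategy in capabilities:
        return edit_strategy
    # Otherwise fall back to auto-detection over the known strategies.
    for candidate in EDIT_STRATEGIES:
        if candidate in capabilities:
            return candidate
    return None


env_caps = {'read_file', 'edit_file:replace_str'}
# The explicit strategy is unsupported here, so auto-detection takes over.
print(resolve_edit_strategy(env_caps, 'edit_file:apply_patch'))  # edit_file:replace_str
```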

network_disabled: Whether to disable network access.
read_only: Whether to mount the root filesystem as read-only.
    Use with `tmpfs` to provide writable scratch space.
cap_drop: Linux capabilities to drop (e.g. `['ALL']`).

No idea what this would do

user: User to run as inside the container (e.g. `'nobody'`).
tmpfs: tmpfs mounts as `{path: options}`
    (e.g. `{'/tmp': 'noexec,nosuid,size=64m'}`).
init: Whether to use `--init` to run an init process as PID 1.

This either

    cpu_limit: float = 1.0,
    pids_limit: int = 256,
) -> DockerEnvironment:
    """Create a hardened Docker environment with security best practices.

I would assume/argue this should be the default when a docker environment is being created anyway?
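
For reference while reviewing, the sketch below collects the options quoted in this thread into the keyword arguments that the Docker SDK for Python accepts on `containers.run`. `hardened_run_kwargs` is a hypothetical helper, the `'512m'` memory default is an assumption, and the specific values are taken from the docstring examples above rather than from the PR's `hardened()` implementation.

```python
from typing import Any


def hardened_run_kwargs(
    *,
    memory_limit: str = '512m',  # assumed default, not taken from the PR
    cpu_limit: float = 1.0,
    pids_limit: int = 256,
) -> dict[str, Any]:
    """Build keyword arguments for docker-py's `containers.run`."""
    return {
        'network_disabled': True,  # no network access inside the container
        'read_only': True,  # root filesystem mounted read-only
        'cap_drop': ['ALL'],  # drop every Linux capability
        'user': 'nobody',  # run as an unprivileged user
        'tmpfs': {'/tmp': 'noexec,nosuid,size=64m'},  # writable scratch space
        'init': True,  # run an init process as PID 1
        'pids_limit': pids_limit,  # cap the number of processes
        'mem_limit': memory_limit,
        # docker-py expresses CPU quota in units of 1e-9 CPUs.
        'nano_cpus': int(cpu_limit * 1e9),
    }


kwargs = hardened_run_kwargs(cpu_limit=0.5)
print(kwargs['nano_cpus'])  # 500000000
```

The `nano_cpus` conversion is why the diff multiplies `cpu_limit` by `1e9`: the SDK takes an integer count of billionths of a CPU, so half a CPU becomes 500000000.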

}
)

async def __aenter__(self) -> Self: # pragma: lax no cover

Why not cover this?

if self._memory_limit:
    kwargs['mem_limit'] = self._memory_limit
if self._cpu_limit:
    kwargs['nano_cpus'] = int(self._cpu_limit * 1e9)

@adtyavrdhn commented on Feb 26, 2026

Okay? Need to check why

# Ensure work_dir exists
self._container.exec_run(['mkdir', '-p', self._work_dir])

async def __aexit__(self, *_args: Any) -> None: # pragma: lax no cover

Should cover it and what is up with *_args?
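
On the `*_args` question: Python always calls `__aexit__` with three positional arguments (exception type, exception value, traceback), and `*_args` is one way to accept them without naming them when they are not inspected. The sketch below shows what arrives in that tuple. `ResourceSketch` is a hypothetical class, not the PR's `DockerEnvironment`.

```python
import asyncio
from typing import Any


class ResourceSketch:
    """Hypothetical async context manager that records its exit arguments."""

    def __init__(self) -> None:
        self.seen_args: tuple = ()

    async def __aenter__(self) -> 'ResourceSketch':
        return self

    async def __aexit__(self, *_args: Any) -> None:
        # _args is (exc_type, exc_value, traceback); all None on a clean exit.
        self.seen_args = _args
        # Returning None (falsy) lets any exception propagate to the caller.


async def main() -> tuple:
    res = ResourceSketch()
    async with res:
        pass
    clean = res.seen_args

    try:
        async with res:
            raise ValueError('boom')
    except ValueError:
        pass
    return clean, res.seen_args[0]


clean_args, failed_type = asyncio.run(main())
print(clean_args)  # (None, None, None)
print(failed_type)  # <class 'ValueError'>
```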

@adtyavrdhn

Douwe: It would be kind of nice to have hooks / envs to set up the repo, or just actions. This was discussed for David's agent.

def _check() -> bool:
    assert self._container is not None
    try:
        self._container.reload()

Why reload it when we only needed to check if it is running?
