feat: add basic sandbox abstraction #541

Open
maksymbuleshnyi wants to merge 9 commits into main from feat/sandbox

Contributor

maksymbuleshnyi commented Feb 4, 2026

Note

High Risk
Introduces remote shell command execution as an agent tool and new resource-lifecycle cleanup paths, which are security- and stability-sensitive and could enable unintended command execution or leaks if misconfigured.

Overview
Agents can now be configured with an optional sandbox that injects sandbox-provided tools (currently a shell command tool) instead of the built-in file-store tools when enabled.

This introduces a new dynamiq.sandbox module with a base Sandbox/SandboxConfig interface, an E2BSandbox implementation using e2b-desktop, and SandboxShellTool (with simple allow/block prefix validation) plus an example workflow showing E2B usage. Agent lifecycle handling is updated to attempt sandbox cleanup via cleanup()/__del__, and dependencies are updated to add e2b-desktop (and bump pillow via lockfile changes).

Written by Cursor Bugbot for commit 532ddb8. This will update automatically on new commits.
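A minimal sketch of the tool-selection rule described in the overview: when the sandbox is enabled, sandbox tools replace the built-in file-store tools. Only the names Sandbox, SandboxConfig, SandboxShellTool, FileSearchTool and FileListTool come from the PR; the method names run_command/close, the select_tool_names helper and the local EchoSandbox stand-in are assumptions for illustration, not the PR's actual API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


class Sandbox(ABC):
    """Hypothetical minimal backend interface; method names are illustrative."""

    @abstractmethod
    def run_command(self, command: str, timeout: float = 60.0) -> str:
        """Run a shell command inside the sandbox and return its output."""

    @abstractmethod
    def close(self) -> None:
        """Release the remote sandbox."""


@dataclass
class SandboxConfig:
    backend: Sandbox
    enabled: bool = False


class EchoSandbox(Sandbox):
    """Local stand-in that only echoes commands, used to exercise the interface."""

    def run_command(self, command: str, timeout: float = 60.0) -> str:
        return f"ran: {command}"

    def close(self) -> None:
        pass


def select_tool_names(sandbox: Optional[SandboxConfig], has_file_store: bool) -> list[str]:
    """Sandbox tools replace the file-store tools when the sandbox is enabled."""
    if sandbox is not None and sandbox.enabled:
        return ["SandboxShellTool"]
    if has_file_store:
        return ["FileSearchTool", "FileListTool"]
    return []
```

Keeping the switch in one function makes the "mutually exclusive" rule easy to reuse from both __init__ and execute(), which is where the review comments below find the two paths diverging.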

maksymbuleshnyi requested a review from a team as a code owner February 4, 2026 08:01
maksymbuleshnyi marked this pull request as draft February 4, 2026 08:01

github-actions bot commented Feb 4, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| dynamiq/nodes/agents/agent.py | 522 | 98 | 81% | 180, 183–186, 191, 199, 204, 221–222, 240–241, 319–321, 370–371, 387–388, 409–411, 414, 427–428, 476–478, 484–486, 593, 598, 673, 703, 705–706, 720, 745–749, 782, 788, 792, 828, 850, 855, 872, 876, 949–950, 1001–1003, 1008–1009, 1014, 1020, 1026, 1034, 1072–1073, 1130–1132, 1155–1156, 1195–1201, 1207, 1209–1218, 1220–1221, 1247, 1255–1257, 1266, 1292–1293, 1303, 1311 |
| dynamiq/nodes/agents/base.py | 746 | 192 | 74% | 57, 63, 76–77, 151–153, 168, 176, 179, 302–303, 362, 364, 380, 384, 406, 426, 428, 430, 452, 492, 495, 595, 606–607, 688, 699, 723–726, 728, 730–732, 734–735, 742, 744–745, 749–751, 757, 761–767, 769–770, 788, 790, 798–800, 807–809, 812, 831, 838–839, 844, 855–858, 862–864, 866–868, 881, 888–890, 893–894, 899–901, 904–905, 928–931, 933–935, 957–959, 961, 986–988, 990–992, 999–1000, 1002, 1005, 1021–1022, 1026, 1061–1065, 1072, 1074, 1083, 1089–1090, 1094, 1103–1107, 1114, 1117, 1128, 1131, 1141, 1148, 1158, 1166, 1173–1174, 1180–1181, 1186–1187, 1192, 1238, 1257–1260, 1266, 1268, 1288–1290, 1295–1296, 1312–1313, 1317, 1330, 1334, 1340, 1342–1343, 1345–1346, 1348–1351, 1353, 1355, 1357–1360, 1362, 1365, 1367, 1371, 1374, 1376, 1380, 1383, 1385, 1389, 1392–1394, 1401, 1408, 1411–1412 |
| dynamiq/sandbox/__init__.py | 3 | 0 | 100% | |
| dynamiq/sandbox/base.py | 41 | 18 | 56% | 33, 41–45, 69, 84, 86, 90, 94, 114–120 |
| dynamiq/sandbox/e2b.py | 60 | 37 | 38% | 33–34, 38–41, 46–48, 66–67, 69–72, 78–79, 84–86, 103, 107–112, 114–116, 120–121, 125, 129–131, 133 |
| TOTAL | 22699 | 7181 | 68% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 1198 | 34 💤 | 0 ❌ | 0 🔥 | 10m 43s ⏱️ |

maksymbuleshnyi marked this pull request as ready for review February 5, 2026 10:08

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

FileSearchTool(file_store=self.file_store_backend),
FileListTool(file_store=self.file_store_backend),
]
)

Missing sandbox field handling in to_dict serialization

Medium Severity

The new sandbox field in Agent is not added to to_dict_exclude_params and is not explicitly handled in to_dict(). This breaks the established pattern used for similar nested objects like file_store, llm, tools, and memory. Without proper handling, SandboxConfig.to_dict() and Sandbox.to_dict() won't be called during serialization, causing the type field to be missing, for_tracing logic to be bypassed, and potential issues with tracing callbacks and YAML serialization roundtrip. The same issue applies to SandboxShellTool which doesn't exclude its sandbox field from serialization.

Additional Locations (1)
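The serialization pattern this comment refers to can be sketched with plain dataclasses: the nested field is left out of the generic dump and serialized through its own to_dict(), so keys such as type and flags such as for_tracing survive. AgentStub and SandboxStub are hypothetical stand-ins; only the name to_dict_exclude_params comes from the comment, and its shape here (a set) is an assumption. A real Agent builds its dump differently (for example via model_dump).

```python
from dataclasses import dataclass
from typing import Any, ClassVar, Optional


@dataclass
class SandboxStub:
    """Hypothetical nested object whose own to_dict() adds a "type" key."""
    enabled: bool = True

    def to_dict(self, for_tracing: bool = False, **kwargs: Any) -> dict[str, Any]:
        return {"type": "SandboxConfig", "enabled": self.enabled}


@dataclass
class AgentStub:
    """Hypothetical stand-in for Agent, reduced to two fields."""
    name: str
    sandbox: Optional[SandboxStub] = None

    # Nested objects are skipped by the generic dump...
    to_dict_exclude_params: ClassVar[set[str]] = {"sandbox"}

    def to_dict(self, **kwargs: Any) -> dict[str, Any]:
        data = {
            key: value
            for key, value in vars(self).items()
            if key not in self.to_dict_exclude_params
        }
        # ...and serialized explicitly, forwarding flags such as for_tracing.
        data["sandbox"] = self.sandbox.to_dict(**kwargs) if self.sandbox else None
        return data
```

The same exclusion would apply to the sandbox field on SandboxShellTool, which the comment flags as the additional location.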

FileSearchTool(file_store=self.file_store_backend),
FileListTool(file_store=self.file_store_backend),
]
)

Sandbox and file store inconsistency when files uploaded

Medium Severity

When sandbox is enabled, __init__ treats sandbox and file_store as mutually exclusive (using elif at line 305). However, execute() doesn't check for sandbox - it creates an InMemoryFileStore and adds file tools whenever files are uploaded and file_store_backend is None. This creates a state where both sandbox tools and file tools exist but operate on separate storage backends. Uploaded files go to InMemoryFileStore (inaccessible from sandbox shell), while files created via sandbox shell remain in the E2B filesystem (inaccessible from file tools). Users uploading files to a sandbox-enabled agent won't be able to access them via shell commands.
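A minimal sketch of one way to keep the two paths consistent, assuming the sandbox backend can accept file uploads: when a sandbox is enabled, uploaded files are written into it rather than into a new in-memory store, so the shell tool can see them. FakeSandbox, its upload_file(path, data) method, AgentSketch.handle_uploads and the /home/user upload directory are all hypothetical; the in-memory store is modeled as a plain dict.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeSandbox:
    """Hypothetical sandbox with an upload_file(path, data) method."""
    files: dict[str, bytes] = field(default_factory=dict)

    def upload_file(self, path: str, data: bytes) -> None:
        self.files[path] = data


@dataclass
class AgentSketch:
    sandbox: Optional[FakeSandbox] = None
    file_store: Optional[dict[str, bytes]] = None
    tools: list[str] = field(default_factory=list)
    upload_dir: str = "/home/user"  # assumed working directory inside the sandbox

    def handle_uploads(self, files: dict[str, bytes]) -> None:
        if not files:
            return
        if self.sandbox is not None:
            # Sandbox enabled: write files where the shell tool can read them,
            # and do not create a second, disconnected file store.
            for name, data in files.items():
                self.sandbox.upload_file(f"{self.upload_dir}/{name}", data)
            return
        if self.file_store is None:
            # No sandbox: fall back to an in-memory store plus file tools.
            self.file_store = {}
            self.tools.extend(["FileSearchTool", "FileListTool"])
        self.file_store.update(files)
```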

kwargs.pop("include_secure_params", None)
config_data = self.model_dump(exclude={"backend"}, **kwargs)
config_data["backend"] = self.backend.to_dict()
return config_data

API key exposure through sandbox serialization ignoring tracing flags

Medium Severity

The Sandbox.to_dict() method pops for_tracing and include_secure_params from kwargs but never uses them - it just calls model_dump() which includes all fields. For E2BSandbox, this serializes the connection field containing the E2B api_key. Additionally, SandboxConfig.to_dict() extracts for_tracing but calls self.backend.to_dict() without passing it, so the tracing-safe flag is not propagated. The BaseConnection.to_dict() method has special logic to return only id and type when for_tracing=True, but since these flags are discarded, sensitive credentials could be exposed in tracing/logging output.

Additional Locations (1)
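A minimal sketch of the fix this comment implies, using hypothetical stub classes: every level accepts for_tracing and passes it down instead of popping and discarding it, so the connection can drop the api_key. The "only id and type when for_tracing=True" rule mirrors what the comment says BaseConnection.to_dict() does; the class names and the timeout field are assumptions.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ConnectionStub:
    """Hypothetical E2B connection holding a secret."""
    id: str
    api_key: str

    def to_dict(self, for_tracing: bool = False, **kwargs: Any) -> dict[str, Any]:
        # Per the comment, BaseConnection returns only id and type for tracing.
        if for_tracing:
            return {"id": self.id, "type": "E2B"}
        return {"id": self.id, "type": "E2B", "api_key": self.api_key}


@dataclass
class E2BSandboxStub:
    connection: ConnectionStub
    timeout: int = 300  # hypothetical extra field

    def to_dict(self, for_tracing: bool = False, **kwargs: Any) -> dict[str, Any]:
        # Forward the flag instead of popping it and calling a plain dump.
        return {
            "type": "E2BSandbox",
            "timeout": self.timeout,
            "connection": self.connection.to_dict(for_tracing=for_tracing, **kwargs),
        }


@dataclass
class SandboxConfigStub:
    backend: E2BSandboxStub
    enabled: bool = True

    def to_dict(self, for_tracing: bool = False, **kwargs: Any) -> dict[str, Any]:
        return {
            "enabled": self.enabled,
            "backend": self.backend.to_dict(for_tracing=for_tracing, **kwargs),
        }
```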

Comment on lines +25 to +28
background: bool = Field(
default=False,
description="If True, run the command in background without waiting for output.",
)
Collaborator

I would call this something like run_in_background to be more explicit

Collaborator

run_in_background_enabled

Comment on lines +86 to +93
# Check allowed commands
if self.allowed_commands:
is_allowed = any(cmd_lower.startswith(allowed.lower()) for allowed in self.allowed_commands)
if not is_allowed:
raise ToolExecutionException(
f"Command '{command}' is not in the allowed commands list.",
recoverable=True,
)
Collaborator

What if the input command is ls; rm -rf /? We won't be able to handle this case if we only check the command with the .startswith() method.

Collaborator

I agree that we should have better validation
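A minimal sketch of stricter validation, assuming the policy "every command segment must start with an allowed executable": the command is tokenized with Python's shlex (punctuation_chars with whitespace_split needs Python 3.8+), split at shell operators such as ;, &&, || and |, and each segment's first token is compared exactly, so ls; rm -rf / and lsblk are both rejected. The function names and the refuse-outright rules for backticks, $( and newlines are assumptions, not the PR's code.

```python
import shlex

# Characters that can start a new command; redirections (< >) alone do not.
OPERATOR_CHARS = set(";|&()")
PUNCTUATION = set("();<>|&")


def split_segments(command: str) -> list[list[str]]:
    """Split a command into segments at operators such as ;, &&, ||, | and (."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    lexer.commenters = ""  # treat '#' literally so "a#b; rm" is not hidden as a comment
    segments: list[list[str]] = [[]]
    for token in lexer:
        is_operator = set(token) <= PUNCTUATION and bool(set(token) & OPERATOR_CHARS)
        if is_operator:
            segments.append([])
        else:
            segments[-1].append(token)
    return [segment for segment in segments if segment]


def is_command_allowed(command: str, allowed: set[str]) -> bool:
    """Allow only if every segment starts with an allowed executable name."""
    if any(marker in command for marker in ("`", "$(", "\n", "\r")):
        return False  # command substitution or multi-line input: refuse outright
    try:
        segments = split_segments(command)
    except ValueError:
        return False  # e.g. unbalanced quotes
    return bool(segments) and all(segment[0] in allowed for segment in segments)
```

An allowlist still cannot stop an allowed interpreter from running arbitrary code (for example python -c ...), so the sandbox itself remains the real isolation boundary.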

default=None,
description="Optional list of allowed command prefixes. If set, only these commands are permitted.",
)
blocked_commands: list[str] | None = Field(
Contributor

It makes sense to maintain a default list of prohibited commands (such as rm -rf) and check each command against that list for matching patterns.

Collaborator

tyaroshko Feb 5, 2026

Totally agree
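A minimal sketch of such a default deny-list, matched as regular expressions over the whole command so that rm -f -r or a chained ls; rm -rf / is still caught. The specific patterns and the find_blocked_pattern helper are illustrative assumptions, not an exhaustive or authoritative list; per-agent extra_patterns would extend the defaults.

```python
import re
from typing import Optional

# Illustrative defaults; not exhaustive.
DEFAULT_BLOCKED_PATTERNS = [
    r"\brm\b.*\s(-[a-z]*r[a-z]*|--recursive)\b",  # recursive delete, any flag order
    r"\bsudo\b",
    r"\bmkfs\b",
    r"\bdd\b.*\bof=/dev/",
    r":\(\)\s*\{",                                 # classic fork bomb
    r"\b(curl|wget)\b.*\|\s*(ba|z)?sh\b",          # piping a download into a shell
]


def find_blocked_pattern(command: str, extra_patterns: Optional[list[str]] = None) -> Optional[str]:
    """Return the first pattern that matches the command, or None if none match."""
    for pattern in DEFAULT_BLOCKED_PATTERNS + list(extra_patterns or []):
        if re.search(pattern, command, flags=re.IGNORECASE):
            return pattern
    return None
```

Deny-lists are easy to evade (quoting, variables, interpreters), so this works best as defense in depth next to an allowlist and the sandbox boundary, not as the only check.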

List of tool instances (Node objects).
"""
# Lazy import to avoid circular dependency
from dynamiq.sandbox.tools.shell import SandboxShellTool
Contributor

Please move this import to the beginning of the file

backend=e2b_sandbox,
)

# Create shell tool that uses the sandbox
Contributor

Please clean this up.

from dynamiq.utils.logger import logger
from examples.llm_setup import setup_llm

AGENT_ROLE = """
Contributor

Let's have an example with YAML.

Contributor

olbychos left a comment

Add examples and restrict unsafe commands.

@@ -490,9 +502,14 @@ def execute(
if files:
if not self.file_store_backend:
Collaborator

What if the sandbox is enabled? Should we still add these tools? We already partially handle this in __init__.


return child_context

def cleanup(self) -> None:
Collaborator

What if we want to reuse the sandbox on the next request? Are we going to reconnect to it?

Also, let's rename it to def close(self) -> None:, as that is a clearer name for this purpose and we already use it in a few other nodes.
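One way to answer both questions, sketched with hypothetical names: a session object that reattaches when given an existing sandbox id, creates a sandbox otherwise, and exposes an idempotent close() that either kills the sandbox or, with keep_alive, only detaches so the next request can reuse it. SandboxSession, keep_alive and the create/kill callables are assumptions; the callables stand in for whatever the backend SDK actually provides.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SandboxSession:
    """Hypothetical wrapper: create or reattach by id, close exactly once."""
    create: Callable[[], str]          # returns a new sandbox id
    kill: Callable[[str], None]        # terminates a sandbox by id
    sandbox_id: Optional[str] = None   # pass an existing id to reuse a sandbox
    keep_alive: bool = False           # if True, close() detaches instead of killing
    _closed: bool = field(default=False, init=False, repr=False)

    def ensure(self) -> str:
        if self.sandbox_id is None:
            self.sandbox_id = self.create()
        return self.sandbox_id

    def close(self) -> None:
        if self._closed:
            return  # idempotent: safe to call from both cleanup paths
        self._closed = True
        if self.sandbox_id is not None and not self.keep_alive:
            self.kill(self.sandbox_id)

    def __enter__(self) -> "SandboxSession":
        self.ensure()
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.close()
```

An explicit close() (or a with block) gives a deterministic release point, which __del__ alone does not guarantee.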


from dynamiq.sandbox.tools.shell import SandboxShellInputSchema, SandboxShellTool

__all__ = [
Collaborator

This is redundant, as we don't use import *.

@@ -0,0 +1,8 @@
"""Sandbox tools for command execution and file operations."""
Collaborator

I think it makes sense to rename the sandbox folder to sandboxes for consistency.

# Lazy import to avoid circular dependency
from dynamiq.sandbox.tools.shell import SandboxShellTool

shell_tool = SandboxShellTool(
Collaborator

How can we configure specific tool parameters per agent, given that they are all predefined here?


enabled: bool = False
backend: Sandbox = Field(..., description="Sandbox backend to use.")
config: dict[str, Any] = Field(default_factory=dict)
Collaborator

I think we also need to add tools here, with the ability to configure each of them. Or maybe that belongs in the backend; I'm not sure yet. The idea is something like:

nodes:
  coding-agent:
    type: dynamiq.nodes.agents.Agent
    sandbox:
      enabled: true
      backend:
        type: dynamiq.sandbox.E2BSandbox
        connection: e2b-conn
      tools:
        shell:
          enabled: true
          allowed_commands: ["python", "pip", "ls", "cat"]
          blocked_commands: ["rm -rf", "sudo"]
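The proposed YAML maps naturally onto a per-agent settings object. A minimal sketch, assuming the sandbox node has already been parsed from YAML into a dict: ShellToolSettings, shell_settings_from_config and the unknown-key check are hypothetical names, and the accepted keys are just the three shown in the YAML above.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

KNOWN_SHELL_KEYS = {"enabled", "allowed_commands", "blocked_commands"}


@dataclass
class ShellToolSettings:
    """Hypothetical per-agent settings for the sandbox shell tool."""
    enabled: bool = True
    allowed_commands: Optional[list[str]] = None
    blocked_commands: list[str] = field(default_factory=list)


def shell_settings_from_config(sandbox_cfg: dict[str, Any]) -> Optional[ShellToolSettings]:
    """Read sandbox.tools.shell from a parsed YAML node; None when disabled."""
    if not sandbox_cfg.get("enabled", False):
        return None
    raw = (sandbox_cfg.get("tools") or {}).get("shell") or {}
    unknown = set(raw) - KNOWN_SHELL_KEYS
    if unknown:
        # Fail fast on typos instead of silently ignoring them.
        raise ValueError(f"Unknown shell tool options: {sorted(unknown)}")
    settings = ShellToolSettings(
        enabled=raw.get("enabled", True),
        allowed_commands=raw.get("allowed_commands"),
        blocked_commands=list(raw.get("blocked_commands", [])),
    )
    return settings if settings.enabled else None
```

Because each agent carries its own sandbox node, two agents with different backends and tool restrictions would each get an independent ShellToolSettings instance.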

data["type"] = self.type
return data

def run_command(
Collaborator

Let's call it run_command_shell, since later we may have run_command_{type_based_on_tool}.

Collaborator

acoola commented Feb 5, 2026

Try starting by creating a YAML with two agents that work with different sandboxes and different tool sets (at least in configuration). This would help establish the proper dependencies and set of components.

Labels

None yet

Projects

None yet

4 participants