diff --git a/README.md b/README.md index f5e74eeb7..025d60efe 100644 --- a/README.md +++ b/README.md @@ -34,7 +34,7 @@ limitations under the License. ✨ **Simplicity**: the logic for agents fits in ~1,000 lines of code (see [agents.py](https://github.com/huggingface/smolagents/blob/main/src/smolagents/agents.py)). We kept abstractions to their minimal shape above raw code! -🧑‍💻 **First-class support for Code Agents**. Our [`CodeAgent`](https://huggingface.co/docs/smolagents/reference/agents#smolagents.CodeAgent) writes its actions in code (as opposed to "agents being used to write code"). To make it secure, we support executing in sandboxed environments via [E2B](https://e2b.dev/), [Modal](https://modal.com/), Docker, or Pyodide+Deno WebAssembly sandbox. +🧑‍💻 **First-class support for Code Agents**. Our [`CodeAgent`](https://huggingface.co/docs/smolagents/reference/agents#smolagents.CodeAgent) writes its actions in code (as opposed to "agents being used to write code"). To make it secure, we support executing in sandboxed environments via [Blaxel](https://blaxel.ai), [E2B](https://e2b.dev/), [Modal](https://modal.com/), Docker, or Pyodide+Deno WebAssembly sandbox. 🤗 **Hub integrations**: you can [share/pull tools or agents to/from the Hub](https://huggingface.co/docs/smolagents/reference/tools#smolagents.Tool.from_hub) for instant sharing of the most efficient agents! @@ -228,7 +228,7 @@ Writing actions as code snippets is demonstrated to work better than the current Especially, since code execution can be a security concern (arbitrary code execution!), we provide options at runtime: - a secure python interpreter to run code more safely in your environment (more secure than raw code execution but still risky) - - a sandboxed environment using [E2B](https://e2b.dev/) or Docker (removes the risk to your own system). + - a sandboxed environment using [Blaxel](https://blaxel.ai), [E2B](https://e2b.dev/), or Docker (removes the risk to your own system). 
Alongside [`CodeAgent`](https://huggingface.co/docs/smolagents/reference/agents#smolagents.CodeAgent), we also provide the standard [`ToolCallingAgent`](https://huggingface.co/docs/smolagents/reference/agents#smolagents.ToolCallingAgent) which writes actions as JSON/text blobs. You can pick whichever style best suits your use case. @@ -254,7 +254,7 @@ This comparison shows that open-source models can now take on the best closed mo ## Security Security is a critical consideration when working with code-executing agents. Our library provides: -- Sandboxed execution options using [E2B](https://e2b.dev/), [Modal](https://modal.com/), Docker, or Pyodide+Deno WebAssembly sandbox +- Sandboxed execution options using [Blaxel](https://blaxel.ai), [E2B](https://e2b.dev/), [Modal](https://modal.com/), Docker, or Pyodide+Deno WebAssembly sandbox - Best practices for running agent code securely For security policies, vulnerability reporting, and more information on secure agent execution, please see our [Security Policy](SECURITY.md). diff --git a/docs/source/en/guided_tour.md b/docs/source/en/guided_tour.md index c64ee41f4..8ad6336cb 100644 --- a/docs/source/en/guided_tour.md +++ b/docs/source/en/guided_tour.md @@ -85,7 +85,7 @@ This could also be authorized by using `numpy.*`, which will allow `numpy` as we The execution will stop at any code trying to perform an illegal operation or if there is a regular Python error with the code generated by the agent. -You can also use [E2B code executor](https://e2b.dev/docs#what-is-e2-b) or Docker instead of a local Python interpreter. For E2B, first [set the `E2B_API_KEY` environment variable](https://e2b.dev/dashboard?tab=keys) and then pass `executor_type="e2b"` upon agent initialization. For Docker, pass `executor_type="docker"` during initialization. +You can also use [Blaxel](https://blaxel.ai), [E2B](https://e2b.dev/docs#what-is-e2-b), or Docker instead of a local Python interpreter. 
For Blaxel, first [set the `BL_API_KEY` and `BL_WORKSPACE` environment variables](https://app.blaxel.ai/profile/security) and then pass `executor_type="blaxel"` upon agent initialization. For E2B, first [set the `E2B_API_KEY` environment variable](https://e2b.dev/dashboard?tab=keys) and then pass `executor_type="e2b"`. For Docker, pass `executor_type="docker"`. > [!TIP] diff --git a/docs/source/en/index.md b/docs/source/en/index.md index 49432903e..51e392a7e 100644 --- a/docs/source/en/index.md +++ b/docs/source/en/index.md @@ -12,7 +12,7 @@ Key features of `smolagents` include: ✨ **Simplicity**: The logic for agents fits in ~thousand lines of code. We kept abstractions to their minimal shape above raw code! -🧑‍💻 **First-class support for Code Agents**: [`CodeAgent`](reference/agents#smolagents.CodeAgent) writes its actions in code (as opposed to "agents being used to write code") to invoke tools or perform computations, enabling natural composability (function nesting, loops, conditionals). To make it secure, we support [executing in sandboxed environment](tutorials/secure_code_execution) via [E2B](https://e2b.dev/) or via Docker. +🧑‍💻 **First-class support for Code Agents**: [`CodeAgent`](reference/agents#smolagents.CodeAgent) writes its actions in code (as opposed to "agents being used to write code") to invoke tools or perform computations, enabling natural composability (function nesting, loops, conditionals). To make it secure, we support [executing in sandboxed environment](tutorials/secure_code_execution) via [Blaxel](https://blaxel.ai), [E2B](https://e2b.dev/), or Docker. 📡 **Common Tool-Calling Agent Support**: In addition to CodeAgents, [`ToolCallingAgent`](reference/agents#smolagents.ToolCallingAgent) supports usual JSON/text-based tool-calling for scenarios where that paradigm is preferred. 
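The guided tour change above asks for `BL_API_KEY` and `BL_WORKSPACE` to be set before passing `executor_type="blaxel"`. A minimal sketch of a pre-flight check could look like the following. `REQUIRED_BLAXEL_ENV` and `missing_blaxel_env` are hypothetical helper names, not part of smolagents or the Blaxel SDK; only the two variable names come from the diff.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Variable names taken from the guided tour text; everything else is illustrative.
REQUIRED_BLAXEL_ENV = ("BL_API_KEY", "BL_WORKSPACE")


def missing_blaxel_env(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required Blaxel variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_BLAXEL_ENV if not env.get(name)]


# Hypothetical usage before building the agent:
# missing = missing_blaxel_env()
# if missing:
#     raise RuntimeError(f"Set {', '.join(missing)} before using executor_type='blaxel'")
```

Checking up front gives a clearer error than letting sandbox creation fail later, though whether that trade-off is worth an extra helper is a judgment call.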
diff --git a/docs/source/en/installation.md b/docs/source/en/installation.md index 44c84eee1..44e8f3bf7 100644 --- a/docs/source/en/installation.md +++ b/docs/source/en/installation.md @@ -180,24 +180,32 @@ Extras for handling different types of media and input: Extras for executing code remotely: -- **docker**: Add support for executing code in Docker containers. +- **blaxel**: Add support for Blaxel sandboxes - fast-launching VMs with hibernation (recommended). ```bash - pip install "smolagents[docker]" + pip install "smolagents[blaxel]" ``` - **e2b**: Enable E2B support for remote execution. ```bash pip install "smolagents[e2b]" ``` +- **docker**: Add support for executing code in Docker containers. + ```bash + pip install "smolagents[docker]" + ``` -- **docker**: Add support for executing code in Docker containers. +- **blaxel**: Add support for Blaxel sandboxes - fast-launching VMs with hibernation (recommended). ```bash - uv pip install "smolagents[docker]" + uv pip install "smolagents[blaxel]" ``` - **e2b**: Enable E2B support for remote execution. ```bash uv pip install "smolagents[e2b]" ``` +- **docker**: Add support for executing code in Docker containers. + ```bash + uv pip install "smolagents[docker]" + ``` diff --git a/docs/source/en/reference/agents.md b/docs/source/en/reference/agents.md index de4c680b4..d451c2c3f 100644 --- a/docs/source/en/reference/agents.md +++ b/docs/source/en/reference/agents.md @@ -67,6 +67,10 @@ Smolagents use memory to store information across multiple steps. 
[[autodoc]] smolagents.remote_executors.RemotePythonExecutor +#### BlaxelExecutor + +[[autodoc]] smolagents.remote_executors.BlaxelExecutor + #### E2BExecutor [[autodoc]] smolagents.remote_executors.E2BExecutor diff --git a/docs/source/en/tutorials/secure_code_execution.md b/docs/source/en/tutorials/secure_code_execution.md index c07afe854..a8429b6c7 100644 --- a/docs/source/en/tutorials/secure_code_execution.md +++ b/docs/source/en/tutorials/secure_code_execution.md @@ -118,13 +118,46 @@ When working with AI agents that execute code, security is paramount. There are ![Sandbox approaches comparison](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolagents/sandboxed_execution.png) -1. **Running individual code snippets in a sandbox**: This approach (left side of diagram) only executes the agent-generated Python code snippets in a sandbox while keeping the rest of the agentic system in your local environment. It's simpler to set up using `executor_type="e2b"`, `executor_type="modal"`, or +1. **Running individual code snippets in a sandbox**: This approach (left side of diagram) only executes the agent-generated Python code snippets in a sandbox while keeping the rest of the agentic system in your local environment. It's simpler to set up using `executor_type="blaxel"`, `executor_type="e2b"`, `executor_type="modal"`, or `executor_type="docker"`, but it doesn't support multi-agents and still requires passing state data between your environment and the sandbox. 2. **Running the entire agentic system in a sandbox**: This approach (right side of diagram) runs the entire agentic system, including the agent, model, and tools, within a sandbox environment. This provides better isolation but requires more manual setup and may require passing sensitive credentials (like API keys) to the sandbox environment. This guide describes how to set up and use both types of sandbox approaches for your agent applications. 
+### Blaxel setup + +#### Installation + +1. Create a Blaxel account at [blaxel.ai](https://blaxel.ai) +2. Install the required packages: +```bash +pip install 'smolagents[blaxel]' +``` + +#### Running your agent with Blaxel: quick start + +We provide a simple way to use a Blaxel Sandbox: simply add `executor_type="blaxel"` to the agent initialization, as follows: + +```py +from smolagents import InferenceClientModel, CodeAgent + +with CodeAgent(model=InferenceClientModel(), tools=[], executor_type="blaxel") as agent: + agent.run("Can you give me the 100th Fibonacci number?") +``` + +> [!TIP] +> Using the agent as a context manager (with the `with` statement) ensures that the Blaxel sandbox is cleaned up immediately after the agent completes its task. +> Alternatively, you can manually call the agent's `cleanup()` method. + +This solution sends the agent state to the server at the start of each `agent.run()`. +Then the models are called from the local environment, but the generated code will be sent to the sandbox for execution, and only the output will be returned. + +Blaxel provides fast-launching virtual machines that start from hibernation in under 25ms and scale back to zero after inactivity while maintaining memory state, making it an excellent choice for agent applications that require quick, secure code execution. + +> [!TIP] +> For even stronger security isolation, you can host your entire agent remotely on Blaxel. This provides complete sandboxing of the agent, model, and tools. See the [Blaxel agent hosting documentation](https://docs.blaxel.ai/Agents/Develop-an-agent-py) for details. 
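The tip in the Blaxel section mentions calling the agent's `cleanup()` method as an alternative to the `with` statement. One way to do that safely is sketched below. `run_with_cleanup` is a hypothetical helper, not a smolagents function; it assumes only that the agent exposes `run()` and `cleanup()`, as the added docs describe.

```python
from typing import Any


def run_with_cleanup(agent: Any, task: str) -> Any:
    """Run `task` and release the sandbox even if the run raises."""
    try:
        return agent.run(task)
    finally:
        # `cleanup()` is the method named in the tip; it runs on success and on error.
        agent.cleanup()


# Usage sketch (needs smolagents[blaxel] and Blaxel credentials, so it stays commented out):
# from smolagents import CodeAgent, InferenceClientModel
# agent = CodeAgent(model=InferenceClientModel(), tools=[], executor_type="blaxel")
# print(run_with_cleanup(agent, "Can you give me the 100th Fibonacci number?"))
```

The `try`/`finally` form mirrors what the context manager does, so the `with` statement remains the shorter option when the agent's lifetime fits a single block.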
+ ### E2B setup #### Installation @@ -423,7 +456,7 @@ agent.run("Can you give me the 100th Fibonacci number?") ### Best practices for sandboxes -These key practices apply to both E2B and Docker sandboxes: +These key practices apply to Blaxel, E2B, and Docker sandboxes: - Resource management - Set memory and CPU limits @@ -449,9 +482,10 @@ As illustrated in the diagram earlier, both sandboxing approaches have different ### Approach 1: Running just the code snippets in a sandbox - **Pros**: - - Easier to set up with a simple parameter (`executor_type="e2b"` or `executor_type="docker"`) + - Easier to set up with a simple parameter (`executor_type="blaxel"`, `executor_type="e2b"`, or `executor_type="docker"`) - No need to transfer API keys to the sandbox - Better protection for your local environment + - Fast execution with Blaxel's hibernation technology (<25ms startup) - **Cons**: - Doesn't support multi-agents (managed agents) - Still requires transferring state between your environment and the sandbox diff --git a/examples/sandboxed_execution.py b/examples/sandboxed_execution.py index 2e968b21a..715d4c455 100644 --- a/examples/sandboxed_execution.py +++ b/examples/sandboxed_execution.py @@ -3,6 +3,11 @@ model = InferenceClientModel() +# Blaxel executor example +with CodeAgent(tools=[WebSearchTool()], model=model, executor_type="blaxel") as agent: + output = agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?") +print("Blaxel executor result:", output) + # Docker executor example with CodeAgent(tools=[WebSearchTool()], model=model, executor_type="docker") as agent: output = agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?") diff --git a/pyproject.toml b/pyproject.toml index 570fbdc46..736bff95d 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -24,6 +24,9 @@ dependencies = [ bedrock = [ "boto3>=1.36.18" ] +blaxel = [ + "blaxel>=0.2.19", +] torch = [ "torch", 
"torchvision", @@ -85,7 +88,7 @@ vllm = [ "torch" ] all = [ - "smolagents[audio,docker,e2b,gradio,litellm,mcp,mlx-lm,modal,openai,telemetry,toolkit,transformers,vision,bedrock]", + "smolagents[audio,blaxel,docker,e2b,gradio,litellm,mcp,mlx-lm,modal,openai,telemetry,toolkit,transformers,vision,bedrock]", ] quality = [ "ruff>=0.9.0", diff --git a/src/smolagents/agents.py b/src/smolagents/agents.py index 0b1adaffd..69d3fc32a 100644 --- a/src/smolagents/agents.py +++ b/src/smolagents/agents.py @@ -75,7 +75,7 @@ LogLevel, Monitor, ) -from .remote_executors import DockerExecutor, E2BExecutor, ModalExecutor, WasmExecutor +from .remote_executors import BlaxelExecutor, DockerExecutor, E2BExecutor, ModalExecutor, WasmExecutor from .tools import BaseTool, Tool, validate_tool_arguments from .utils import ( AgentError, @@ -1490,7 +1490,7 @@ class CodeAgent(MultiStepAgent): prompt_templates ([`~agents.PromptTemplates`], *optional*): Prompt templates. additional_authorized_imports (`list[str]`, *optional*): Additional authorized imports for the agent. planning_interval (`int`, *optional*): Interval at which the agent will run a planning step. - executor_type (`Literal["local", "e2b", "modal", "docker", "wasm"]`, default `"local"`): Type of code executor. + executor_type (`Literal["local", "blaxel", "e2b", "modal", "docker", "wasm"]`, default `"local"`): Type of code executor. executor_kwargs (`dict`, *optional*): Additional arguments to pass to initialize the executor. max_print_outputs_length (`int`, *optional*): Maximum length of the print outputs. stream_outputs (`bool`, *optional*, default `False`): Whether to stream outputs during execution. 
@@ -1508,7 +1508,7 @@ def __init__( prompt_templates: PromptTemplates | None = None, additional_authorized_imports: list[str] | None = None, planning_interval: int | None = None, - executor_type: Literal["local", "e2b", "modal", "docker", "wasm"] = "local", + executor_type: Literal["local", "blaxel", "e2b", "modal", "docker", "wasm"] = "local", executor_kwargs: dict[str, Any] | None = None, max_print_outputs_length: int | None = None, stream_outputs: bool = False, @@ -1556,7 +1556,7 @@ def __init__( "Caution: you set an authorization for all imports, meaning your agent can decide to import any package it deems necessary. This might raise issues if the package is not installed in your environment.", level=LogLevel.INFO, ) - if executor_type not in {"local", "e2b", "modal", "docker", "wasm"}: + if executor_type not in {"local", "blaxel", "e2b", "modal", "docker", "wasm"}: raise ValueError(f"Unsupported executor type: {executor_type}") self.executor_type = executor_type self.executor_kwargs: dict[str, Any] = executor_kwargs or {} @@ -1583,6 +1583,7 @@ def create_python_executor(self) -> PythonExecutor: if self.managed_agents: raise Exception("Managed agents are not yet supported with remote code execution.") remote_executors = { + "blaxel": BlaxelExecutor, "e2b": E2BExecutor, "docker": DockerExecutor, "wasm": WasmExecutor, diff --git a/src/smolagents/remote_executors.py b/src/smolagents/remote_executors.py index 5e5f55549..3e9afd5fd 100644 --- a/src/smolagents/remote_executors.py +++ b/src/smolagents/remote_executors.py @@ -40,7 +40,7 @@ from .utils import AgentError -__all__ = ["E2BExecutor", "ModalExecutor", "DockerExecutor", "WasmExecutor"] +__all__ = ["BlaxelExecutor", "E2BExecutor", "ModalExecutor", "DockerExecutor", "WasmExecutor"] try: @@ -609,6 +609,323 @@ def _strip_ansi_colors(cls, text: str) -> str: return cls._ANSI_ESCAPE.sub("", text) +class BlaxelExecutor(RemotePythonExecutor): + """ + Executes Python code using Blaxel sandboxes. 
+ + Blaxel provides fast-launching virtual machines that start from hibernation in under 25ms + and scale back to zero after inactivity while maintaining memory state. + + Args: + additional_imports (`list[str]`): Additional Python packages to install. + logger (`Logger`): Logger to use for output and errors. + sandbox_name (`str`, optional): Name for the sandbox. Defaults to "smolagent-executor". + image (`str`, optional): Docker image to use. Defaults to "blaxel/prod-py-app:latest". + memory (`int`, optional): Memory allocation in MB. Defaults to 4096. + region (`str`, optional): Deployment region. If not specified, Blaxel chooses a default. + create_kwargs (`dict`, optional): Additional arguments for sandbox creation. + """ + + def __init__( + self, + additional_imports: list[str], + logger, + sandbox_name: str = "smolagent-executor", + image: str = "blaxel/prod-py-app:latest", + memory: int = 4096, + region: Optional[str] = None, + create_kwargs: Optional[dict] = None, + ): + super().__init__(additional_imports, logger) + + try: + import blaxel # noqa: F401 + except ModuleNotFoundError: + raise ModuleNotFoundError( + "Please install 'blaxel' extra to use BlaxelExecutor: `pip install 'smolagents[blaxel]'`" + ) + + self.sandbox_name = sandbox_name + self.image = image + self.memory = memory + self.region = region + self._cleaned_up = False # Flag to prevent double cleanup + + # Prepare sandbox creation parameters + sandbox_config = { + "name": sandbox_name, + "image": image, + "memory": memory, + } + + if region: + sandbox_config["region"] = region + + if create_kwargs: + sandbox_config.update(create_kwargs) + + # Create the sandbox + try: + self.sandbox = self._run_async(self._create_sandbox_async, sandbox_config) + except Exception as e: + raise RuntimeError(f"Failed to create Blaxel sandbox: {e}") from e + + # Install additional packages + self.installed_packages = self.install_packages(additional_imports) + self.logger.log(f"BlaxelExecutor initialized with 
sandbox {sandbox_name}", level=LogLevel.INFO) + + def _run_async(self, coro_func, *args, **kwargs): + """ + Run an async coroutine function safely, handling various event loop states. + Args: + coro_func: An async function (not a coroutine object) + *args, **kwargs: Arguments to pass to the async function + This method handles: + - No event loop exists (most common case) + - Event loop is running (uses thread pool) + - Event loop is closed (creates new loop) + """ + import asyncio + import concurrent.futures + + def _run_in_new_loop(): + """Helper to run coroutine in a fresh event loop (for thread pool).""" + loop = asyncio.new_event_loop() + asyncio.set_event_loop(loop) + try: + return loop.run_until_complete(coro_func(*args, **kwargs)) + finally: + try: + loop.close() + except Exception: + pass + + try: + # Try to get the running loop + _loop = asyncio.get_running_loop() + # If we get here, there's a running loop + # We can't block it, so run in a thread pool with a fresh loop + with concurrent.futures.ThreadPoolExecutor() as executor: + future = executor.submit(_run_in_new_loop) + return future.result() + except RuntimeError as e: + if "no running event loop" in str(e).lower() or "no current event loop" in str(e).lower(): + # No running loop - this is the normal case + try: + return asyncio.run(coro_func(*args, **kwargs)) + except RuntimeError as run_error: + if "Event loop is closed" in str(run_error): + # Loop was closed, create a fresh one + return _run_in_new_loop() + else: + raise + else: + # It was a different RuntimeError, re-raise it + raise + + async def _create_sandbox_async(self, config): + """Helper method to create sandbox asynchronously.""" + from blaxel.core import SandboxInstance + + return await SandboxInstance.create(config) + + def run_code_raise_errors(self, code: str) -> CodeOutput: + """ + Execute Python code in the Blaxel sandbox and return the result. + + Args: + code (`str`): Python code to execute. 
+ + Returns: + `CodeOutput`: Code output containing the result, logs, and whether it is the final answer. + """ + try: + return self._run_async(self._run_code_async, code) + except Exception as e: + self.logger.log_error(f"Code execution failed: {e}") + raise + + async def _run_code_async(self, code: str) -> CodeOutput: + """Helper method to run code asynchronously.""" + try: + # Wrap the code to handle final answer exceptions + wrapped_code = f''' +import base64 +import pickle + +try: + # Execute the user code +{self._indent_code(code, " ")} +except Exception as e: + # Check if it's a FinalAnswerException + if e.__class__.__name__ == "{RemotePythonExecutor.FINAL_ANSWER_EXCEPTION}": + # This is our special final answer exception + # Encode the exception's value attribute + encoded_value = base64.b64encode(pickle.dumps(e.value)).decode('utf-8') + print("FINAL_ANSWER_EXCEPTION:" + encoded_value) + else: + # Regular exception, just re-raise it + raise +''' + + await self.sandbox.fs.write("run-code.py", wrapped_code) + await self.sandbox.process.exec( + { + "name": "run-code", + "command": "python run-code.py", + } + ) + await self.sandbox.process.wait("run-code", max_wait=60000, interval=1000) + execution_result = await self.sandbox.process.get("run-code") + logs = await self.sandbox.process.logs("run-code", "all") + + # Process the execution result + result = None + is_final_answer = False + + # Check for final answer exception in the output + if "FINAL_ANSWER_EXCEPTION:" in logs: + final_answer_line = [line for line in logs.split("\n") if line.startswith("FINAL_ANSWER_EXCEPTION:")][ + 0 + ] + encoded_answer = final_answer_line.split("FINAL_ANSWER_EXCEPTION:", 1)[1] + try: + result = pickle.loads(base64.b64decode(encoded_answer)) + # Remove the final answer exception line from logs + logs = logs.replace(final_answer_line, "").strip("\n") + is_final_answer = True + except Exception: + # If we can't decode the final answer, treat as regular output + pass + + # Check 
for execution errors + if hasattr(execution_result, "error") and execution_result.error and not is_final_answer: + error_msg = execution_result.error + if hasattr(execution_result, "error_message"): + error_msg = execution_result.error_message + raise AgentError(f"Code execution failed:\n{logs}\n{error_msg}", self.logger) + + # For regular execution, use logs as output + if not is_final_answer: + result = logs + + return CodeOutput(output=result, logs=logs, is_final_answer=is_final_answer) + + except Exception as e: + if isinstance(e, AgentError): + raise + raise AgentError(f"Failed to execute code in Blaxel sandbox: {e}", self.logger) + + def _indent_code(self, code: str, indent: str) -> str: + """Helper method to indent code properly.""" + return "\n".join(indent + line if line.strip() else line for line in code.split("\n")) + + def install_packages(self, additional_imports: list[str]) -> list[str]: + """ + Install additional Python packages in the Blaxel sandbox. + + Args: + additional_imports (`list[str]`): Package names to install. + + Returns: + list[str]: Installed packages. 
+ """ + if not additional_imports: + return [] + + try: + return self._run_async(self._install_packages_async, additional_imports) + except Exception as e: + self.logger.log_error(f"Failed to install packages: {e}") + return [] + + async def _install_packages_async(self, additional_imports: list[str]) -> list[str]: + """Helper method to install packages asynchronously.""" + try: + # Install packages using pip via run_code + self.logger.log(f"Installing packages: {', '.join(additional_imports)}", level=LogLevel.INFO) + pip_install_code = f"pip install --root-user-action=ignore {' '.join(additional_imports)}" + + await self.sandbox.process.exec( + { + "name": "install-packages", + "command": pip_install_code, + } + ) + # Wait for completion (max 10 minutes, check every 1 second) + await self.sandbox.process.wait("install-packages", max_wait=600000, interval=1000) + + # Check the exit code to determine success + process_info = await self.sandbox.process.get("install-packages") + if hasattr(process_info, "exit_code") and process_info.exit_code != 0: + # Non-zero exit code means failure + error_logs = await self.sandbox.process.logs("install-packages", "stderr") + self.logger.log_error(f"Failed to install packages (exit code {process_info.exit_code}): {error_logs}") + return [] + + self.logger.log(f"Successfully installed packages: {', '.join(additional_imports)}", level=LogLevel.INFO) + return additional_imports + + except Exception as e: + self.logger.log_error(f"Error installing packages: {e}") + return [] + + def _delete_sandbox_sync(self): + """Delete sandbox using Blaxel's sync API and wait for completion.""" + import time + + from blaxel.core.client import client + from blaxel.core.client.api.compute import delete_sandbox, get_sandbox + + self.logger.log(f"Requesting sandbox {self.sandbox_name} deletion...", level=LogLevel.INFO) + delete_sandbox.sync(client=client, sandbox_name=self.sandbox_name) + + # Wait for deletion to complete (max 10 checks, 0.5s apart = 5s 
total) + for _ in range(10): + self.logger.log(f"Checking if sandbox {self.sandbox_name} is deleted...", level=LogLevel.INFO) + try: + response = get_sandbox.sync(client=client, sandbox_name=self.sandbox_name) + if response is None: + self.logger.log(f"Sandbox {self.sandbox_name} deleted successfully", level=LogLevel.INFO) + return + except Exception: + # Error getting sandbox usually means it's deleted + self.logger.log(f"Sandbox {self.sandbox_name} deleted successfully", level=LogLevel.INFO) + return + + time.sleep(0.5) + + # If we get here, deletion is still in progress + self.logger.log(f"Sandbox {self.sandbox_name} deletion in progress", level=LogLevel.INFO) + + def cleanup(self): + """Sync wrapper to clean up sandbox and resources.""" + # Prevent double cleanup + if self._cleaned_up: + return + + self._cleaned_up = True + self.logger.log(f"Cleaning up sandbox {self.sandbox_name}...", level=LogLevel.INFO) + + try: + self._delete_sandbox_sync() + except Exception as e: + # Log cleanup errors but don't raise - cleanup should be best-effort + self.logger.log(f"Cleanup error: {e}", level=LogLevel.INFO) + finally: + # Always clean up local references + if hasattr(self, "sandbox"): + del self.sandbox + self.logger.log("Blaxel sandbox cleanup completed", level=LogLevel.INFO) + + def __del__(self): + """Ensure cleanup on deletion.""" + try: + self.cleanup() + except Exception: + pass # Silently ignore errors during cleanup + + class WasmExecutor(RemotePythonExecutor): """ Remote Python code executor in a sandboxed WebAssembly environment powered by Pyodide and Deno. 
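`BlaxelExecutor._run_code_async` passes the final answer back through the process logs: the wrapper script prints `FINAL_ANSWER_EXCEPTION:` followed by a base64-encoded pickle, and the host side looks for that line. The round trip can be sketched in isolation as below. `encode_final_answer` and `extract_final_answer` are hypothetical names for illustration, and the line-removal step is a simplified variant of the `logs.replace(...)` call in the diff, not a copy of it.

```python
import base64
import pickle
from typing import Any, Optional

MARKER = "FINAL_ANSWER_EXCEPTION:"  # same prefix the wrapper script prints


def encode_final_answer(value: Any) -> str:
    """Build the log line the sandboxed wrapper prints for a final answer."""
    return MARKER + base64.b64encode(pickle.dumps(value)).decode("utf-8")


def extract_final_answer(logs: str) -> tuple[Optional[Any], str, bool]:
    """Return (value, remaining_logs, is_final_answer) parsed from process logs."""
    lines = logs.split("\n")
    for i, line in enumerate(lines):
        if line.startswith(MARKER):
            try:
                value = pickle.loads(base64.b64decode(line[len(MARKER):]))
            except Exception:
                # Undecodable payload: treat the logs as regular output.
                return None, logs, False
            # Drop only the marker line and keep the rest of the logs.
            remaining = "\n".join(lines[:i] + lines[i + 1:]).strip("\n")
            return value, remaining, True
    return None, logs, False
```

Returning the flag alongside the value keeps "no final answer" distinct from "final answer is `None`". Note that `pickle.loads` can execute arbitrary code when given crafted input, so decoding a payload printed by agent-generated code deserves the same caution as the rest of the sandbox boundary.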
diff --git a/tests/test_remote_executors.py b/tests/test_remote_executors.py index d2c18d53a..d7a880012 100644 --- a/tests/test_remote_executors.py +++ b/tests/test_remote_executors.py @@ -10,7 +10,14 @@ from smolagents.default_tools import FinalAnswerTool, WikipediaSearchTool from smolagents.local_python_executor import CodeOutput from smolagents.monitoring import AgentLogger, LogLevel -from smolagents.remote_executors import DockerExecutor, E2BExecutor, ModalExecutor, RemotePythonExecutor, WasmExecutor +from smolagents.remote_executors import ( + BlaxelExecutor, + DockerExecutor, + E2BExecutor, + ModalExecutor, + RemotePythonExecutor, + WasmExecutor, +) from smolagents.utils import AgentError from .utils.markers import require_run_all @@ -547,3 +554,72 @@ def test_syntax_error_handling(self): with pytest.raises(AgentError) as excinfo: self.executor(code) assert "SyntaxError" in str(excinfo.value) + + +class TestBlaxelExecutorUnit: + def test_blaxel_executor_instantiation_without_blaxel_sdk(self): + """Test that BlaxelExecutor raises appropriate error when blaxel SDK is not installed.""" + logger = MagicMock() + with patch.dict("sys.modules", {"blaxel": None}): + with pytest.raises(ModuleNotFoundError) as excinfo: + BlaxelExecutor(additional_imports=[], logger=logger) + assert "Please install 'blaxel' extra" in str(excinfo.value) + + @patch("blaxel.core.SandboxInstance") + def test_blaxel_executor_instantiation_with_blaxel_sdk(self, mock_sandbox_instance): + """Test BlaxelExecutor instantiation with mocked Blaxel SDK.""" + logger = MagicMock() + mock_sandbox = MagicMock() + mock_sandbox_instance.create.return_value = mock_sandbox + + with patch("asyncio.run") as mock_asyncio_run: + mock_asyncio_run.return_value = mock_sandbox + executor = BlaxelExecutor(additional_imports=[], logger=logger) + + assert executor.sandbox_name == "smolagent-executor" + assert executor.image == "blaxel/prod-py-app:latest" + assert executor.memory == 4096 + assert executor.region 
is None + + @patch("blaxel.core.SandboxInstance") + def test_blaxel_executor_custom_parameters(self, mock_sandbox_instance): + """Test BlaxelExecutor with custom parameters.""" + logger = MagicMock() + mock_sandbox = MagicMock() + mock_sandbox_instance.create.return_value = mock_sandbox + + with patch("asyncio.run") as mock_asyncio_run: + mock_asyncio_run.return_value = mock_sandbox + executor = BlaxelExecutor( + additional_imports=["numpy"], + logger=logger, + sandbox_name="test-sandbox", + image="custom-image:latest", + memory=8192, + region="us-was-1", + ) + + assert executor.sandbox_name == "test-sandbox" + assert executor.image == "custom-image:latest" + assert executor.memory == 8192 + assert executor.region == "us-was-1" + + @patch("blaxel.core.SandboxInstance") + @patch("blaxel.core.client.api.compute.delete_sandbox") + def test_blaxel_executor_cleanup(self, mock_delete_sandbox, mock_sandbox_instance): + """Test BlaxelExecutor cleanup method.""" + logger = MagicMock() + mock_sandbox = MagicMock() + mock_sandbox_instance.create.return_value = mock_sandbox + + with patch("asyncio.run") as mock_asyncio_run: + mock_asyncio_run.return_value = mock_sandbox + executor = BlaxelExecutor(additional_imports=[], logger=logger) + + # Test cleanup + executor.cleanup() + + # Verify that delete_sandbox.sync was called + assert mock_delete_sandbox.sync.called + # Verify sandbox reference was cleaned up + assert not hasattr(executor, "sandbox")
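The first unit test above simulates a missing SDK by placing a `None` entry in `sys.modules`, which makes Python's import machinery raise `ModuleNotFoundError` for that exact module name. The sketch below shows the mechanism on its own, using the standard-library module `colorsys` as a stand-in so that it runs without Blaxel installed. `require_optional` and `simulate_missing` are hypothetical helpers, not smolagents code. The entry only affects the name that is patched, so the patched key should match the module the guarded `import` statement actually loads.

```python
import importlib
import sys
from unittest.mock import patch


def require_optional(module_name: str, extra: str):
    """Import an optional dependency or raise with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError:
        raise ModuleNotFoundError(
            f"Please install '{extra}' extra: `pip install 'smolagents[{extra}]'`"
        ) from None


def simulate_missing(module_name: str, extra: str) -> str:
    """Return the error message produced when `module_name` cannot be imported."""
    # A None entry in sys.modules halts the import with ModuleNotFoundError.
    with patch.dict(sys.modules, {module_name: None}):
        try:
            require_optional(module_name, extra)
        except ModuleNotFoundError as exc:
            return str(exc)
    return ""
```

`patch.dict` restores `sys.modules` on exit, so the stand-in module imports normally again once the `with` block ends.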