feat: add data sandbox mode for isolated file operations by lxasqjc · Pull Request #235 · snap-stanford/Biomni

lxasqjc · 2025-10-06T09:05:31Z

Add sandbox_mode and sandbox_path parameters to A1 constructor
Enable automatic session folder creation when sandbox_mode=True
Modify run_python_repl to support working directory changes
Add get_sandbox_path() method for sandbox path retrieval
Include comprehensive documentation and examples
Update .gitignore to exclude sandbox directories

This feature allows users to isolate file operations in dedicated sandbox directories, preventing clutter in the main workspace and enabling easy cleanup of generated files.

lxasqjc · 2025-10-06T10:07:53Z

🎯 Overview

This PR introduces sandbox mode functionality to Biomni, enabling isolated workspace management for file operations during agent execution. This addresses the need for clean, isolated environments when the agent creates files, plots, or other outputs during exploration and analysis tasks.

✨ Features

🔒 Sandbox Mode

Isolated Workspaces: All file operations happen in dedicated sandbox directories
Auto-generated Sessions: Creates timestamped session folders automatically
Custom Paths: Support for user-defined sandbox locations
Zero Impact: Completely backward compatible - existing code unchanged
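
The session-folder behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper name `resolve_sandbox_dir`, its `base_dir` default, and the `now` parameter are assumptions made for the example; only the `session_YYYYMMDD_HHMMSS` naming comes from the description.

```python
from datetime import datetime
from pathlib import Path


def resolve_sandbox_dir(sandbox_path=None, base_dir="./sandbox", now=None):
    """Return an absolute sandbox directory, creating it if needed.

    Hypothetical helper: the name, the `base_dir` default, and the `now`
    parameter are assumptions for illustration, not Biomni's actual code.
    """
    if sandbox_path is not None:
        # A user-supplied location is used as given.
        target = Path(sandbox_path)
    else:
        # Otherwise build a timestamped session folder under base_dir.
        stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
        target = Path(base_dir) / f"session_{stamp}"
    # Create missing parents and tolerate an already existing directory.
    target.mkdir(parents=True, exist_ok=True)
    return target.resolve()
```

Passing `now` explicitly makes the generated name deterministic, which is convenient when testing.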

🛠️ API Enhancements

New sandbox_mode: bool = False parameter in A1.__init__()
New sandbox_path: str | None = None parameter for custom locations
New get_sandbox_path() -> str | None method to retrieve active sandbox path
Enhanced configuration display showing sandbox status

📋 Use Cases

Research & Experimentation

# Perfect for exploratory data analysis
agent = A1(path='./data', sandbox_mode=True)
agent.go("Analyze this dataset and create visualizations...")
# All plots, CSV files, results saved to isolated session folder

Automated Workflows

# Clean environment for each analysis run
from datetime import datetime

agent = A1(
    path='./data',
    sandbox_mode=True,
    sandbox_path=f'/tmp/analysis_{datetime.now().strftime("%Y%m%d")}'
)

Safe Development

# Test new analysis code without cluttering workspace
agent = A1(path='./data', sandbox_mode=True)
agent.go("Try this experimental analysis approach...")
# No risk of overwriting important files

🚀 Usage Examples

Example 1: Default Sandbox Mode

````python
from biomni.agent import A1

# Enable sandbox with auto-generated session folder
agent = A1(
    path='./data',
    sandbox_mode=True,
    commercial_mode=True
)

# All file operations will happen in: ./sandbox/session_YYYYMMDD_HHMMSS/
result = agent.go("""
Create a data analysis report with visualizations.

```python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Generate sample data
data = np.random.randn(1000, 2)
df = pd.DataFrame(data, columns=['Feature1', 'Feature2'])

# Create analysis
summary = df.describe()
print("Dataset Summary:")
print(summary)

# Save summary to CSV
summary.to_csv('data_summary.csv')

# Create visualization
plt.figure(figsize=(10, 6))
plt.scatter(df['Feature1'], df['Feature2'], alpha=0.6)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Feature Correlation Analysis')
plt.savefig('correlation_plot.png', dpi=300, bbox_inches='tight')

# List created files
import os
files = os.listdir('.')
print(f"Files created: {files}")
```
""")

# Check where files were saved
sandbox_path = agent.get_sandbox_path()
print(f"All outputs saved to: {sandbox_path}")
````

Example 2: Custom Sandbox Path

````python
from biomni.agent import A1
from datetime import datetime

# Use custom sandbox directory
project_name = "protein_analysis"
timestamp = datetime.now().strftime("%Y%m%d_%H%M")
custom_sandbox = f"/tmp/biomni_workspaces/{project_name}_{timestamp}"

agent = A1(
    path='./data',
    sandbox_mode=True,
    sandbox_path=custom_sandbox,
    commercial_mode=True
)

result = agent.go("""
Analyze protein sequences and create a phylogenetic tree.

```python
# Protein analysis code here...
# All outputs will be saved to /tmp/biomni_workspaces/protein_analysis_YYYYMMDD_HHMM/
```
""")
````

Example 3: Batch Processing with Isolation

````python
from biomni.agent import A1

datasets = ['dataset_A', 'dataset_B', 'dataset_C']

for dataset in datasets:
    # Each dataset gets its own sandbox
    agent = A1(
        path='./data',
        sandbox_mode=True,
        sandbox_path=f'./results/{dataset}_analysis'
    )

    result = agent.go(f"""
    Analyze {dataset} and create comprehensive report:

    ```python
    # Load dataset {dataset}
    # Perform analysis
    # Save results to CSV
    # Create plots
    # Generate report
    ```
    """)

    print(f"✅ {dataset} analysis completed in: {agent.get_sandbox_path()}")
````

🔧 Implementation Details

Core Changes

biomni/agent/a1.py: Added sandbox parameters and logic to A1 class
biomni/tool/support_tools.py: Enhanced run_python_repl() to support working directory changes
docs/SANDBOX_EXAMPLE.md: Comprehensive documentation and examples
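
One way working-directory support for code execution could look is sketched below. It is an assumption-laden illustration rather than the actual change to `run_python_repl()`: `working_directory`, `run_snippet`, and the `cwd` parameter are hypothetical names, and the real function may handle state and output differently.

```python
import contextlib
import io
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the process working directory to `path`."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always restore the original directory, even if the code raised.
        os.chdir(previous)


def run_snippet(code, cwd=None):
    """Execute `code` and return what it printed.

    Hypothetical stand-in for run_python_repl; the `cwd` parameter
    is an assumption made for this sketch.
    """
    buffer = io.StringIO()
    # With no cwd given, run in the current directory unchanged.
    scope = contextlib.nullcontext() if cwd is None else working_directory(cwd)
    with scope, contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue()
```

Relative paths used by the executed code then resolve inside `cwd`. Note that `os.chdir` changes the directory for the whole process, so this sketch on its own does not isolate agents that run concurrently in threads.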

Technical Features

Thread-safe: Each agent instance has its own isolated sandbox
Cross-platform: Works on Linux, macOS, Windows
Simple cleanup: Old sandbox directories are easy to identify and remove
Error handling: Graceful fallback if sandbox creation fails
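
Removing old sandbox directories could be scripted along these lines. The helper below is a hypothetical sketch and not part of this PR: `remove_old_sessions`, the `./sandbox` default, and the seven-day threshold are assumptions; it relies only on the `session_` prefix used for auto-generated folders.

```python
import shutil
import time
from pathlib import Path


def remove_old_sessions(base_dir="./sandbox", max_age_days=7, now=None):
    """Delete session_* folders under base_dir older than max_age_days.

    Hypothetical helper (not part of Biomni). Returns the removed paths.
    """
    base = Path(base_dir)
    if not base.is_dir():
        return []
    current = time.time() if now is None else now
    cutoff = current - max_age_days * 86400  # seconds per day
    removed = []
    for entry in sorted(base.glob("session_*")):
        # Age is judged by the directory's last modification time.
        if entry.is_dir() and entry.stat().st_mtime < cutoff:
            shutil.rmtree(entry)
            removed.append(entry)
    return removed
```

Custom sandbox paths that do not follow the `session_` naming would be left untouched by this sketch.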

Backward Compatibility

✅ Default behavior unchanged (sandbox_mode=False)
✅ Existing code requires no modifications
✅ All existing tests pass
✅ No breaking changes to API

📊 Configuration Display

When sandbox mode is enabled, the agent configuration shows:

==================================================
🔧 BIOMNI CONFIGURATION  
==================================================
📋 DEFAULT CONFIG (Including Database LLM):
  Path: ./data
  Timeout Seconds: 600
  Llm: claude-sonnet-4-20250514
  Commercial Mode: Commercial (licensed datasets only)

📁 SANDBOX MODE:
  Enabled: True
  Sandbox Path: /absolute/path/to/sandbox/session_20251006_143022
  Files will be created in: /absolute/path/to/sandbox/session_20251006_143022
==================================================

🧪 Testing

The implementation has been tested with:

✅ Auto-generated sandbox folders
✅ Custom sandbox paths
✅ File creation and manipulation
✅ Plot generation and saving
✅ CSV/Excel output files
✅ Working directory persistence
✅ Error handling and recovery

📝 Benefits

For Users

Clean Workspaces: No more cluttered directories with analysis outputs
Reproducible Research: Each analysis run is isolated
Easy Cleanup: Delete sandbox folder when done
Safe Experimentation: Test code without affecting main workspace

For Developers

Minimal Changes: Small, focused implementation
Maintainable: Clear separation of concerns
Extensible: Easy to add more workspace management features
Well-documented: Comprehensive examples and documentation

🔄 Migration Guide

No migration needed! This is a purely additive feature.

To start using sandbox mode:

# Before (still works)
agent = A1(path='./data')

# After (new capability)  
agent = A1(path='./data', sandbox_mode=True)

📁 Files Changed

biomni/agent/a1.py - Core sandbox functionality
biomni/tool/support_tools.py - Working directory support
docs/SANDBOX_EXAMPLE.md - Documentation and examples
.gitignore - Ignore sandbox directories

This feature enables cleaner, more organized biomedical research workflows while maintaining full backward compatibility with existing Biomni usage patterns.

Add automatic symbolic links to project data directories
Inject helper functions for absolute path access
Enhance configuration display with data access information
Ensure backward compatibility with existing relative paths

Fixes an issue where sandbox mode made project data inaccessible, causing FileNotFoundError for paths like './data/biomni_data/...'

Now provides multiple data access methods:

1. Relative paths via automatic symlinks
2. get_project_path() helper function
3. Environment variables (__original_cwd__, etc.)

All output files remain isolated in the sandbox while maintaining seamless access to project data for analysis.
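
The first two access methods might be implemented roughly as below. This is a sketch under assumptions, not the code from this commit: `link_project_dirs` and `make_get_project_path` are hypothetical names, the `names=("data",)` default is illustrative, and the real `get_project_path()` signature is not shown in the PR.

```python
import os
from pathlib import Path


def link_project_dirs(project_root, sandbox_dir, names=("data",)):
    """Symlink selected project directories into the sandbox.

    Hypothetical helper; directory names and behaviour are assumptions.
    Returns the links that were created.
    """
    project_root = Path(project_root).resolve()
    sandbox_dir = Path(sandbox_dir).resolve()
    linked = []
    for name in names:
        source = project_root / name
        link = sandbox_dir / name
        # Skip missing sources and never overwrite an existing sandbox entry.
        if source.is_dir() and not link.exists() and not link.is_symlink():
            os.symlink(source, link, target_is_directory=True)
            linked.append(link)
    return linked


def make_get_project_path(project_root):
    """Build a helper that maps project-relative paths to absolute ones."""
    root = Path(project_root).resolve()

    def get_project_path(relative):
        # Assumed signature: one project-relative path in, absolute string out.
        return str(root / relative)

    return get_project_path
```

With the link in place, a relative path such as `./data/...` resolves from inside the sandbox, while new files still land in the sandbox itself. On Windows, creating symlinks can require elevated privileges or developer mode.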

lxasqjc force-pushed the data-sandbox branch from 0c88fcd to b67506c Compare October 6, 2025 12:49

[pre-commit.ci] auto fixes from pre-commit.com hooks

ca2953d

for more information, see https://pre-commit.ci

lxasqjc wants to merge 3 commits into snap-stanford:main from lxasqjc:data-sandbox
