feat: add data sandbox mode for isolated file operations #235

Open
lxasqjc wants to merge 3 commits into snap-stanford:main from lxasqjc:data-sandbox

lxasqjc commented Oct 6, 2025

  • Add sandbox_mode and sandbox_path parameters to A1 constructor
  • Enable automatic session folder creation when sandbox_mode=True
  • Modify run_python_repl to support working directory changes
  • Add get_sandbox_path() method for sandbox path retrieval
  • Include comprehensive documentation and examples
  • Update .gitignore to exclude sandbox directories

This feature allows users to isolate file operations in dedicated sandbox directories, preventing clutter in the main workspace and enabling easy cleanup of generated files.

lxasqjc commented Oct 6, 2025

🎯 Overview

This PR introduces sandbox mode functionality to Biomni, enabling isolated workspace management for file operations during agent execution. This addresses the need for clean, isolated environments when the agent creates files, plots, or other outputs during exploration and analysis tasks.

✨ Features

🔒 Sandbox Mode

  • Isolated Workspaces: All file operations happen in dedicated sandbox directories
  • Auto-generated Sessions: Creates timestamped session folders automatically
  • Custom Paths: Support for user-defined sandbox locations
  • Zero Impact: Completely backward compatible - existing code unchanged

🛠️ API Enhancements

  • New sandbox_mode: bool = False parameter in A1.__init__()
  • New sandbox_path: str | None = None parameter for custom locations
  • New get_sandbox_path() -> str | None method to retrieve active sandbox path
  • Enhanced configuration display showing sandbox status
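
The parameters above suggest a simple resolution rule: a custom `sandbox_path` takes precedence, otherwise a timestamped session folder is created under a default root. The following is a minimal sketch of that rule, not the code in this PR. The standalone helper `resolve_sandbox_path` and its `root` and `now` arguments are assumptions for illustration; the folder layout follows the `./sandbox/session_YYYYMMDD_HHMMSS/` pattern used in the examples in this description.

```python
import os
from datetime import datetime
from typing import Optional


def resolve_sandbox_path(
    sandbox_mode: bool = False,
    sandbox_path: Optional[str] = None,
    root: str = "./sandbox",          # assumed default root
    now: Optional[datetime] = None,   # injectable clock, for testing only
) -> Optional[str]:
    """Hypothetical helper: return the absolute sandbox directory, or None if disabled."""
    if not sandbox_mode:
        return None  # default behavior: no sandbox, nothing is created

    if sandbox_path is None:
        # Auto-generate a timestamped session folder under the root
        stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
        sandbox_path = os.path.join(root, f"session_{stamp}")

    sandbox_path = os.path.abspath(sandbox_path)
    os.makedirs(sandbox_path, exist_ok=True)
    return sandbox_path
```

Returning `None` when the mode is off mirrors the `get_sandbox_path() -> str | None` signature listed above.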

📋 Use Cases

Research & Experimentation

# Perfect for exploratory data analysis
agent = A1(path='./data', sandbox_mode=True)
agent.go("Analyze this dataset and create visualizations...")
# All plots, CSV files, results saved to isolated session folder

Automated Workflows

# Clean environment for each analysis run
from datetime import datetime

from biomni.agent import A1

agent = A1(
    path='./data',
    sandbox_mode=True,
    sandbox_path=f'/tmp/analysis_{datetime.now().strftime("%Y%m%d")}'
)

Safe Development

# Test new analysis code without cluttering workspace
agent = A1(path='./data', sandbox_mode=True)
agent.go("Try this experimental analysis approach...")
# No risk of overwriting important files

🚀 Usage Examples

Example 1: Default Sandbox Mode

````python
from biomni.agent import A1

# Enable sandbox with auto-generated session folder
agent = A1(
    path='./data',
    sandbox_mode=True,
    commercial_mode=True
)

# All file operations will happen in: ./sandbox/session_YYYYMMDD_HHMMSS/
result = agent.go("""
Create a data analysis report with visualizations.

```python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Generate sample data
data = np.random.randn(1000, 2)
df = pd.DataFrame(data, columns=['Feature1', 'Feature2'])

# Create analysis
summary = df.describe()
print("Dataset Summary:")
print(summary)

# Save summary to CSV
summary.to_csv('data_summary.csv')

# Create visualization
plt.figure(figsize=(10, 6))
plt.scatter(df['Feature1'], df['Feature2'], alpha=0.6)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Feature Correlation Analysis')
plt.savefig('correlation_plot.png', dpi=300, bbox_inches='tight')

# List created files
import os
files = os.listdir('.')
print(f"Files created: {files}")
```
""")

# Check where files were saved
sandbox_path = agent.get_sandbox_path()
print(f"All outputs saved to: {sandbox_path}")
````


Example 2: Custom Sandbox Path

````python
from datetime import datetime

from biomni.agent import A1

# Use custom sandbox directory
project_name = "protein_analysis"
timestamp = datetime.now().strftime("%Y%m%d_%H%M")
custom_sandbox = f"/tmp/biomni_workspaces/{project_name}_{timestamp}"

agent = A1(
    path='./data',
    sandbox_mode=True,
    sandbox_path=custom_sandbox,
    commercial_mode=True
)

result = agent.go("""
Analyze protein sequences and create a phylogenetic tree.

```python
# Protein analysis code here...
# All outputs will be saved to /tmp/biomni_workspaces/protein_analysis_YYYYMMDD_HHMM/
```
""")
````


Example 3: Batch Processing with Isolation

````python
from biomni.agent import A1

datasets = ['dataset_A', 'dataset_B', 'dataset_C']

for dataset in datasets:
    # Each dataset gets its own sandbox
    agent = A1(
        path='./data',
        sandbox_mode=True,
        sandbox_path=f'./results/{dataset}_analysis'
    )

    result = agent.go(f"""
    Analyze {dataset} and create comprehensive report:

    ```python
    # Load dataset {dataset}
    # Perform analysis
    # Save results to CSV
    # Create plots
    # Generate report
    ```
    """)

    print(f"✅ {dataset} analysis completed in: {agent.get_sandbox_path()}")
````

🔧 Implementation Details

Core Changes

  • biomni/agent/a1.py: Added sandbox parameters and logic to A1 class
  • biomni/tool/support_tools.py: Enhanced run_python_repl() to support working directory changes
  • docs/SANDBOX_EXAMPLE.md: Comprehensive documentation and examples

Technical Features

  • Thread-safe: Each agent instance has isolated sandbox
  • Cross-platform: Works on Linux, macOS, Windows
  • Simple cleanup: Old sandbox directories are easy to identify and remove
  • Error handling: Graceful fallback if sandbox creation fails
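
One caveat worth keeping in mind if `run_python_repl` switches directories with `os.chdir`: the working directory belongs to the whole process, so a change made for one agent is visible to every thread in that process. The sketch below shows one way to scope the change and fall back gracefully when the sandbox cannot be created. The `working_directory` context manager is a hypothetical helper for illustration and is not taken from this PR.

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Hypothetical helper: run a block inside `path`, then restore the previous cwd.

    Yields True if the directory was entered, False if the fallback was used.
    """
    previous = os.getcwd()
    try:
        os.makedirs(path, exist_ok=True)
        os.chdir(path)
        entered = True
    except OSError as exc:
        # Graceful fallback: stay where we are if the sandbox is unusable
        print(f"Sandbox unavailable ({exc}); staying in {previous}")
        entered = False
    try:
        yield entered
    finally:
        os.chdir(previous)  # always restore, even if the block raises
```

Because the change is process-wide, concurrent agents in separate threads would still need a lock around this block, or separate processes.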

Backward Compatibility

  • ✅ Default behavior unchanged (sandbox_mode=False)
  • ✅ Existing code requires no modifications
  • ✅ All existing tests pass
  • ✅ No breaking changes to API

📊 Configuration Display

When sandbox mode is enabled, the agent configuration shows:

==================================================
🔧 BIOMNI CONFIGURATION  
==================================================
📋 DEFAULT CONFIG (Including Database LLM):
  Path: ./data
  Timeout Seconds: 600
  Llm: claude-sonnet-4-20250514
  Commercial Mode: Commercial (licensed datasets only)

📁 SANDBOX MODE:
  Enabled: True
  Sandbox Path: /absolute/path/to/sandbox/session_20251006_143022
  Files will be created in: /absolute/path/to/sandbox/session_20251006_143022
==================================================

🧪 Testing

The implementation has been tested with:

  • ✅ Auto-generated sandbox folders
  • ✅ Custom sandbox paths
  • ✅ File creation and manipulation
  • ✅ Plot generation and saving
  • ✅ CSV/Excel output files
  • ✅ Working directory persistence
  • ✅ Error handling and recovery

📝 Benefits

For Users

  • Clean Workspaces: No more cluttered directories with analysis outputs
  • Reproducible Research: Each analysis run is isolated
  • Easy Cleanup: Delete sandbox folder when done
  • Safe Experimentation: Test code without affecting main workspace
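
Since auto-generated folders carry their creation time in the name, cleanup can be scripted. Here is a minimal sketch, assuming the `session_YYYYMMDD_HHMMSS` naming shown in the examples; `remove_old_sessions` and its parameters are hypothetical and not part of this PR.

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path


def remove_old_sessions(root, max_age_days=7, now=None):
    """Hypothetical helper: delete session_* folders older than max_age_days.

    Returns the sorted names of the folders that were removed.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    removed = []
    for folder in Path(root).glob("session_*"):
        if not folder.is_dir():
            continue
        try:
            # The timestamp is parsed from the folder name, not from file metadata
            created = datetime.strptime(folder.name, "session_%Y%m%d_%H%M%S")
        except ValueError:
            continue  # skip folders that do not follow the naming scheme
        if created < cutoff:
            shutil.rmtree(folder)
            removed.append(folder.name)
    return sorted(removed)
```

Custom `sandbox_path` locations do not follow this naming scheme, so they would be skipped and need separate handling.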

For Developers

  • Minimal Changes: Small, focused implementation
  • Maintainable: Clear separation of concerns
  • Extensible: Easy to add more workspace management features
  • Well-documented: Comprehensive examples and documentation

🔄 Migration Guide

No migration needed! This is a purely additive feature.

To start using sandbox mode:

# Before (still works)
agent = A1(path='./data')

# After (new capability)  
agent = A1(path='./data', sandbox_mode=True)

📁 Files Changed

  • biomni/agent/a1.py - Core sandbox functionality
  • biomni/tool/support_tools.py - Working directory support
  • docs/SANDBOX_EXAMPLE.md - Documentation and examples
  • .gitignore - Ignore sandbox directories

This feature enables cleaner, more organized biomedical research workflows while maintaining full backward compatibility with existing Biomni usage patterns.

- Add automatic symbolic links to project data directories
- Inject helper functions for absolute path access
- Enhance configuration display with data access information
- Ensure backward compatibility with existing relative paths

Fixes issue where sandbox mode made project data inaccessible,
causing FileNotFoundError for paths like './data/biomni_data/...'

Now provides multiple data access methods:
1. Relative paths via automatic symlinks
2. get_project_path() helper function
3. Environment variables (__original_cwd__, etc.)

All output files remain isolated in sandbox while maintaining
seamless access to project data for analysis.
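
The first two access methods can be illustrated with a short sketch. This is not the PR's implementation: `link_project_data` and `make_get_project_path` are hypothetical names, and the real `get_project_path()` signature is not shown in this description, so the one below is an assumption.

```python
import os


def link_project_data(project_root, sandbox_dir, names=("data",)):
    """Hypothetical helper: symlink project directories into the sandbox.

    Returns the names that were linked.
    """
    linked = []
    for name in names:
        source = os.path.join(os.path.abspath(project_root), name)
        target = os.path.join(sandbox_dir, name)
        if not os.path.isdir(source) or os.path.lexists(target):
            continue  # nothing to link, or the name is already taken
        try:
            os.symlink(source, target, target_is_directory=True)
            linked.append(name)
        except OSError:
            pass  # e.g. symlinks are not permitted on this platform
    return linked


def make_get_project_path(project_root):
    """Hypothetical factory for a get_project_path() helper bound to the project root."""
    root = os.path.abspath(project_root)

    def get_project_path(relative_path=""):
        # Resolve against the original project root, not the sandbox cwd
        return os.path.normpath(os.path.join(root, relative_path))

    return get_project_path
```

With the link in place, a relative path such as `./data/biomni_data/...` resolves from inside the sandbox, while new files written to the sandbox root stay isolated. Anything written through the symlink lands in the project directory itself.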