7 changes: 1 addition & 6 deletions .github/workflows/benchmark.yml
@@ -1,4 +1,4 @@
name: Outline Benchmarks
name: Grainchain Benchmarks

on:
schedule:
@@ -37,9 +37,6 @@ jobs:
run: |
uv sync --all-extras

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3

- name: Configure Git
run: |
git config --global user.name "Benchmark Bot"
@@ -48,8 +45,6 @@ jobs:
- name: Run benchmarks
run: |
uv run python benchmarks/scripts/auto_publish.py --run-benchmark
env:
DOCKER_HOST: unix:///var/run/docker.sock

- name: Generate summary report
run: |
70 changes: 36 additions & 34 deletions README.md
@@ -105,62 +105,64 @@ Compare sandbox providers with comprehensive performance testing:

```bash
# Test individual providers
grainchain benchmark --provider local
grainchain benchmark --provider e2b
grainchain benchmark --provider daytona
grainchain benchmark --provider morph
python benchmarks/scripts/grainchain_benchmark.py --providers local
python benchmarks/scripts/grainchain_benchmark.py --providers e2b
python benchmarks/scripts/grainchain_benchmark.py --providers daytona
python benchmarks/scripts/grainchain_benchmark.py --providers morph

# Generate timestamped results
grainchain benchmark --provider local --output benchmarks/results/
# Test multiple providers at once
python benchmarks/scripts/grainchain_benchmark.py --providers local e2b --iterations 3

# Check latest benchmark status (without running new tests)
./scripts/benchmark_status.sh
# Generate automated summary report
python benchmarks/scripts/auto_publish.py --generate-summary
```
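For scripting these runs from Python (for example in a custom CI step), the command line can be assembled as an argument list. This is a minimal sketch: `build_benchmark_command` and `run_benchmark` are hypothetical helpers, and only the `--providers` and `--iterations` flags shown above are assumed to exist.

```python
from __future__ import annotations

import shlex
import subprocess

SCRIPT = "benchmarks/scripts/grainchain_benchmark.py"


def build_benchmark_command(providers: list[str], iterations: int | None = None) -> list[str]:
    """Assemble argv for grainchain_benchmark.py (hypothetical helper)."""
    if not providers:
        raise ValueError("at least one provider is required")
    cmd = ["python", SCRIPT, "--providers", *providers]
    if iterations is not None:
        if iterations < 1:
            raise ValueError("iterations must be >= 1")
        cmd += ["--iterations", str(iterations)]
    return cmd


def run_benchmark(providers: list[str], iterations: int | None = None) -> int:
    """Run the benchmark from the repository root and return its exit code."""
    return subprocess.run(build_benchmark_command(providers, iterations)).returncode


# Builds (but does not run) the multi-provider command shown above
print(shlex.join(build_benchmark_command(["local", "e2b"], iterations=3)))
```

Passing a list to `subprocess.run` avoids shell quoting issues; `shlex.join` is only used to display the equivalent shell command.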

### Full Benchmark Suite

Run comprehensive benchmarks across all providers:
Run comprehensive benchmarks across all available providers:

```bash
# Quick: Run all providers and save results
for provider in local e2b daytona morph; do
echo "🚀 Testing $provider..."
grainchain benchmark --provider $provider --output benchmarks/results/
done
# Run full benchmark suite with all providers
python benchmarks/scripts/grainchain_benchmark.py --providers local e2b modal daytona morph --iterations 3

# Comprehensive: Generate a full report that can be committed
./scripts/benchmark_all.sh
# Run automated benchmark and generate summary (used by CI)
python benchmarks/scripts/auto_publish.py --run-benchmark

# Advanced: Use the detailed benchmark script
./benchmarks/scripts/run_grainchain_benchmark.sh "local e2b daytona morph" 3
# Generate summary from existing results
python benchmarks/scripts/auto_publish.py --generate-summary
```

The `benchmark_all.sh` script generates timestamped reports in `benchmarks/results/` that include:
The benchmark system generates timestamped reports in `benchmarks/results/` that include:

- Performance comparison tables
- Environment details (OS, commit hash)
- Analysis and recommendations
- Raw benchmark data for tracking trends
- Performance comparison tables across providers
- Success rates and error analysis
- Detailed metrics for each test scenario
- JSON data for historical tracking
- Automated summary reports
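Because report names embed a `YYYYMMDD_HHMMSS` timestamp (for example `grainchain_benchmark_20250706_204709.md`), the newest run can be found by parsing names instead of relying on file modification times, which Git resets on checkout. A sketch, assuming every timestamped report follows that naming pattern; `report_timestamp`, `newest` and `latest_report` are illustrative names:

```python
from __future__ import annotations

import re
from collections.abc import Iterable
from datetime import datetime
from pathlib import Path

# Matches names like grainchain_benchmark_20250706_204709.md (assumed pattern)
REPORT_NAME = re.compile(r"^grainchain_benchmark_(\d{8}_\d{6})\.md$")


def report_timestamp(name: str) -> datetime | None:
    """Return the timestamp encoded in a report filename, or None if it doesn't match."""
    match = REPORT_NAME.match(name)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")


def newest(names: Iterable[str]) -> str | None:
    """Pick the newest timestamped name; SUMMARY.md and latest_*.md are ignored."""
    stamped = [(ts, name) for name in names if (ts := report_timestamp(name)) is not None]
    return max(stamped)[1] if stamped else None


def latest_report(results_dir: str | Path = "benchmarks/results") -> Path | None:
    """Return the path of the newest report in results_dir, if any."""
    root = Path(results_dir)
    name = newest(p.name for p in root.glob("*.md"))
    return root / name if name else None
```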

### Current Performance Baseline

Latest benchmark results (updated 2024-05-31):
Latest benchmark results (updated 2025-07-06):

| Provider | Total Time | Basic Echo | Python Test | File Ops | Performance |
| ----------- | ---------- | ---------- | ----------- | -------- | ---------------- |
| **Local** | 0.036s | 0.007s | 0.021s | 0.008s | ⚡ Fastest |
| **E2B** | 0.599s | 0.331s | 0.111s | 0.156s | 🚀 Balanced |
| **Daytona** | 1.012s | 0.305s | 0.156s | 0.551s | 🛡️ Comprehensive |
| **Morph** | 0.250s | 0.005s | 0.010s | 0.005s | 🚀 Instant Snapshots |
| Provider | Success Rate | Avg Time (s) | Status | Performance |
|----------|--------------|--------------|--------|-------------|
| **Local** | 76.7% | 1.09 | ✅ Available | ⚡ Fastest |
| **E2B** | - | - | ❓ Not tested | 🚀 Cloud-based |
| **Daytona** | - | - | ❓ Not tested | 🛡️ Comprehensive |
| **Morph** | - | - | ❌ Payment required | 🚀 Instant Snapshots |
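The "17x" and "28x" figures in the earlier performance note appear to be ratios of the Total Time column in the previous baseline table. A quick check using only those table values, assuming that is how the note was derived:

```python
# Total Time values (seconds) from the previous baseline table
total_time = {"Local": 0.036, "E2B": 0.599, "Daytona": 1.012, "Morph": 0.250}


def speedup(baseline: str, other: str) -> float:
    """How many times faster `baseline` is than `other`, by total time."""
    return total_time[other] / total_time[baseline]


for provider in ("E2B", "Daytona"):
    print(f"Local vs {provider}: {speedup('Local', provider):.1f}x")
# Local vs E2B: 16.6x
# Local vs Daytona: 28.1x
```

Rounded to whole numbers these give the 17x and 28x quoted in the note.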

> **Performance Notes**:
>
> - Local: Best for development/testing (17x faster than E2B, 28x faster than Daytona)
> - E2B: Production-ready with good speed and reliability
> - Daytona: Full workspace environments with comprehensive tooling
> - Morph: Custom base images, instant snapshots, <250ms startup
> - **Local**: Best for development/testing, fastest execution, 76.7% success rate
> - **E2B**: Production-ready cloud sandboxes (requires API key setup)
> - **Daytona**: Full workspace environments with comprehensive tooling
> - **Morph**: Custom base images with instant snapshots (requires paid plan)
>
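> Success rates are the average, across test scenarios, of the share of runs that complete successfully.
> The Local provider shows 76.7% mainly because the File Operations scenario fails (0% success) and the Snapshot Lifecycle scenario reaches only 83.3% due to snapshot restoration limitations in the current test.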
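The overall success rates in the 2025-07-06 reports match the unweighted mean of the five per-scenario rates, so a single scenario at 0% lowers the overall figure by 20 percentage points. A sketch, assuming this simple mean is the aggregation used (the benchmark script may compute it differently):

```python
from __future__ import annotations

from statistics import mean

# Per-scenario success rates (%) for the local provider, copied from the
# 2025-07-06 reports in benchmarks/results/
run_204709 = {
    "Basic Commands": 100.0,
    "Python Execution": 100.0,
    "File Operations": 0.0,
    "Computational Tasks": 100.0,
    "Snapshot Lifecycle": 83.3,
}
run_204945 = {**run_204709, "Snapshot Lifecycle": 66.7}


def overall_success_rate(per_scenario: dict[str, float]) -> float:
    """Unweighted mean of per-scenario success rates, rounded to one decimal."""
    return round(mean(per_scenario.values()), 1)


print(overall_success_rate(run_204709))  # 76.7
print(overall_success_rate(run_204945))  # 73.3
```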

Results are automatically saved to `benchmarks/results/` and can be committed to track performance over time.
View the full benchmark summary at [`benchmarks/results/SUMMARY.md`](benchmarks/results/SUMMARY.md).

## 🎯 Why Grainchain?

39 changes: 39 additions & 0 deletions benchmarks/results/SUMMARY.md
@@ -0,0 +1,39 @@
# Grainchain Benchmark Summary

**Last Updated:** 2025-07-06 20:49:29
**Total Benchmark Runs:** 1

## Recent Results

| Date | Status | Success Rate | Avg Time (s) | Providers | Notes |
|------|--------|--------------|--------------|-----------|-------|
| 2025-07-06 | ✅ | 76.7% | 1.09 | local | OK |

## Configuration

The benchmarks use the following configuration:
- **Providers:** Local, E2B, Modal, Daytona, Morph (when available)
- **Test Scenarios:** Basic commands, Python execution, File operations, Computational tasks, Snapshot lifecycle
- **Default Iterations:** 3
- **Timeout:** 30 seconds per scenario

## Metrics Collected

- **Sandbox Creation Time:** Time to create a new sandbox
- **Command Execution Time:** Time to execute individual commands
- **Success Rate:** Percentage of successful operations
- **File Operations:** Upload/download performance
- **Snapshot Lifecycle:** Git clone, snapshot creation, and restoration

## Test Scenarios

1. **Basic Commands:** Shell commands (echo, pwd, ls, whoami, date)
2. **Python Execution:** Python script execution and version checks
3. **File Operations:** File upload/download with various sizes
4. **Computational Tasks:** CPU-intensive Python operations
5. **Snapshot Lifecycle:** Git clone, file creation, snapshot, kill, and restore

## Automation

This summary is automatically updated when new benchmark results are available.
Results are committed to the repository for historical tracking.
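The "Recent Results" table uses a fixed six-column layout. A minimal sketch of rendering it from run data, newest first; `RunSummary` and `recent_results_table` are hypothetical, and the criterion behind the ✅/❌ status is left to the caller because this diff does not show how `auto_publish.py` decides it:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RunSummary:
    date: str            # ISO date, e.g. "2025-07-06"
    ok: bool             # healthy run? (criterion is an assumption)
    success_rate: float  # percent
    avg_time: float      # seconds
    providers: list[str]
    notes: str = "OK"


HEADER = [
    "| Date | Status | Success Rate | Avg Time (s) | Providers | Notes |",
    "|------|--------|--------------|--------------|-----------|-------|",
]


def recent_results_table(runs: list[RunSummary]) -> str:
    """Render runs newest-first in the SUMMARY.md 'Recent Results' layout."""
    rows = [
        f"| {r.date} | {'✅' if r.ok else '❌'} | {r.success_rate:.1f}% | "
        f"{r.avg_time:.2f} | {', '.join(r.providers)} | {r.notes} |"
        for r in sorted(runs, key=lambda r: r.date, reverse=True)
    ]
    return "\n".join(HEADER + rows)
```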
70 changes: 70 additions & 0 deletions benchmarks/results/grainchain_benchmark_20250706_204709.md
@@ -0,0 +1,70 @@
# Grainchain Provider Benchmark Report

**Generated:** 2025-07-06T20:47:04.074559
**Duration:** 5.44 seconds
**Providers Tested:** local
**Test Scenarios:** 5

## Executive Summary

| Provider | Success Rate | Avg Time (s) | Creation Time (s) | Status |
|----------|--------------|--------------|-------------------|--------|
| local | 76.7% | 1.09 | 0.00 | ⚠️ |

## 🏆 Best Performers

- **Most Reliable:** local
- **Fastest Execution:** local
- **Fastest Startup:** local

## Detailed Results

### LOCAL Provider

- **Overall Success Rate:** 76.7%
- **Average Scenario Time:** 1.09s
- **Average Creation Time:** 0.00s

#### Basic Commands
- **Success Rate:** 100.0%
- **Average Time:** 0.02s
- **Iterations:** 1/1

#### Python Execution
- **Success Rate:** 100.0%
- **Average Time:** 0.07s
- **Iterations:** 1/1

#### File Operations
- **Success Rate:** 0.0%
- **Average Time:** 0.00s
- **Iterations:** 1/1

#### Computational Tasks
- **Success Rate:** 100.0%
- **Average Time:** 0.06s
- **Iterations:** 1/1

#### Snapshot Lifecycle
- **Success Rate:** 83.3%
- **Average Time:** 5.27s
- **Iterations:** 1/1

## Configuration

```json
{
"providers": [
"local"
],
"iterations": 1,
"timeout": 30,
"parallel_tests": false,
"detailed_metrics": true,
"export_formats": [
"json",
"markdown",
"html"
]
}
```
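The configuration block is plain JSON, so it can be loaded and checked before a run. A sketch with a hypothetical `BenchmarkConfig` dataclass whose defaults follow the summary (3 iterations, 30-second timeout); the set of allowed export formats is an assumption based only on the values shown here:

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field

# Assumed: only the formats that appear in the reports' configuration
KNOWN_FORMATS = {"json", "markdown", "html"}


@dataclass(frozen=True)
class BenchmarkConfig:
    providers: list[str]
    iterations: int = 3
    timeout: int = 30  # seconds per scenario
    parallel_tests: bool = False
    detailed_metrics: bool = True
    export_formats: list[str] = field(default_factory=lambda: ["json", "markdown", "html"])

    def __post_init__(self) -> None:
        if not self.providers:
            raise ValueError("providers must not be empty")
        if self.iterations < 1 or self.timeout <= 0:
            raise ValueError("iterations and timeout must be positive")
        unknown = set(self.export_formats) - KNOWN_FORMATS
        if unknown:
            raise ValueError(f"unsupported export formats: {sorted(unknown)}")


def load_config(text: str) -> BenchmarkConfig:
    """Parse a JSON config; unexpected keys raise TypeError from the constructor."""
    return BenchmarkConfig(**json.loads(text))
```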
70 changes: 70 additions & 0 deletions benchmarks/results/grainchain_benchmark_20250706_204945.md
@@ -0,0 +1,70 @@
# Grainchain Provider Benchmark Report

**Generated:** 2025-07-06T20:49:45.139726
**Duration:** 0.50 seconds
**Providers Tested:** local
**Test Scenarios:** 5

## Executive Summary

| Provider | Success Rate | Avg Time (s) | Creation Time (s) | Status |
|----------|--------------|--------------|-------------------|--------|
| local | 73.3% | 0.03 | 0.00 | ⚠️ |

## 🏆 Best Performers

- **Most Reliable:** local
- **Fastest Execution:** local
- **Fastest Startup:** local

## Detailed Results

### LOCAL Provider

- **Overall Success Rate:** 73.3%
- **Average Scenario Time:** 0.03s
- **Average Creation Time:** 0.00s

#### Basic Commands
- **Success Rate:** 100.0%
- **Average Time:** 0.01s
- **Iterations:** 3/3

#### Python Execution
- **Success Rate:** 100.0%
- **Average Time:** 0.07s
- **Iterations:** 3/3

#### File Operations
- **Success Rate:** 0.0%
- **Average Time:** 0.00s
- **Iterations:** 3/3

#### Computational Tasks
- **Success Rate:** 100.0%
- **Average Time:** 0.07s
- **Iterations:** 3/3

#### Snapshot Lifecycle
- **Success Rate:** 66.7%
- **Average Time:** 0.01s
- **Iterations:** 3/3

## Configuration

```json
{
"providers": [
"local"
],
"iterations": 3,
"timeout": 30,
"parallel_tests": false,
"detailed_metrics": true,
"export_formats": [
"json",
"markdown",
"html"
]
}
```
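Comparing this report with the 20:47 run (76.7% and 1.09 s versus 73.3% and 0.03 s) is easier when the Executive Summary rows are parsed into numbers. A minimal parser, assuming rows keep the five columns shown; `ProviderSummary`, `parse_summary_row` and `delta` are illustrative names:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProviderSummary:
    provider: str
    success_rate: float  # percent
    avg_time_s: float
    creation_time_s: float
    status: str


def parse_summary_row(line: str) -> ProviderSummary:
    """Parse one data row of the 'Executive Summary' table in a report."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    if len(cells) != 5:
        raise ValueError(f"expected 5 cells, got {len(cells)}: {line!r}")
    provider, rate, avg, creation, status = cells
    return ProviderSummary(provider, float(rate.rstrip("%")), float(avg), float(creation), status)


def delta(old: ProviderSummary, new: ProviderSummary) -> dict[str, float]:
    """Change from old to new (percentage points and seconds)."""
    return {
        "success_rate": round(new.success_rate - old.success_rate, 1),
        "avg_time_s": round(new.avg_time_s - old.avg_time_s, 2),
    }
```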
44 changes: 17 additions & 27 deletions benchmarks/results/latest_grainchain.md
@@ -1,15 +1,15 @@
# Grainchain Provider Benchmark Report

**Generated:** 2025-06-04T02:53:12.502516
**Duration:** 5.69 seconds
**Providers Tested:** local, morph
**Generated:** 2025-07-06T20:49:45.139726
**Duration:** 0.50 seconds
**Providers Tested:** local
**Test Scenarios:** 5

## Executive Summary

| Provider | Success Rate | Avg Time (s) | Creation Time (s) | Status |
|----------|--------------|--------------|-------------------|--------|
| local | 76.7% | 1.07 | 0.00 | ⚠️ |
| local | 73.3% | 0.03 | 0.00 | ⚠️ |

## 🏆 Best Performers

@@ -21,53 +21,43 @@

### LOCAL Provider

- **Overall Success Rate:** 76.7%
- **Average Scenario Time:** 1.07s
- **Overall Success Rate:** 73.3%
- **Average Scenario Time:** 0.03s
- **Average Creation Time:** 0.00s

#### Basic Commands
- **Success Rate:** 100.0%
- **Average Time:** 0.02s
- **Iterations:** 1/1
- **Average Time:** 0.01s
- **Iterations:** 3/3

#### Python Execution
- **Success Rate:** 100.0%
- **Average Time:** 0.07s
- **Iterations:** 1/1
- **Iterations:** 3/3

#### File Operations
- **Success Rate:** 0.0%
- **Average Time:** 0.00s
- **Iterations:** 1/1
- **Iterations:** 3/3

#### Computational Tasks
- **Success Rate:** 100.0%
- **Average Time:** 0.06s
- **Iterations:** 1/1
- **Average Time:** 0.07s
- **Iterations:** 3/3

#### Snapshot Lifecycle
- **Success Rate:** 83.3%
- **Average Time:** 5.22s
- **Iterations:** 1/1

### MORPH Provider

❌ **Status:** unavailable
**Error:** Failed to create sandbox: Failed to create sandbox: Morph authentication failed: HTTP Error 402 for url 'https://cloud.morph.so/api/snapshot'
Status Code: 402
Response Body: {
"detail": "Payment required"
}
- **Success Rate:** 66.7%
- **Average Time:** 0.01s
- **Iterations:** 3/3
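The Morph failure above is an HTTP 402 response, which the README reports as "❌ Payment required". A sketch of deriving such a label from the error text with Python's `http.HTTPStatus`; the regex and the `STATUS_LABELS` mapping are assumptions, not part of Grainchain:

```python
from __future__ import annotations

import re
from http import HTTPStatus

# Hypothetical mapping from HTTP status to the README "Status" column
STATUS_LABELS = {
    HTTPStatus.PAYMENT_REQUIRED: "❌ Payment required",
    HTTPStatus.UNAUTHORIZED: "❌ Authentication failed",
}


def status_from_error(message: str) -> str:
    """Derive a README status label from a provider error message."""
    match = re.search(r"HTTP Error (\d{3})", message)
    if match is None:
        return "❌ Unavailable"
    code = int(match.group(1))
    try:
        status = HTTPStatus(code)
    except ValueError:  # not a registered HTTP status code
        return f"❌ HTTP {code}"
    return STATUS_LABELS.get(status, f"❌ {status.phrase}")
```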

## Configuration

```json
{
"providers": [
"local",
"morph"
"local"
],
"iterations": 1,
"iterations": 3,
"timeout": 30,
"parallel_tests": false,
"detailed_metrics": true,