Reproducible experiments for the Push0 paper evaluation section. These experiments measure:

- **Latency Overhead**: Dispatcher and collector latency at varying load
- **Scalability**: Throughput vs dispatcher count, queue depth impact
- **Fault Tolerance**: Recovery from crashes, task loss verification
## Quick Start

```bash
# Setup (one-time)
make setup

# Run all experiments for paper
make run-all-experiments

# Or run individual experiments
make latency
make scalability
make fault-tolerance
```

## Prerequisites

- Docker and Docker Compose
- Python 3.9+
- GitHub credentials (for building orchestrator image)
```bash
export GITHUB_USER=your-username
export GITHUB_TOKEN=your-token
```

## Latency Experiments

Measures orchestration overhead by timing:
- Dispatch latency: Task enqueue → dispatcher completion
- Collection latency: Result publication → aggregation
- End-to-end latency: Full pipeline
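As a sketch of how raw per-task timings become the reported statistics, the helper below computes the same fields that appear in the results JSON (`p50_ms`, `p95_ms`, `p99_ms`, `mean_ms`, `stddev_ms`). It is illustrative only and is not the actual code in `scripts/utils.py`:

```python
import statistics


def latency_stats(samples_ms):
    """Summarize raw per-task latencies (in ms) in the shape used by
    the results JSON. Percentiles use the nearest-rank method."""
    ordered = sorted(samples_ms)

    def pct(p):
        # Nearest-rank percentile over the sorted samples.
        idx = min(len(ordered) - 1, max(0, round(p / 100 * len(ordered)) - 1))
        return ordered[idx]

    return {
        "count": len(ordered),
        "p50_ms": pct(50),
        "p95_ms": pct(95),
        "p99_ms": pct(99),
        "mean_ms": statistics.mean(ordered),
        "stddev_ms": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
    }


stats = latency_stats([4.0, 5.0, 6.0, 7.0])
```

The same summary is produced independently for dispatch, collection, and end-to-end samples.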
```bash
# Basic latency test
make latency NUM_TASKS=1000

# Test at varying injection rates (for CDF plot)
make latency-vary-rate NUM_TASKS=500
```

Output:

- `results/latency_*.json` - Raw latency data with P50/P95/P99 stats
- CDF data for plotting latency distributions
## Scalability Experiments

Measures throughput scaling with dispatcher instances.
```bash
# Test 1, 2, 4, 8 dispatchers
make scalability DISPATCHER_COUNTS=1,2,4,8 NUM_TASKS=10000

# Test queue depth impact on NATS
make scalability-queue-depth
```

Output:

- `results/scalability_*.json` - Throughput and memory per dispatcher count
- Scaling efficiency calculations
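The scaling-efficiency figure can be derived from the measured throughputs as the ratio of observed throughput to perfect linear scaling from the single-dispatcher baseline. This is a sketch with hypothetical numbers; the actual computation lives in `scripts/scalability_experiment.py` and may differ:

```python
def scaling_efficiency(throughputs):
    """Map {dispatcher_count: throughput (tasks/s)} to efficiency
    relative to perfect linear scaling from the 1-dispatcher baseline."""
    baseline = throughputs[1]  # single-dispatcher throughput
    return {
        n: tput / (n * baseline)
        for n, tput in sorted(throughputs.items())
    }


# Hypothetical measurements for 1, 2, 4, 8 dispatchers:
eff = scaling_efficiency({1: 150.0, 2: 290.0, 4: 540.0, 8: 960.0})
```

An efficiency of 1.0 means perfect linear scaling; values below 1.0 quantify coordination overhead as dispatchers are added.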
## Fault Tolerance Experiments

Validates zero task loss under failure conditions.
```bash
# Dispatcher crash at 1%, 5%, 10% completion
make fault-dispatcher CRASH_RATES=1,5,10 NUM_TASKS=1000

# Collector crash mid-aggregation
make fault-collector NUM_TASKS=1000

# NATS network partition (30s)
make fault-partition

# Compare ACK timeout settings (10s, 30s, 60s)
make fault-ack-timeout
```

Output:

- `results/fault_tolerance_*.json` - Recovery times, task loss counts
- Baseline comparison for overhead calculation
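Task-loss verification amounts to set-comparing submitted and collected task IDs after the fault-injection run. The helper below is a minimal hypothetical sketch, not the actual logic in `scripts/fault_tolerance_experiment.py`:

```python
def verify_no_task_loss(submitted_ids, collected_ids):
    """Return (lost, duplicated) task IDs after a fault-injection run.
    Zero loss means every submitted task was collected at least once;
    tasks redelivered after an ACK timeout show up as duplicates."""
    lost = set(submitted_ids) - set(collected_ids)
    seen, duplicated = set(), set()
    for task_id in collected_ids:
        if task_id in seen:
            duplicated.add(task_id)
        seen.add(task_id)
    return lost, duplicated


# A crashed dispatcher's task redelivered once, nothing lost:
lost, dup = verify_no_task_loss(["t1", "t2", "t3"], ["t1", "t2", "t2", "t3"])
```

Distinguishing losses from duplicates matters here: at-least-once delivery trades duplicates for zero loss, so duplicates are expected after a crash while any lost ID is a failure.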
## Results Format

All results are saved as JSON in `results/`:
```json
{
  "experiment_type": "latency",
  "config": {...},
  "dispatch_stats": {
    "count": 1000,
    "p50_ms": 4.5,
    "p95_ms": 8.2,
    "p99_ms": 12.1,
    "mean_ms": 5.1,
    "stddev_ms": 2.3
  },
  "throughput_tasks_per_sec": 150.0,
  "timestamp": "2024-01-15T10:30:00"
}
```

## Directory Structure

```
experiments/
├── docker-compose.experiments.yml    # Experiment infrastructure
├── Makefile                          # Easy experiment execution
├── configs/
│   └── prometheus.yml                # Metrics collection
├── scripts/
│   ├── utils.py                      # Shared utilities
│   ├── latency_experiment.py         # Latency measurement
│   ├── scalability_experiment.py     # Scalability testing
│   └── fault_tolerance_experiment.py # Fault injection
└── results/                          # Experiment outputs (JSON)
```
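The JSON files in `results/` can be aggregated for plotting with a few lines of Python. This is a sketch; the field names follow the results format shown above, and `summarize_results` is a hypothetical helper, not part of `scripts/utils.py`:

```python
import glob
import json


def summarize_results(results_dir="results"):
    """Collect headline numbers from every experiment JSON in results/."""
    rows = []
    for path in sorted(glob.glob(f"{results_dir}/*.json")):
        with open(path) as f:
            data = json.load(f)
        stats = data.get("dispatch_stats", {})
        rows.append((
            data.get("experiment_type", "?"),
            stats.get("p99_ms"),
            data.get("throughput_tasks_per_sec"),
        ))
    return rows
```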
## Implementation Notes

- **Echo Executor Mode**: Uses `--features scroll-executor-echo` to simulate prover execution without real proving, which isolates orchestration overhead.
- **Memory-backed NATS**: Uses in-memory storage for faster experiments; production uses file-backed storage.
- **Reproducibility**: All experiments can be run with `make run-all-experiments` for consistent paper results.
- **Prometheus Integration**: Metrics are scraped at 500ms intervals for fine-grained latency data.
## Monitoring

```bash
# View live logs
make logs

# Container status
make status

# NATS metrics
make nats-info

# Prometheus UI (http://localhost:9091)
make prometheus
```

## Cleanup

```bash
# Stop containers
make down

# Full cleanup (containers, volumes)
make clean

# Clear results only
make clean-results
```

## Troubleshooting

NATS not ready:

```bash
make setup-infra
```

Build failures:

```bash
# Ensure GitHub credentials are set
export GITHUB_USER=xxx
export GITHUB_TOKEN=xxx
make build
```

Python dependencies:

```bash
make setup-venv
```