Comprehensive performance testing suite for the CoW Protocol Playground, enabling load testing, benchmarking, and regression detection using Anvil fork mode.
- Load Generation: Simulate realistic trading patterns with configurable strategies
- Performance Benchmarking: Measure order lifecycle, API performance, and resource utilization
- Metrics & Visualization: Prometheus exporters and Grafana dashboards
- Regression Detection: Statistical comparison against baselines
- Fork Mode Testing: Test against mainnet state using Anvil fork mode
- Scenario Library: Predefined scenarios from light to heavy loads
- Flexible Configuration: YAML-based scenarios with inheritance and composition
- Python 3.11+
- Poetry (for dependency management)
- Docker and Docker Compose
- Ethereum RPC URL (Alchemy, Infura, etc.)
1. Clone and install

   ```bash
   git clone https://github.com/cowprotocol/cow-performance-testing-suite.git
   cd cow-performance-testing-suite
   poetry install && poetry shell
   ```

   Alternative without Poetry:

   ```bash
   python3 -m venv .venv && source .venv/bin/activate && pip install -e .
   ```
2. Configure environment

   ```bash
   cp .env.example .env
   # Edit .env and set:
   # ETH_RPC_URL=https://eth-mainnet.g.alchemy.com/v2/YOUR_KEY
   ```
3. Start services

   ```bash
   docker compose up -d
   ```

   Note: First startup may show "unhealthy" errors while the orderbook compiles (takes 5-10 minutes). Check progress:

   ```bash
   docker compose logs -f orderbook
   ```
4. Verify installation

   ```bash
   cow-perf version
   ```
5. Run your first test

   ```bash
   # Quick 2-minute regression test
   cow-perf run --config configs/scenarios/enhanced/regression-test.yml
   ```
The suite includes 5 production-ready scenarios with automated validation:
| Scenario | Duration | Purpose | Success Criteria |
|---|---|---|---|
| regression-test | 2 min | Fast CI/CD regression testing | ≥90% success, <15s P95 latency |
| sustained-load | 30 min | Long-term stability, memory leak detection | ≥80% success, <25s P95 latency |
| large-orders | 5 min | Edge case testing with whale trades (100+ ETH) | ≥70% success, <40s P95 latency |
| high-frequency | 3 min | Extreme stress test at 100 orders/sec | ≥60% success, <50s P95 latency |
| limit-orders-only | 10 min | Orderbook-focused testing (100% limit orders) | ≥75% success, <30s P95 latency |
See detailed docs: Regression Test · Sustained Load · Large Orders · High Frequency · Limit Orders Only
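The pass/fail numbers in the table above can be reproduced from raw results with a few lines of Python. This is an illustrative sketch, not part of the suite's API: the helper names and the result-record shape are assumptions. It checks the regression-test thresholds of ≥90% success and <15s P95 latency:

```python
# Hypothetical check against the regression-test thresholds above.
# `results` is an assumed shape: one dict per submitted order.
def p95(values):
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    idx = max(0, -(-95 * len(ordered) // 100) - 1)  # ceil(0.95 * n) - 1
    return ordered[idx]

def passes_regression(results):
    latencies = [r["latency_s"] for r in results if r["filled"]]
    success_rate = len(latencies) / len(results)
    return success_rate >= 0.90 and p95(latencies) < 15.0
```

The same shape generalizes to the other scenarios by swapping in their thresholds from the table.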
Basic usage:

```bash
# Run a predefined scenario
cow-perf run --config configs/scenarios/enhanced/regression-test.yml

# Run with custom parameters
cow-perf run --traders 10 --duration 120 --settlement-wait 300
```

Save results for later analysis:
```bash
# Save as baseline for comparison
cow-perf run --config configs/scenarios/enhanced/regression-test.yml \
  --save-baseline "v1.0-regression" \
  --baseline-description "CI/CD regression baseline"
```

| Your Goal | Use This Scenario | Why |
|---|---|---|
| Quick verification | regression-test | Fast (2 min), catches regressions |
| CI/CD pipeline | regression-test | Reliable, automated validation |
| Pre-release check | sustained-load | Detects memory leaks, stability issues |
| Stress testing | high-frequency | Finds breaking points, rate limits |
| Edge cases | large-orders | Tests extreme order sizes |
| Orderbook testing | limit-orders-only | Tests matching engine |
Save performance baselines and generate comprehensive reports with regression detection.
Run a test and automatically save the results as a baseline for future comparisons:
```bash
# Run test and save as baseline
cow-perf run --config configs/scenarios/light-load.yml \
  --save-baseline "v1.0" \
  --baseline-description "Production baseline" \
  --baseline-tags "production,release"
```

Saved to: `.cow-perf/baselines/{uuid}.json`
Generate performance reports from saved baselines in multiple formats:
```bash
# Text report to console (default)
cow-perf report generate v1.0

# Save report to file (.cow-perf/reports/)
cow-perf report generate v1.0 --save

# Markdown report (GitHub-friendly)
cow-perf report generate v1.0 -f markdown --save

# JSON report (machine-readable)
cow-perf report generate v1.0 -f json --save

# With CSV exports
cow-perf report generate v1.0 --save --export-csv
```

Saved to:
- Reports: `.cow-perf/reports/report-{baseline}-{timestamp}.{format}`
- CSV files: `.cow-perf/reports/csv/{baseline}/` (`summary.csv`, `latencies.csv`, `recommendations.csv`)
Compare two baselines to detect performance regressions or improvements:
```bash
# Compare current against previous baseline
cow-perf report generate v2.0 --compare v1.0 --save

# With markdown format for GitHub PRs
cow-perf report generate v2.0 --compare v1.0 -f markdown --save
```

The comparison report shows:
- ✅ Improvements: Metrics that got better
- ⚠️ Regressions: Metrics that got worse (with severity: minor/major/critical)
- 📊 Percent changes: For all key metrics
- 🔧 Recommendations: Actionable insights based on the comparison
```bash
# List all saved baselines
cow-perf baselines --list

# Show detailed baseline info
cow-perf baselines --show v1.0

# Delete old baseline
cow-perf baselines --delete old-baseline
```

All solver containers are automatically tracked - no configuration needed!
The system uses pattern matching to discover containers:
- Any container with `solver` in its name is tracked (e.g., `solver-baseline-1`, `solver-quasimodo-1`)
- Each solver gets separate resource metrics (CPU, memory, network I/O)
- Reports show per-solver performance

Supported solver types:
- `solver-baseline-*` - Baseline solver instances
- `solver-quasimodo-*` - Quasimodo solver instances
- `solver-{any-type}-*` - Any other solver type
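The discovery rule above amounts to a simple substring match on container names. A minimal sketch (assumed logic, not the suite's actual implementation; the container names are examples):

```python
# Minimal sketch of name-based solver discovery: any container whose
# name contains "solver" is tracked for resource metrics.
def is_solver_container(name: str) -> bool:
    return "solver" in name

containers = ["solver-baseline-1", "orderbook", "solver-quasimodo-1", "driver", "chain"]
tracked = [name for name in containers if is_solver_container(name)]
# tracked == ["solver-baseline-1", "solver-quasimodo-1"]
```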
Adding more solvers:
Follow these steps to add a new solver (e.g., adding a 4th baseline solver or a new quasimodo solver):
1. Add the solver service to `docker-compose.yml`:

   ```yaml
   # For a 4th baseline solver:
   solver-baseline-4:
     build:
       context: ./modules/services
       target: solvers
     command: ["baseline", "--config", "/baseline.toml"]
     volumes:
       - ./configs/baseline.toml:/baseline.toml:ro
     networks:
       - cownet

   # OR for a quasimodo solver:
   solver-quasimodo-1:
     build:
       context: ./modules/services
       target: solvers
     command: ["quasimodo", "--config", "/quasimodo.toml"]
     volumes:
       - ./configs/quasimodo.toml:/quasimodo.toml:ro
     networks:
       - cownet
   ```
2. Update the autopilot environment variables in `docker-compose.yml`:

   ```yaml
   # Add the new solver to the DRIVERS list:
   - DRIVERS=solver-baseline-1|http://driver/solver-baseline-1|${SOLVER_ADDRESS},solver-baseline-2|http://driver/solver-baseline-2|${SOLVER_ADDRESS},solver-baseline-3|http://driver/solver-baseline-3|${SOLVER_ADDRESS},solver-baseline-4|http://driver/solver-baseline-4|${SOLVER_ADDRESS}

   # Add to PRICE_ESTIMATION_DRIVERS:
   - PRICE_ESTIMATION_DRIVERS=solver-baseline-1|http://driver/solver-baseline-1,solver-baseline-2|http://driver/solver-baseline-2,solver-baseline-3|http://driver/solver-baseline-3,solver-baseline-4|http://driver/solver-baseline-4

   # Add to NATIVE_PRICE_ESTIMATORS:
   - NATIVE_PRICE_ESTIMATORS=solver-baseline-1|http://driver/solver-baseline-1,solver-baseline-2|http://driver/solver-baseline-2,solver-baseline-3|http://driver/solver-baseline-3,solver-baseline-4|http://driver/solver-baseline-4
   ```
3. Update the orderbook environment variables in `docker-compose.yml`:

   ```yaml
   # Add the new solver to these same lists (same format as autopilot)
   - DRIVERS=...
   - PRICE_ESTIMATION_DRIVERS=...
   - NATIVE_PRICE_ESTIMATORS=...
   ```
4. Add the solver configuration to `configs/driver.toml`:

   ```toml
   [[solver]]
   name = "solver-baseline-4"
   endpoint = "http://solver-baseline-4"
   absolute-slippage = "40000000000000000"
   relative-slippage = "0.1"
   account = "0xac0974bec39a17e36ba4a6b4d238ff944bacb478cbed5efcae784d7bf4f2ff80"
   ```
5. Build and start the containers:

   ```bash
   # Build the new solver image (first time only)
   docker compose build solver-baseline-4

   # Start all services
   docker compose up -d

   # Verify the solver is running
   docker compose ps | grep solver-baseline-4

   # Check solver logs
   docker compose logs -f solver-baseline-4
   ```
6. Verify the driver can reach the solver:

   ```bash
   # Check driver logs for successful solver mounting
   docker compose logs driver | grep "mounting solver"

   # Should show: mounting solver solver=solver-baseline-4 path="/solver-baseline-4"
   ```
7. Run a test to verify:

   ```bash
   cow-perf run --config configs/scenarios/light-load.yml --duration 30
   ```
8. Check the report - the new solver will automatically appear in the resource metrics.

The system will automatically discover and track any container with `solver` in its name - no code changes needed!
Example report output:
```
Resource Utilization:
  Container                CPU(P95)    Memory(P95)
  -----------------------------------------------
  solver-baseline-1        38.8%       11.0%
  solver-baseline-2        43.8%       12.0%
  solver-baseline-3        48.8%       13.0%
  solver-quasimodo-1       35.2%       10.5%
  solver-quasimodo-2       41.1%       11.8%
```
When comparing baselines, per-solver improvements/regressions are shown:
```
Improvements:
  - resource_solver-baseline-1_cpu: -51.5% (improved)
  - resource_solver-baseline-2_cpu: -9.1% (improved)
  - resource_solver-quasimodo-1_cpu: -12.3% (improved)
```
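The percentages in these comparison lines are plain relative changes against the baseline value. A sketch of the arithmetic (the formula is an assumption about how the suite computes it; the sample values are illustrative):

```python
def pct_change(baseline: float, current: float) -> float:
    """Relative change of `current` vs `baseline`, in percent.
    Negative values mean the metric dropped (an improvement for CPU)."""
    return (current - baseline) / baseline * 100.0

# e.g. a P95 CPU drop from 40.0% to 19.4% reads as -51.5% (improved)
print(round(pct_change(40.0, 19.4), 1))  # -51.5
```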
```bash
# 1. Run initial test and save baseline
cow-perf run --config configs/scenarios/medium-load.yml \
  --save-baseline "before-optimization" \
  --baseline-description "Performance before optimization work"

# 2. Make code changes, run new test
cow-perf run --config configs/scenarios/medium-load.yml \
  --save-baseline "after-optimization" \
  --baseline-description "Performance after optimization"

# 3. Generate comparison report
cow-perf report generate after-optimization \
  --compare before-optimization \
  -f markdown \
  --save \
  --export-csv

# 4. View results
cat .cow-perf/reports/report-after-optimization-vs-before-optimization-*.md
```

All files are saved in your project directory under `.cow-perf/`:
```
.cow-perf/
├── baselines/        # Saved performance baselines
├── reports/          # Generated reports
│   ├── report-*.txt
│   ├── report-*.md
│   ├── report-*.json
│   └── csv/          # CSV exports
│       └── {baseline}/
│           ├── summary.csv
│           ├── latencies.csv
│           └── recommendations.csv
└── results/          # Raw test results
```
See .cow-perf/README.md for detailed documentation on the data directory structure.
Prometheus metrics export is enabled by default (port 9091). To use the full monitoring stack:
1. Start Prometheus & Grafana:

   ```bash
   docker compose --profile monitoring up -d
   ```

2. Run a test (metrics export automatically on port 9091):

   ```bash
   cow-perf run --config configs/scenarios/light-load.yml
   ```

3. View dashboards at http://localhost:3000 (default: admin/admin):

   - Performance Overview
   - API Performance
   - Resources
   - Comparison
   - Trader Activity

4. Disable metrics export (if needed):

   ```bash
   cow-perf run --config configs/scenarios/light-load.yml --prometheus-port 0
   ```
For detailed setup and troubleshooting, see Development Guide.
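If you point your own Prometheus at the exporter instead of using the bundled monitoring profile, a minimal scrape job could look like the following. This is a hedged sketch: the job name and target host are assumptions, and only the port (9091) comes from this document; adjust the target to wherever the exporter actually runs.

```yaml
# Hypothetical external Prometheus scrape job for the suite's exporter
scrape_configs:
  - job_name: cow-perf          # illustrative name
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:9091"]   # suite's default exporter port
```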
List all available scenarios:
```bash
# Show all scenarios with full metadata
cow-perf scenarios --dir configs/scenarios

# Simple view (basic info only)
cow-perf scenarios --dir configs/scenarios --simple
```

Filter by tags:

```bash
# Find regression tests
cow-perf scenarios --tag regression

# Find short-duration tests
cow-perf scenarios --tag short

# Multiple tags (AND logic) - find edge-case tests that are short
cow-perf scenarios --tag edge-case --tag short
```

Search by text:

```bash
# Search in name, description, or tags (case-insensitive)
cow-perf scenarios --search "stability"
cow-perf scenarios --search "whale"
cow-perf scenarios --search "ci-cd"
```

Before running a test, validate the scenario configuration:
```bash
cow-perf scenarios --validate configs/scenarios/enhanced/regression-test.yml
```

This displays:
- ✓ Configuration is valid
- Basic properties (name, traders, duration, pattern)
- Scenario metadata (expected orders, resource requirements)
- Success criteria thresholds
- Order type distribution
Each scenario includes automated success criteria for pass/fail validation:
Four key metrics:
- Min Success Rate - Minimum percentage of orders that must fill successfully
- Max P95 Latency - Maximum acceptable 95th percentile latency
- Max Error Rate - Maximum percentage of orders that can fail
- Min Throughput - Minimum orders processed per second
Example: Regression Test Criteria
```yaml
success_criteria:
  min_success_rate: 0.90           # ≥90% of orders must succeed
  max_p95_latency_seconds: 15.0    # P95 latency must be ≤15s
  max_error_rate: 0.10             # ≤10% of orders can fail
  min_throughput_per_second: 4.0   # Must process ≥4 orders/sec
```

Programmatic validation:
```python
from pathlib import Path

from cow_performance.cli.commands.scenarios import load_scenario_from_yaml
from cow_performance.scenarios import SuccessCriteriaValidator

# Load scenario
scenario = load_scenario_from_yaml(
    Path("configs/scenarios/enhanced/regression-test.yml")
)

# Validate test results against criteria
validator = SuccessCriteriaValidator(scenario.success_criteria)
validation = validator.validate(
    success_rate=0.95,
    p95_latency_seconds=12.0,
    error_rate=0.05,
    throughput_per_second=5.0,
)

if validation.passed:
    print(f"✅ All {validation.total_checks} criteria passed!")
else:
    print(f"❌ {len(validation.failures)} criteria failed:")
    for failure in validation.failures:
        print(f"  - {failure.criterion}: {failure.message}")
```

Create your own scenario YAML file:
```yaml
name: my-custom-test
description: Custom test scenario
version: "1.0"
tags: [custom, testing]

# Metadata (optional but recommended)
metadata:
  expected_orders: 300
  expected_duration_seconds: 60
  resource_requirements:
    min_memory_gb: 2.0
    min_cpu_cores: 2
    recommended_memory_gb: 4.0
    recommended_cpu_cores: 4

# Success criteria (optional)
success_criteria:
  min_success_rate: 0.80
  max_p95_latency_seconds: 20.0
  max_error_rate: 0.20
  min_throughput_per_second: 3.0

# Test configuration
num_traders: 10
duration: 60
trading_pattern: constant_rate
base_rate: 300.0  # orders per minute

# Order distribution
market_order_ratio: 0.6
limit_order_ratio: 0.4
```

Then validate and run:

```bash
cow-perf scenarios --validate my-custom-test.yml
cow-perf run --config my-custom-test.yml
```

The Docker environment is optimized to prevent excessive disk usage, but monitoring is still recommended:
- Chain container (Anvil): Uses the `--prune-history` flag to keep state in process memory only (no disk accumulation)
- Container logs: Limited to 10MB per file, max 3 files (30MB total per service)
- Prometheus data: Retention limited to 7 days and 1GB
- Rust build artifacts: Stored in Docker volumes (not on host disk)
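The log cap described above corresponds to Docker's `json-file` logging driver options. As a sketch of what such a limit looks like in `docker-compose.yml` (the service name is illustrative; the playground's compose file already sets its own limits):

```yaml
services:
  orderbook:            # illustrative service; the limit applies per service
    logging:
      driver: json-file
      options:
        max-size: "10m"   # 10MB per log file
        max-file: "3"     # keep at most 3 files (~30MB total)
```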
```bash
# Check Docker disk usage
docker system df

# Monitor per-container resource usage (CPU, memory)
docker stats --no-stream
```

Quick cleanup (recommended for regular use):
```bash
# Stop containers (preserves images and volumes)
docker compose down

# Remove stopped containers and unused images
docker system prune -f
```

Deep cleanup (if disk space is critical):
```bash
# Use the automated cleanup script
./hack/cleanup-docker.sh

# Or manual cleanup including volumes (⚠️ data loss)
docker compose down -v
docker system prune -a -f --volumes
```

After cleanup, restart services:

```bash
docker compose up -d
```

Note: First startup after cleanup may be slower due to image rebuilding and database migrations.
| Topic | Document |
|---|---|
| CLI Reference | docs/cli.md |
| Development Guide | docs/development.md |
| Architecture | docs/architecture.md |
| Order Generation API | docs/order-generation.md |
| Conditional Orders | docs/conditional-orders.md |
| User Simulation | docs/user-simulation.md |
| Scenario Documentation | |
| Regression Test | docs/scenarios/regression-test.md |
| Sustained Load | docs/scenarios/sustained-load.md |
| Large Orders | docs/scenarios/large-orders.md |
| High Frequency | docs/scenarios/high-frequency.md |
| Limit Orders Only | docs/scenarios/limit-orders-only.md |
```
cow-performance-testing-suite/
├── src/cow_performance/   # Core modules
│   ├── cli/               # CLI commands (Typer)
│   ├── load_generation/   # Order generation, traders, Safe wallets
│   ├── benchmarking/      # Performance analysis
│   ├── metrics/           # Metrics collection
│   └── scenarios/         # Test scenarios
├── tests/                 # Unit, integration, and E2E tests
├── configs/               # Configuration and scenario files
├── docs/                  # Documentation
└── docker/                # Docker configuration
```
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests and linting:

  ```bash
  poetry run pytest && poetry run ruff check . && poetry run mypy .
  ```

- Submit a pull request
- Milestone 1: Project Setup & Load Generation Framework
- Milestone 2: User Simulation Module (TraderPool, Safe wallets, hooks)
- Milestone 3: CLI Tool Interface
- Milestone 4: Performance Benchmarking & Metrics
- Milestone 5: Advanced Features & Documentation
MIT License - see LICENSE for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Built with love by the CoW Protocol team for comprehensive performance testing of the CoW Protocol Playground.