|
1 | 1 | # Distributed Proving |
2 | 2 |
|
3 | | -Distributed proving allows running multiple prover instances in parallel, each working on different batches simultaneously. The proof coordinator assigns work to provers and collects their proofs, then the proof sender batches multiple consecutive proofs into a single L1 verification transaction. |
| 3 | +## Overview |
| 4 | + |
| 5 | +Distributed proving runs multiple prover instances in parallel, so several batches can be proven at the same time. It has two key aspects: |
| 6 | + |
| 7 | +1. **Parallel batch assignment**: the proof coordinator assigns different batches to different provers, so multiple provers work simultaneously. |
| 8 | +2. **Multi-batch verification**: the proof sender collects consecutive proven batches and submits them in a single `verifyBatches()` L1 transaction, saving gas. |
4 | 9 |
|
5 | 10 | ## Architecture |
6 | 11 |
|
@@ -28,9 +33,60 @@ Distributed proving allows running multiple prover instances in parallel, each w |
28 | 33 | └────────────┘ |
29 | 34 | ``` |
30 | 35 |
|
31 | | -Multiple provers connect to the same proof coordinator. The coordinator tracks assignments per `(batch_number, prover_type)`, so: |
| 36 | +Multiple provers connect to the same proof coordinator over TCP. The coordinator tracks assignments per `(batch_number, prover_type)`, so: |
| 37 | + |
32 | 38 | - Two `sp1` provers get assigned **different** batches. |
33 | | -- An `sp1` prover and an `risc0` prover can work on the **same** batch simultaneously (they produce different proof types). |
| 39 | +- An `sp1` prover and a `risc0` prover can work on the **same** batch simultaneously (they produce different proof types). |
| 40 | + |
| 41 | +## Batch assignment |
| 42 | + |
| 43 | +When a prover sends a `BatchRequest`, it includes its `prover_type`. The coordinator: |
| 44 | + |
| 45 | +1. Scans batches starting from the oldest unverified one. |
| 46 | +2. Skips batches that already have a proof for this `prover_type`. |
| 47 | +3. Skips batches currently assigned to another prover of the same type (unless the assignment has timed out). |
| 48 | +4. Assigns the first available batch and records `(batch_number, prover_type) → Instant::now()`. |
| 49 | + |
| 50 | +The assignment map is kept in memory only, so it is lost when the coordinator restarts. After a restart the coordinator reassigns batches from scratch, which is safe because storing a duplicate proof is a no-op. |
| 51 | + |
| 52 | +## Prover timeout |
| 53 | + |
| 54 | +If a prover doesn't submit a proof within `prover-timeout` (default 10 minutes), its assignment expires and the batch becomes available for reassignment to another prover. This handles prover crashes, network issues, or slow provers without manual intervention. |
| 55 | + |
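The assignment and timeout rules described above can be sketched as follows. This is a minimal illustration, not the coordinator's actual code: the `Assignments` type, the `next_batch` function, the `proven` set, and the use of `last_committed` as the upper bound of the scan are all assumptions made for the example.

```rust
use std::collections::{HashMap, HashSet};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the coordinator's in-memory assignment state.
struct Assignments {
    /// (batch_number, prover_type) -> time the batch was handed out.
    assigned: HashMap<(u64, String), Instant>,
    /// Plays the role of `prover-timeout`.
    timeout: Duration,
}

impl Assignments {
    fn new(timeout: Duration) -> Self {
        Self { assigned: HashMap::new(), timeout }
    }

    /// Returns the first batch in `first_unverified..=last_committed` that has
    /// neither a stored proof nor a live assignment for `prover_type`.
    /// The upper bound is an assumption of this sketch.
    fn next_batch(
        &mut self,
        prover_type: &str,
        first_unverified: u64,
        last_committed: u64,
        proven: &HashSet<(u64, String)>,
        now: Instant,
    ) -> Option<u64> {
        for batch in first_unverified..=last_committed {
            let key = (batch, prover_type.to_string());
            // Skip batches that already have a proof of this type.
            if proven.contains(&key) {
                continue;
            }
            // Skip batches whose assignment has not timed out yet.
            let live = self
                .assigned
                .get(&key)
                .map_or(false, |at| now.duration_since(*at) < self.timeout);
            if live {
                continue;
            }
            // Record (batch_number, prover_type) -> now and hand the batch out.
            self.assigned.insert(key, now);
            return Some(batch);
        }
        None
    }
}
```

Because the map is keyed by `(batch_number, prover_type)`, two requests with the same type receive different batches, while a request with a different type can receive a batch that is already assigned. An expired entry is simply overwritten on reassignment.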
| 56 | +## Multi-batch verification |
| 57 | + |
| 58 | +The proof sender runs on a periodic tick (every `send-interval` ms). On each tick it: |
| 59 | + |
| 60 | +1. Queries the on-chain `lastVerifiedBatch` and `lastCommittedBatch`. |
| 61 | +2. Collects all **consecutive** proven batches starting from `lastVerifiedBatch + 1`, checking that every required proof type is present for each batch. |
| 62 | +3. Sends them in a single `verifyBatches()` call to L1. |
| 63 | + |
| 64 | +For example, if `lastVerifiedBatch` is 4 and batches 5, 6, and 7 are fully proven but batch 8 is missing a proof, only batches 5–7 are sent. Batch 8 waits for its proof. |
| 65 | + |
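The collection step can be sketched as a loop that stops at the first batch missing any required proof. The function name `consecutive_proven`, the `proofs` set, and the choice to stop at `last_committed` are assumptions made for illustration; the real proof sender reads these values from the chain and from its store.

```rust
use std::collections::HashSet;

/// Collects consecutive, fully proven batches starting at `last_verified + 1`.
/// `required` lists the proof types every batch needs; `proofs` holds the
/// (batch_number, prover_type) pairs that are already stored.
fn consecutive_proven(
    last_verified: u64,
    last_committed: u64,
    required: &[&str],
    proofs: &HashSet<(u64, String)>,
) -> Vec<u64> {
    let mut ready = Vec::new();
    for batch in (last_verified + 1)..=last_committed {
        let complete = required
            .iter()
            .all(|t| proofs.contains(&(batch, t.to_string())));
        if !complete {
            // A gap ends the run: later batches cannot be verified before this one.
            break;
        }
        ready.push(batch);
    }
    ready
}
```

With `last_verified = 4`, full proofs for batches 5 to 7, and a missing proof for batch 8, the function returns batches 5, 6, and 7, matching the example above.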
| 66 | +### Fallback to single-batch sending |
| 67 | + |
| 68 | +On **any** multi-batch error (gas limit exceeded, calldata too large, invalid proof, etc.), the proof sender falls back to sending each batch individually. Because on-chain verification is sequential (`batchNumber == lastVerifiedBatch + 1`), the fallback stops at the first failing batch, and the remaining batches are retried on the next tick. |
| 69 | +
| 70 | +During single-batch fallback, if the error indicates an invalid proof (e.g. "Invalid SP1 proof"), that proof is deleted from the store so that the batch can be proven again. |
| 71 | + |
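The fallback flow can be sketched with closures standing in for the L1 calls and the proof store. Everything named here (`SendError`, `send_with_fallback`, and the three closures) is hypothetical and exists only to show the control flow; it does not reflect the actual proof sender API.

```rust
/// Hypothetical classification of an L1 submission failure.
#[derive(Debug, PartialEq)]
enum SendError {
    InvalidProof,
    Other,
}

/// Tries one multi-batch submission, then falls back to one call per batch.
/// Returns the batches that were verified during this tick.
fn send_with_fallback(
    batches: &[u64],
    mut send_multi: impl FnMut(&[u64]) -> Result<(), SendError>,
    mut send_single: impl FnMut(u64) -> Result<(), SendError>,
    mut delete_proof: impl FnMut(u64),
) -> Vec<u64> {
    if batches.is_empty() {
        return Vec::new();
    }
    if send_multi(batches).is_ok() {
        return batches.to_vec();
    }
    // Any multi-batch error triggers the per-batch path.
    let mut verified = Vec::new();
    for &batch in batches {
        match send_single(batch) {
            Ok(()) => verified.push(batch),
            Err(err) => {
                // Only an invalid proof is dropped so it can be proven again.
                if err == SendError::InvalidProof {
                    delete_proof(batch);
                }
                // Verification is sequential, so later batches wait for the next tick.
                break;
            }
        }
    }
    verified
}
```

Passing the operations as closures keeps the sketch free of any chain or storage dependency, which also makes the stop-at-first-failure behavior easy to exercise in isolation.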
| 72 | +## Configuration reference |
| 73 | + |
| 74 | +### Proof coordinator (sequencer side) |
| 75 | + |
| 76 | +| Flag | Env Variable | Default | Description | |
| 77 | +|------|-------------|---------|-------------| |
| 78 | +| `--proof-coordinator.addr` | `ETHREX_PROOF_COORDINATOR_LISTEN_ADDRESS` | `127.0.0.1` | Listen address | |
| 79 | +| `--proof-coordinator.port` | `ETHREX_PROOF_COORDINATOR_LISTEN_PORT` | `3900` | Listen port | |
| 80 | +| `--proof-coordinator.send-interval` | `ETHREX_PROOF_COORDINATOR_SEND_INTERVAL` | `5000` | How often (ms) the proof sender collects and sends proofs to L1 | |
| 81 | +| `--proof-coordinator.prover-timeout` | `ETHREX_PROOF_COORDINATOR_PROVER_TIMEOUT` | `600000` | Timeout (ms) before reassigning a batch to another prover (default: 10 min) | |
| 82 | + |
| 83 | +### Prover client |
| 84 | + |
| 85 | +| Flag | Env Variable | Default | Description | |
| 86 | +|------|-------------|---------|-------------| |
| 87 | +| `--proof-coordinators` | `PROVER_CLIENT_PROOF_COORDINATOR_URL` | `tcp://127.0.0.1:3900` | Space-separated coordinator URLs | |
| 88 | +| `--backend` | `PROVER_CLIENT_BACKEND` | `exec` | Backend: `exec`, `sp1`, `risc0`, `zisk`, `openvm` | |
| 89 | +| `--proving-time` | `PROVER_CLIENT_PROVING_TIME` | `5000` | Wait time (ms) between requesting new work | |
34 | 90 |
|
35 | 91 | ## Testing locally |
36 | 92 |
|
@@ -74,47 +130,3 @@ make init-prover-exec |
74 | 130 | ``` |
75 | 131 |
|
76 | 132 | Each prover will be assigned a different batch. When both finish, the proof sender will collect the consecutive proven batches and submit them in a single `verifyBatches` transaction on L1. |
77 | | - |
78 | | -## Configuration reference |
79 | | - |
80 | | -### Proof coordinator (L2 side) |
81 | | - |
82 | | -| Flag | Env Variable | Default | Description | |
83 | | -|------|-------------|---------|-------------| |
84 | | -| `--proof-coordinator.addr` | `ETHREX_PROOF_COORDINATOR_LISTEN_ADDRESS` | `127.0.0.1` | Listen address | |
85 | | -| `--proof-coordinator.port` | `ETHREX_PROOF_COORDINATOR_LISTEN_PORT` | `3900` | Listen port | |
86 | | -| `--proof-coordinator.send-interval` | `ETHREX_PROOF_COORDINATOR_SEND_INTERVAL` | `5000` | How often (ms) the proof sender batches and sends proofs to L1 | |
87 | | -| `--proof-coordinator.prover-timeout` | `ETHREX_PROOF_COORDINATOR_PROVER_TIMEOUT` | `600000` | Timeout (ms) before reassigning a batch to another prover (default: 10 min) | |
88 | | - |
89 | | -### Prover client |
90 | | - |
91 | | -| Flag | Env Variable | Default | Description | |
92 | | -|------|-------------|---------|-------------| |
93 | | -| `--proof-coordinators` | `PROVER_CLIENT_PROOF_COORDINATOR_URL` | `tcp://127.0.0.1:3900` | Space-separated coordinator URLs | |
94 | | -| `--backend` | `PROVER_CLIENT_BACKEND` | `exec` | Backend: `exec`, `sp1`, `risc0`, `zisk`, `openvm` | |
95 | | -| `--proving-time` | `PROVER_CLIENT_PROVING_TIME` | `5000` | Wait time (ms) between requesting new work | |
96 | | - |
97 | | -## How it works |
98 | | - |
99 | | -### Batch assignment |
100 | | - |
101 | | -When a prover sends a `BatchRequest`, it includes its `prover_type`. The coordinator: |
102 | | - |
103 | | -1. Scans batches starting from the oldest unverified one. |
104 | | -2. Skips batches that already have a proof for this `prover_type`. |
105 | | -3. Skips batches currently assigned to another prover of the same type (unless the assignment timed out). |
106 | | -4. Assigns the first available batch and records `(batch_number, prover_type) → Instant::now()`. |
107 | | - |
108 | | -### Prover timeout |
109 | | - |
110 | | -If a prover doesn't submit a proof within `prover-timeout` (default 10 minutes), its assignment expires and the batch becomes available for reassignment to another prover. |
111 | | - |
112 | | -### Multi-batch verification |
113 | | - |
114 | | -The proof sender periodically (every `send-interval` ms): |
115 | | - |
116 | | -1. Collects all **consecutive** proven batches starting from `last_verified_batch + 1`. |
117 | | -2. Sends them in a single `verifyBatches()` call to L1. |
118 | | -3. Falls back to per-batch verification if any batch has an invalid proof, to isolate the failure. |
119 | | - |
120 | | -For example, if batches 1, 2, 3 are proven but 4 is not, only batches 1-3 are sent. Batch 4 waits for its proof. |
|