Description
Problem
The node currently hardcodes a single worker per validator (`worker_id = 0`). The `EpochManager`, network layer, and several initialization paths assume exactly one worker. This blocks the ability to run independent fee markets, specialized transaction pools, or any form of worker-level parallelism.
Goal
Refactor the node to support N independent workers per validator. Each worker operates as a standalone unit with its own:
- libp2p swarm (dedicated gossip topics, listen address, network key)
- RPC server (unique port)
- Transaction pool
- Batch builder + batch validator
- `LocalNetwork` instance for primary communication
Workers share only the Primary (consensus) and the execution engine (block production). The `num_workers` count is a consensus-level parameter — all validators must agree on it.
Why
The immediate motivation is multiple fee markets. Once multi-worker is in place, a follow-up (Phase 2) spawns 2 workers by default:
- Worker 0 (General): accepts all transactions, standard EIP-1559 fee market
- Worker 1 (Whitelisted Transfers): accepts only whitelisted ERC-20 `transfer`/`transferFrom` calls, operates with a reduced base fee
This architecture also enables future process separation — workers can be extracted into standalone processes communicating with the primary over RPC.
Design Constraints
- Workers are fully independent — no cross-worker shared state. Each worker has its own network identity, pool, and gossip topics.
- Per-worker gossip topics — `tn-worker-{id}` and `tn-txn-{id}` replace the current global `tn-worker` and `tn-txn` topics. This provides network-level isolation.
- Per-worker `LocalNetwork` — each worker gets its own `LocalNetwork` instance for primary communication. The primary registers as the handler on every worker's `LocalNetwork`. This is the seam for future process separation.
- `num_workers` is a consensus parameter — changing it requires a coordinated upgrade across all validators. Defaults to `1` for backward compatibility.
- Execution engine is shared — batches from all workers are processed sequentially by the same engine. Worker ID is already encoded in the block `difficulty` field.
- Faucet on worker 0 only — the testnet faucet attaches to the general-purpose worker.
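To make the topic scheme concrete, here is a minimal sketch of per-worker topic naming. The helper names `worker_topic` and `txn_topic` are assumptions for illustration; only the topic string formats come from the constraint above:

```rust
/// Build the per-worker gossip topic name, e.g. "tn-worker-0".
/// (Hypothetical helper; the real code may construct topics differently.)
fn worker_topic(worker_id: u16) -> String {
    format!("tn-worker-{worker_id}")
}

/// Build the per-worker transaction gossip topic name, e.g. "tn-txn-0".
fn txn_topic(worker_id: u16) -> String {
    format!("tn-txn-{worker_id}")
}
```

With `num_workers = 1` this yields `tn-worker-0`/`tn-txn-0`, so the single-worker network remains isolated per topic rather than sharing the old global names.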
Current State
Much of the infrastructure already supports N workers but is only called with `worker_id = 0`:
- `ExecutionNodeInner.workers: Vec<WorkerComponents>` — vec exists, only 1 element
- `GasAccumulator` — supports N workers internally, but initialized with `new(1)`
- `BatchValidator` — already stores `worker_id` and rejects mismatched batches
- `adjust_base_fees()` — loops over `num_workers()` but is a no-op
- Block `difficulty` field — already encodes `batch_index << 16 | worker_id`
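The `difficulty` packing can be sketched as a pair of free functions (assumed names for illustration; the actual code presumably lives on the block type):

```rust
/// Pack a batch index and worker id into the block `difficulty` field,
/// per the encoding above: `batch_index << 16 | worker_id`.
fn encode_difficulty(batch_index: u64, worker_id: u16) -> u64 {
    (batch_index << 16) | worker_id as u64
}

/// Recover (batch_index, worker_id) from a `difficulty` value.
/// The worker id occupies the low 16 bits.
fn decode_difficulty(difficulty: u64) -> (u64, u16) {
    (difficulty >> 16, (difficulty & 0xFFFF) as u16)
}
```

Because the worker id is already in every block, a node can attribute any historical block to its worker without extra state — which is what `catchup_accumulator()` relies on below.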
Hardcoded locations that block multi-worker:
| Location | Current | Fix |
|---|---|---|
| `manager.rs` `spawn_worker_node_components()` | `let worker_id = 0;` | Loop over `0..num_workers` |
| `manager.rs` `GasAccumulator::new(1)` | Hardcoded 1 worker | Use `num_workers` |
| `manager.rs` `catchup_accumulator()` | `gas_accumulator.base_fee(0)` | Restore per-worker base fees |
| `manager.rs` `EpochManager` struct | Singular `worker_network_handle` | `Vec<WorkerNetworkHandle>` |
| `manager.rs` `create_consensus()` | Returns `(PrimaryNode, WorkerNode)` | Returns `(PrimaryNode, Vec<WorkerNode>)` |
| `config/genesis.rs` `NodeP2pInfo` | Single `worker: NetworkInfo` | `workers: Vec<NetworkInfo>` |
| `config/node.rs` `Parameters` | No `num_workers` field | Add `num_workers: u16` (default 1) |
| `config/network.rs` | Global topics `tn-worker`, `tn-txn` | Per-worker `tn-worker-{id}`, `tn-txn-{id}` |
| `config/consensus.rs` | Single `LocalNetwork` | `Vec<LocalNetwork>` |
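For the `config/node.rs` row, the new field might look like this sketch. Only the field name, type, and default of 1 come from the table; the derives and the absence of other fields are assumptions:

```rust
/// Sketch of the new consensus-level parameter (surrounding fields omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct Parameters {
    /// Number of independent workers per validator. This is a consensus
    /// parameter: all validators must agree on it, and changing it
    /// requires a coordinated upgrade.
    pub num_workers: u16,
}

impl Default for Parameters {
    fn default() -> Self {
        // Default of 1 preserves today's single-worker behavior.
        Self { num_workers: 1 }
    }
}
```

Defaulting via `impl Default` (rather than requiring the field in every config) is what keeps existing deployments working unchanged.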
This PR: `EpochManager` Multi-Worker Loop (core change)
This is the core refactor. The `EpochManager` stops hardcoding `worker_id = 0` and instead loops over all workers when spawning components, creating consensus nodes, and managing epochs. With `num_workers = 1` the loop runs once — identical behavior to today.
Scope
`crates/node/src/manager.rs` — `GasAccumulator` initialization:
- Change `GasAccumulator::new(1)` to `GasAccumulator::new(num_workers as usize)`
`crates/node/src/manager.rs` — `spawn_worker_node_components()`:
- Loop `for worker_id in 0..num_workers`:
  - Get the worker's network handle from `self.worker_network_handles[worker_id]`
  - Call `engine.initialize_worker_components(worker_id, ...)` (initial epoch only)
  - Call `engine.new_batch_validator(&worker_id, ...)`
  - Call `self.spawn_worker_network_for_epoch(...)` for this worker
  - Create `WorkerNode::new(worker_id, ...)`
- Return `Vec<WorkerNode>` instead of a single `WorkerNode`
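The shape of that change can be sketched as follows. This is illustrative only: `WorkerNode` stands in for the real type, and the network-handle, batch-validator, and epoch-network calls are elided into a comment:

```rust
/// Stand-in for the real `WorkerNode` type.
struct WorkerNode {
    worker_id: u16,
}

/// Sketch of the refactor: instead of `let worker_id = 0;` producing one
/// node, loop over all worker ids and return the full vec.
fn spawn_worker_node_components(num_workers: u16) -> Vec<WorkerNode> {
    (0..num_workers)
        .map(|worker_id| {
            // Per worker: fetch its network handle, initialize engine
            // components, build its batch validator, spawn its epoch
            // network — then construct the node.
            WorkerNode { worker_id }
        })
        .collect()
}
```

Note the backward-compatibility property directly: `spawn_worker_node_components(1)` yields exactly one node with `worker_id == 0`.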
`crates/node/src/manager.rs` — `create_consensus()`:
- Change return type from `(PrimaryNode, WorkerNode)` to `(PrimaryNode, Vec<WorkerNode>)`
- Worker creation is done inside the loop above
`crates/node/src/manager.rs` — `run_epoch()`:
- Receive `Vec<WorkerNode>` from `create_consensus()`
- Loop over each `WorkerNode`:
  - Call `worker_node.new_worker().await?` to get a `Worker` instance
  - Call `worker.spawn_batch_builder(...)`
  - Call `engine.start_batch_builder(worker.id(), worker.batches_tx(), ...)`
- Handle orphan batches per-worker
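The per-worker portion of that loop can be sketched like this. The types are illustrative stand-ins, and the real `new_worker().await?` is simplified to a synchronous call; the batch-builder calls are shown as comments:

```rust
/// Illustrative stand-ins for the real async types.
struct Worker {
    id: u16,
}
struct WorkerNode {
    id: u16,
}

impl WorkerNode {
    // Real code: `worker_node.new_worker().await?`
    fn new_worker(&self) -> Worker {
        Worker { id: self.id }
    }
}

/// Sketch of the per-worker part of `run_epoch`: one batch builder is
/// started for each `WorkerNode` returned by `create_consensus()`.
/// Returns the started worker ids for illustration.
fn start_batch_builders(worker_nodes: &[WorkerNode]) -> Vec<u16> {
    let mut started = Vec::new();
    for node in worker_nodes {
        let worker = node.new_worker();
        // worker.spawn_batch_builder(...);
        // engine.start_batch_builder(worker.id(), worker.batches_tx(), ...);
        started.push(worker.id);
    }
    started
}
```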
`crates/node/src/manager.rs` — `catchup_accumulator()`:
- The existing code already extracts `worker_id` from the difficulty field per block and calls `gas_accumulator.inc_block(worker_id, ...)` — this already works for N workers
- Fix: the hardcoded `gas_accumulator.base_fee(0)` used for initial base-fee restoration needs to scan backwards through blocks, extract `worker_id` from difficulty, and collect the latest `base_fee` for each worker
- Set `gas_accumulator.base_fee(worker_id).set_base_fee(...)` for each worker
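The backward scan described above can be sketched as follows, under the assumption that blocks are visited newest-first and that worker id sits in the low 16 bits of `difficulty` (per the encoding in Current State). The `Block` struct and function name are illustrative, not the real types:

```rust
use std::collections::HashMap;

/// Minimal stand-in for a block: only the two fields the scan needs.
struct Block {
    difficulty: u64,
    base_fee: u64,
}

/// Walk blocks newest-first, decode each block's worker id from the
/// difficulty field, and keep the first (i.e. latest) base fee seen for
/// each worker. Stop early once every worker has a value.
fn restore_base_fees(blocks_newest_first: &[Block], num_workers: u16) -> HashMap<u16, u64> {
    let mut fees = HashMap::new();
    for block in blocks_newest_first {
        let worker_id = (block.difficulty & 0xFFFF) as u16;
        // `or_insert` keeps the first value seen, which is the latest
        // block for that worker in a newest-first walk.
        fees.entry(worker_id).or_insert(block.base_fee);
        if fees.len() == num_workers as usize {
            break;
        }
    }
    fees
}
```

The resulting map is what would feed `gas_accumulator.base_fee(worker_id).set_base_fee(...)` per worker; with `num_workers = 1` the scan only ever finds `worker_id = 0`, matching today's behavior.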
Backward Compatibility
- With `num_workers = 1`, all loops run once and produce the same single-worker result as today
- `GasAccumulator::new(1)` behavior is preserved when `num_workers = 1`
- `catchup_accumulator` with a single worker only finds `worker_id = 0` blocks — same as current behavior
Blocked by #555 (needs `Vec<WorkerNetworkHandle>`) and #556 (needs `Vec<LocalNetwork>`).