
feat: Multi-Worker Support — PR 5: Execution Engine Initialization #558

@grantkee

Description


Problem

The node currently hardcodes a single worker per validator (worker_id = 0). The EpochManager, network layer, and several initialization paths assume exactly one worker. This blocks the ability to run independent fee markets, specialized transaction pools, or any form of worker-level parallelism.

Goal

Refactor the node to support N independent workers per validator. Each worker operates as a standalone unit with its own:

  • libp2p swarm (dedicated gossip topics, listen address, network key)
  • RPC server (unique port)
  • Transaction pool
  • Batch builder + batch validator
  • LocalNetwork instance for primary communication

Workers share only the Primary (consensus) and the execution engine (block production). The num_workers count is a consensus-level parameter — all validators must agree on it.
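
For orientation, a minimal structural sketch of the per-worker unit listed above. The real type is WorkerComponents in crates/node; every alias below is a placeholder rather than the node's actual component type:

```rust
// A structural sketch of the per-worker unit; placeholders only.
type Libp2pSwarm = ();     // dedicated swarm: gossip topics, listen addr, network key
type RpcServer = ();       // worker-specific RPC server on a unique port
type TxPool = ();          // independent transaction pool
type BatchBuilder = ();
type BatchValidator = ();
type LocalNetwork = ();    // per-worker channel to the shared Primary

#[allow(dead_code)]
struct PerWorkerSketch {
    worker_id: u16,
    swarm: Libp2pSwarm,
    rpc: RpcServer,
    pool: TxPool,
    batch_builder: BatchBuilder,
    batch_validator: BatchValidator,
    local_network: LocalNetwork,
}
```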

Why

The immediate motivation is multiple fee markets. Once multi-worker is in place, a follow-up (Phase 2) spawns 2 workers by default:

  • Worker 0 (General): accepts all transactions, standard EIP-1559 fee market
  • Worker 1 (Whitelisted Transfers): accepts only whitelisted ERC-20 transfer/transferFrom calls, operates with a reduced base fee

This architecture also enables future process separation — workers can be extracted into standalone processes communicating with the primary over RPC.

Design Constraints

  1. Workers are fully independent — no cross-worker shared state. Each worker has its own network identity, pool, and gossip topics.
  2. Per-worker gossip topics: tn-worker-{id} and tn-txn-{id} replace the current global tn-worker and tn-txn topics. This provides network-level isolation (see the naming sketch after this list).
  3. Per-worker LocalNetwork — each worker gets its own LocalNetwork instance for primary communication. The primary registers as the handler on every worker's LocalNetwork. This is the seam for future process separation.
  4. num_workers is a consensus parameter — changing it requires a coordinated upgrade across all validators. Defaults to 1 for backward compatibility.
  5. Execution engine is shared — batches from all workers are processed sequentially by the same engine. Worker ID is already encoded in the block difficulty field.
  6. Faucet on worker 0 only — the testnet faucet attaches to the general-purpose worker.
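
A naming sketch for constraint 2; the helper function is hypothetical, only the tn-worker-{id} / tn-txn-{id} formats come from this design:

```rust
// Hypothetical helper illustrating per-worker topic naming (constraint 2).
fn worker_gossip_topics(worker_id: u16) -> (String, String) {
    // Each worker subscribes to its own topic pair, so one worker's batch
    // and transaction gossip never reaches another worker.
    (format!("tn-worker-{worker_id}"), format!("tn-txn-{worker_id}"))
}
```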

Current State

Much of the infrastructure already supports N workers but is only called with worker_id = 0:

  • ExecutionNodeInner.workers: Vec<WorkerComponents> — vec exists, only 1 element
  • GasAccumulator — supports N workers internally, but initialized with new(1)
  • BatchValidator — already stores worker_id and rejects mismatched batches
  • adjust_base_fees() — loops over num_workers() but is a no-op
  • Block difficulty field — already encodes batch_index << 16 | worker_id (sketched after this list)
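
The difficulty encoding mentioned above, sketched with u64 for brevity (the real header field is wider):

```rust
// Sketch of the existing encoding: batch_index << 16 | worker_id.
fn encode_difficulty(batch_index: u64, worker_id: u16) -> u64 {
    (batch_index << 16) | u64::from(worker_id)
}

// The worker id can be recovered from the low 16 bits.
fn worker_id_from_difficulty(difficulty: u64) -> u16 {
    (difficulty & 0xFFFF) as u16
}
```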

Hardcoded locations that block multi-worker:

| Location | Current | Fix |
| --- | --- | --- |
| manager.rs `spawn_worker_node_components()` | `let worker_id = 0;` | Loop over `0..num_workers` |
| manager.rs `GasAccumulator::new(1)` | Hardcoded 1 worker | Use `num_workers` |
| manager.rs `catchup_accumulator()` | `gas_accumulator.base_fee(0)` | Restore per-worker base fees |
| manager.rs `EpochManager` struct | Singular `worker_network_handle` | `Vec<WorkerNetworkHandle>` |
| manager.rs `create_consensus()` | Returns `(PrimaryNode, WorkerNode)` | Returns `(PrimaryNode, Vec<WorkerNode>)` |
| config/genesis.rs `NodeP2pInfo` | Single `worker: NetworkInfo` | `workers: Vec<NetworkInfo>` |
| config/node.rs `Parameters` | No `num_workers` field | Add `num_workers: u16` (default 1; sketched below) |
| config/network.rs | Global topics `tn-worker`, `tn-txn` | Per-worker `tn-worker-{id}`, `tn-txn-{id}` |
| config/consensus.rs | Single `LocalNetwork` | `Vec<LocalNetwork>` |
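
A hypothetical sketch of the config/node.rs Parameters change from the table; only the field name, type, and default of 1 come from this description, and the struct's existing fields are omitted:

```rust
// Sketch of the new consensus-level parameter; not the full Parameters struct.
pub struct Parameters {
    /// Number of independent workers per validator (consensus parameter).
    pub num_workers: u16,
}

impl Default for Parameters {
    fn default() -> Self {
        // Defaults to 1 for backward compatibility with single-worker nodes.
        Self { num_workers: 1 }
    }
}
```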

This PR: Execution Engine Initialization

Initialize WorkerComponents (TX pool, RPC server, worker network) for each worker. The ExecutionNodeInner already has a workers: Vec<WorkerComponents> — this PR populates it with N entries instead of 1. With num_workers = 1, behavior is identical to today.
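
A self-contained sketch of the initialization loop this PR introduces. The types and free function below are placeholders standing in for ExecutionNodeInner and its initialize_worker_components method; only the loop shape and the "N entries instead of 1" behavior come from this description:

```rust
// Placeholder for the real WorkerComponents (TX pool, RPC handle, worker network).
struct WorkerComponentsSketch {
    worker_id: u16,
}

fn initialize_worker_components(worker_id: u16) -> WorkerComponentsSketch {
    WorkerComponentsSketch { worker_id }
}

fn main() {
    let num_workers: u16 = 2;
    let mut workers: Vec<WorkerComponentsSketch> = Vec::new();
    // With num_workers = 1 this loop runs once and matches today's behavior.
    for worker_id in 0..num_workers {
        workers.push(initialize_worker_components(worker_id));
    }
    assert_eq!(workers.len(), usize::from(num_workers));
}
```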

Scope

initialize_worker_components() in crates/node/src/engine/inner.rs:

  • Called in a loop for each worker_id in order (0, 1, ..., N-1)
  • Each call creates:
    • Its own WorkerTxPool via reth_env.init_txn_pool()
    • Its own WorkerNetwork with the worker-specific WorkerNetworkHandle
    • Its own RPC server (each on a different port, allocated by the reth RPC infrastructure)
  • Faucet: self.opt_faucet_args.take() yields the faucet args only on the first call, so only worker 0 gets the faucet extension. For worker_id > 0, the faucet branch is skipped because the Option is already None after the first take (see the sketch below)
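
A standalone sketch of the faucet-on-worker-0 behavior: Option::take yields the value only once and leaves None behind, so only the first iteration installs the faucet. FaucetArgs and the loop below are placeholders, not the node's actual API:

```rust
struct FaucetArgs;

fn main() {
    let mut opt_faucet_args = Some(FaucetArgs);
    for worker_id in 0u16..3 {
        if let Some(_args) = opt_faucet_args.take() {
            // Only worker 0 reaches this branch.
            println!("worker {worker_id}: faucet RPC extension installed");
        } else {
            // take() already returned the value; the Option is now None.
            println!("worker {worker_id}: faucet branch skipped");
        }
    }
}
```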

start_batch_builder() in crates/node/src/engine/inner.rs:

  • Already parameterized by worker_id — no change needed
  • Will be called once per worker from run_epoch()

new_batch_validator() in crates/node/src/engine/inner.rs:

  • Already parameterized by worker_id — no change needed
  • Returns a BatchValidator bound to that worker's TX pool

respawn_worker_network_tasks() in crates/node/src/engine/inner.rs:

  • Currently takes a single WorkerNetworkHandle but loops over self.workers
  • Change signature to accept &[WorkerNetworkHandle] and match handles to workers by index
  • Each worker's network gets the correct handle on epoch rollover (see the sketch after this list)
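
A sketch of the proposed signature change; Worker and WorkerNetworkHandle below are placeholders for the node's actual types:

```rust
#[derive(Clone)]
struct WorkerNetworkHandle;
struct Worker;

impl Worker {
    fn respawn_network_tasks(&mut self, _handle: WorkerNetworkHandle) {
        // re-subscribe gossip topics and restart network tasks for the new epoch
    }
}

fn respawn_worker_network_tasks(workers: &mut [Worker], handles: &[WorkerNetworkHandle]) {
    // handles[i] belongs to workers[i]; zip pairs them by index
    for (worker, handle) in workers.iter_mut().zip(handles) {
        worker.respawn_network_tasks(handle.clone());
    }
}
```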

RPC port allocation:

  • Each worker's RPC server needs a unique port
  • The existing reth_env.get_rpc_server() allocates ports; verify it assigns a distinct port on each call (see the check sketched after this list)
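
A small sanity-check sketch (hypothetical helper) for verifying the allocated ports are distinct; how the ports are collected from the running node is not shown here:

```rust
use std::collections::HashSet;

// Asserts that every worker's RPC server was bound to a different port.
fn assert_unique_rpc_ports(rpc_ports: &[u16]) {
    let unique: HashSet<u16> = rpc_ports.iter().copied().collect();
    assert_eq!(unique.len(), rpc_ports.len(), "two workers share an RPC port");
}
```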

Backward Compatibility

  • With num_workers = 1, the loop runs once, producing one WorkerComponents entry — same as today
  • Faucet attaches to worker 0 only, which is the existing behavior
  • respawn_worker_network_tasks(&[single_handle]) with one element behaves identically to the current single-handle version

Blocked by #557 (EpochManager Multi-Worker Loop): the loop that calls initialize_worker_components lives in spawn_worker_node_components().
