
feat: Multi-Worker Support — PR 4: EpochManager Multi-Worker Loop #557

@grantkee

Description


Problem

The node currently hardcodes a single worker per validator (worker_id = 0). The EpochManager, network layer, and several initialization paths assume exactly one worker. This blocks the ability to run independent fee markets, specialized transaction pools, or any form of worker-level parallelism.

Goal

Refactor the node to support N independent workers per validator. Each worker operates as a standalone unit with its own:

  • libp2p swarm (dedicated gossip topics, listen address, network key)
  • RPC server (unique port)
  • Transaction pool
  • Batch builder + batch validator
  • LocalNetwork instance for primary communication

Workers share only the Primary (consensus) and the execution engine (block production). The num_workers count is a consensus-level parameter — all validators must agree on it.

Why

The immediate motivation is multiple fee markets. Once multi-worker is in place, a follow-up (Phase 2) spawns 2 workers by default:

  • Worker 0 (General): accepts all transactions, standard EIP-1559 fee market
  • Worker 1 (Whitelisted Transfers): accepts only whitelisted ERC-20 transfer/transferFrom calls, operates with a reduced base fee

This architecture also enables future process separation — workers can be extracted into standalone processes communicating with the primary over RPC.

Design Constraints

  1. Workers are fully independent — no cross-worker shared state. Each worker has its own network identity, pool, and gossip topics.
  2. Per-worker gossip topics: `tn-worker-{id}` and `tn-txn-{id}` replace the current global `tn-worker` and `tn-txn` topics. This provides network-level isolation.
  3. Per-worker LocalNetwork — each worker gets its own LocalNetwork instance for primary communication. The primary registers as the handler on every worker's LocalNetwork. This is the seam for future process separation.
  4. num_workers is a consensus parameter — changing it requires a coordinated upgrade across all validators. Defaults to 1 for backward compatibility.
  5. Execution engine is shared — batches from all workers are processed sequentially by the same engine. Worker ID is already encoded in the block difficulty field.
  6. Faucet on worker 0 only — the testnet faucet attaches to the general-purpose worker.
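The per-worker topic naming in constraint 2 can be sketched as two small helpers. This is an illustrative sketch, not the crate's actual API; the function names are assumptions, only the `tn-worker-{id}` / `tn-txn-{id}` naming scheme comes from this PR.

```rust
/// Gossip topic for a worker's batch traffic (illustrative helper;
/// replaces the single global `tn-worker` topic).
fn worker_topic(worker_id: u16) -> String {
    format!("tn-worker-{worker_id}")
}

/// Gossip topic for a worker's transaction traffic (illustrative helper;
/// replaces the single global `tn-txn` topic).
fn txn_topic(worker_id: u16) -> String {
    format!("tn-txn-{worker_id}")
}

fn main() {
    // With num_workers = 1, worker 0 simply gets the suffixed topics.
    assert_eq!(worker_topic(0), "tn-worker-0");
    assert_eq!(txn_topic(1), "tn-txn-1");
}
```

Because each worker subscribes only to its own pair of topics, a batch gossiped by worker 1 is never seen by worker 0's swarm, which is the network-level isolation the constraint asks for.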

Current State

Much of the infrastructure already supports N workers but is only called with worker_id = 0:

  • ExecutionNodeInner.workers: Vec<WorkerComponents> — vec exists, only 1 element
  • GasAccumulator — supports N workers internally, but initialized with new(1)
  • BatchValidator — already stores worker_id and rejects mismatched batches
  • adjust_base_fees() — loops over num_workers() but is a no-op
  • Block difficulty field — already encodes batch_index << 16 | worker_id
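The difficulty-field encoding mentioned above (`batch_index << 16 | worker_id`) can be sketched as an encode/decode pair. The helper names are illustrative; only the bit layout comes from the source.

```rust
/// Pack a batch index and worker ID into the block difficulty field,
/// per the existing layout: batch_index << 16 | worker_id.
fn encode_difficulty(batch_index: u64, worker_id: u16) -> u64 {
    (batch_index << 16) | worker_id as u64
}

/// Recover (batch_index, worker_id) from a difficulty value.
fn decode_difficulty(difficulty: u64) -> (u64, u16) {
    (difficulty >> 16, (difficulty & 0xFFFF) as u16)
}

fn main() {
    let d = encode_difficulty(42, 3);
    assert_eq!(decode_difficulty(d), (42, 3));
}
```

The low 16 bits carry the worker ID, which is why `catchup_accumulator()` can attribute any historical block to its worker without extra bookkeeping.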

Hardcoded locations that block multi-worker:

| Location | Current | Fix |
| --- | --- | --- |
| manager.rs `spawn_worker_node_components()` | `let worker_id = 0;` | Loop over `0..num_workers` |
| manager.rs `GasAccumulator::new(1)` | Hardcoded 1 worker | Use `num_workers` |
| manager.rs `catchup_accumulator()` | `gas_accumulator.base_fee(0)` | Restore per-worker base fees |
| manager.rs `EpochManager` struct | Singular `worker_network_handle` | `Vec<WorkerNetworkHandle>` |
| manager.rs `create_consensus()` | Returns `(PrimaryNode, WorkerNode)` | Returns `(PrimaryNode, Vec<WorkerNode>)` |
| config/genesis.rs `NodeP2pInfo` | Single worker: `NetworkInfo` | `workers: Vec<NetworkInfo>` |
| config/node.rs `Parameters` | No `num_workers` field | Add `num_workers: u16` (default 1) |
| config/network.rs | Global topics `tn-worker`, `tn-txn` | Per-worker `tn-worker-{id}`, `tn-txn-{id}` |
| config/consensus.rs | Single `LocalNetwork` | `Vec<LocalNetwork>` |

This PR: EpochManager Multi-Worker Loop (core change)

This is the core refactor. The EpochManager stops hardcoding worker_id = 0 and instead loops over all workers when spawning components, creating consensus nodes, and managing epochs. With num_workers = 1 the loop runs once — identical behavior to today.

Scope

crates/node/src/manager.rs: `GasAccumulator` initialization

  • Change GasAccumulator::new(1) to GasAccumulator::new(num_workers as usize)

crates/node/src/manager.rs: `spawn_worker_node_components()`

  • Loop for worker_id in 0..num_workers:
    • Get the worker's network handle from self.worker_network_handles[worker_id]
    • Call engine.initialize_worker_components(worker_id, ...) (initial epoch only)
    • Call engine.new_batch_validator(&worker_id, ...)
    • Call self.spawn_worker_network_for_epoch(...) for this worker
    • Create WorkerNode::new(worker_id, ...)
  • Return Vec<WorkerNode> instead of a single WorkerNode
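The shape of the spawn loop above can be sketched with stub types. All type and function names here are illustrative stand-ins for the crate's real `WorkerNode` and network-handle types; the elided calls are listed in comments.

```rust
/// Stub for the per-worker network handle (real type: WorkerNetworkHandle).
struct WorkerNetworkHandle(u16);

/// Stub for the per-worker consensus node (real type: WorkerNode).
struct WorkerNode {
    id: u16,
}

/// Sketch of the refactored spawn loop: one WorkerNode per worker_id,
/// each wired to its own dedicated network handle.
fn spawn_worker_node_components(
    handles: &[WorkerNetworkHandle],
    num_workers: u16,
) -> Vec<WorkerNode> {
    (0..num_workers)
        .map(|worker_id| {
            // Look up this worker's dedicated network handle.
            let _handle = &handles[worker_id as usize];
            // Elided: engine.initialize_worker_components(worker_id, ...),
            // engine.new_batch_validator(&worker_id, ...),
            // self.spawn_worker_network_for_epoch(...).
            WorkerNode { id: worker_id }
        })
        .collect()
}

fn main() {
    let handles: Vec<_> = (0..2u16).map(WorkerNetworkHandle).collect();
    let nodes = spawn_worker_node_components(&handles, 2);
    assert_eq!(nodes.len(), 2);
    assert_eq!(nodes[1].id, 1);
}
```

With `num_workers = 1` the iterator yields only `worker_id = 0`, so the returned `Vec` has one element and behavior matches today's single-worker path.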

crates/node/src/manager.rs: `create_consensus()`

  • Change return type from (PrimaryNode, WorkerNode) to (PrimaryNode, Vec<WorkerNode>)
  • Worker creation is done inside the loop above

crates/node/src/manager.rs: `run_epoch()`

  • Receive Vec<WorkerNode> from create_consensus()
  • Loop over each WorkerNode:
    • Call worker_node.new_worker().await? to get Worker instance
    • Call worker.spawn_batch_builder(...)
    • Call engine.start_batch_builder(worker.id(), worker.batches_tx(), ...)
  • Handle orphan batches per-worker
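The per-worker startup in `run_epoch()` follows the same pattern; a synchronous sketch with stub types (the real code awaits `new_worker()` and passes the engine handles, all names here are illustrative):

```rust
/// Stub for the spawned worker instance (real type: Worker).
struct Worker {
    id: u16,
}

/// Stub for the consensus-side worker node (real type: WorkerNode).
struct WorkerNode {
    id: u16,
}

impl WorkerNode {
    /// Sketch of worker_node.new_worker(); the real method is async.
    fn new_worker(&self) -> Worker {
        Worker { id: self.id }
    }
}

/// Sketch: start one batch builder per worker and report which IDs started.
fn start_batch_builders(nodes: &[WorkerNode]) -> Vec<u16> {
    let mut started = Vec::new();
    for node in nodes {
        let worker = node.new_worker();
        // Elided: worker.spawn_batch_builder(...),
        // engine.start_batch_builder(worker.id(), worker.batches_tx(), ...),
        // per-worker orphan-batch handling.
        started.push(worker.id);
    }
    started
}

fn main() {
    let nodes = vec![WorkerNode { id: 0 }, WorkerNode { id: 1 }];
    assert_eq!(start_batch_builders(&nodes), vec![0, 1]);
}
```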

crates/node/src/manager.rs: `catchup_accumulator()`

  • The existing code already extracts worker_id from the difficulty field per-block and calls gas_accumulator.inc_block(worker_id, ...) — this already works for N workers
  • Fix: The hardcoded gas_accumulator.base_fee(0) for initial base fee restoration needs to scan backwards through blocks, extract worker_id from difficulty, and collect the latest base_fee for each worker
  • Set gas_accumulator.base_fee(worker_id).set_base_fee(...) for each worker
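The backward scan described above can be sketched as follows. The `Block` struct and function name are illustrative; the only facts taken from the source are the difficulty layout (`batch_index << 16 | worker_id`) and the goal of recovering the latest base fee per worker.

```rust
use std::collections::HashMap;

/// Minimal stand-in for a stored block: we only need the difficulty
/// field (low 16 bits = worker_id) and the base fee.
struct Block {
    difficulty: u64,
    base_fee: u64,
}

/// Scan newest-to-oldest and keep the first (i.e. latest) base fee seen
/// for each worker, stopping once every worker is covered.
fn latest_base_fees(blocks: &[Block], num_workers: u16) -> HashMap<u16, u64> {
    let mut fees = HashMap::new();
    for block in blocks.iter().rev() {
        let worker_id = (block.difficulty & 0xFFFF) as u16;
        fees.entry(worker_id).or_insert(block.base_fee);
        if fees.len() == num_workers as usize {
            break; // all workers covered, stop scanning
        }
    }
    fees
}

fn main() {
    // Blocks in chain order: worker 0, worker 1, then worker 0 again.
    let blocks = vec![
        Block { difficulty: (1 << 16) | 0, base_fee: 100 },
        Block { difficulty: (1 << 16) | 1, base_fee: 50 },
        Block { difficulty: (2 << 16) | 0, base_fee: 110 },
    ];
    let fees = latest_base_fees(&blocks, 2);
    assert_eq!(fees[&0], 110); // latest worker-0 block wins
    assert_eq!(fees[&1], 50);
}
```

Each recovered fee would then be applied via the accumulator's per-worker setter (`gas_accumulator.base_fee(worker_id).set_base_fee(...)` in the PR's wording), replacing the hardcoded `base_fee(0)`.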

Backward Compatibility

  • With num_workers = 1, all loops run once and produce the same single-worker result as today
  • GasAccumulator::new(1) behavior is preserved when num_workers = 1
  • catchup_accumulator with a single worker only finds worker_id = 0 blocks — same as current behavior

Blocked by #555 (needs `Vec<WorkerNetworkHandle>`) and #556 (needs `Vec<LocalNetwork>`).
