feat: Multi-Worker Support — PR 2: Per-Worker Network Swarms #555

@grantkee

Description

Problem

The node currently hardcodes a single worker per validator (worker_id = 0). The EpochManager, network layer, and several initialization paths assume exactly one worker. This blocks the ability to run independent fee markets, specialized transaction pools, or any form of worker-level parallelism.

Goal

Refactor the node to support N independent workers per validator. Each worker operates as a standalone unit with its own:

  • libp2p swarm (dedicated gossip topics, listen address, network key)
  • RPC server (unique port)
  • Transaction pool
  • Batch builder + batch validator
  • LocalNetwork instance for primary communication

Workers share only the Primary (consensus) and the execution engine (block production). The num_workers count is a consensus-level parameter — all validators must agree on it.

Why

The immediate motivation is multiple fee markets. Once multi-worker is in place, a follow-up (Phase 2) spawns 2 workers by default:

  • Worker 0 (General): accepts all transactions, standard EIP-1559 fee market
  • Worker 1 (Whitelisted Transfers): accepts only whitelisted ERC-20 transfer/transferFrom calls, operates with a reduced base fee

This architecture also enables future process separation — workers can be extracted into standalone processes communicating with the primary over RPC.

Design Constraints

  1. Workers are fully independent — no cross-worker shared state. Each worker has its own network identity, pool, and gossip topics.
  2. Per-worker gossip topics: tn-worker-{id} and tn-txn-{id} replace the current global tn-worker and tn-txn topics. This provides network-level isolation.
  3. Per-worker LocalNetwork — each worker gets its own LocalNetwork instance for primary communication. The primary registers as the handler on every worker's LocalNetwork. This is the seam for future process separation.
  4. num_workers is a consensus parameter — changing it requires a coordinated upgrade across all validators. Defaults to 1 for backward compatibility.
  5. Execution engine is shared — batches from all workers are processed sequentially by the same engine. Worker ID is already encoded in the block difficulty field.
  6. Faucet on worker 0 only — the testnet faucet attaches to the general-purpose worker.

Current State

Much of the infrastructure already supports N workers but is only called with worker_id = 0:

  • ExecutionNodeInner.workers: Vec<WorkerComponents> — vec exists, only 1 element
  • GasAccumulator — supports N workers internally, but initialized with new(1)
  • BatchValidator — already stores worker_id and rejects mismatched batches
  • adjust_base_fees() — loops over num_workers() but is a no-op
  • Block difficulty field — already encodes batch_index << 16 | worker_id

Hardcoded locations that block multi-worker:

| Location | Current | Fix |
|---|---|---|
| manager.rs spawn_worker_node_components() | let worker_id = 0; | Loop over 0..num_workers |
| manager.rs GasAccumulator::new(1) | Hardcoded 1 worker | Use num_workers |
| manager.rs catchup_accumulator() | gas_accumulator.base_fee(0) | Restore per-worker base fees |
| manager.rs EpochManager struct | Singular worker_network_handle | Vec&lt;WorkerNetworkHandle&gt; |
| manager.rs create_consensus() | Returns (PrimaryNode, WorkerNode) | Returns (PrimaryNode, Vec&lt;WorkerNode&gt;) |
| config/genesis.rs NodeP2pInfo | Single worker: NetworkInfo | workers: Vec&lt;NetworkInfo&gt; |
| config/node.rs Parameters | No num_workers field | Add num_workers: u16 (default 1) |
| config/network.rs | Global topics tn-worker, tn-txn | Per-worker tn-worker-{id}, tn-txn-{id} |
| config/consensus.rs | Single LocalNetwork | Vec&lt;LocalNetwork&gt; |

This PR: Per-Worker Network Swarms

Each worker gets its own libp2p swarm with dedicated gossip topics, listen address, and network key. With num_workers = 1, behavior is identical to today.

Scope

crates/node/src/manager.rs (EpochManager struct):

  • Change worker_network_handle: Option<WorkerNetworkHandle> to worker_network_handles: Vec<WorkerNetworkHandle> (indexed by WorkerId)
  • Change worker_event_stream: QueChannel<...> to worker_event_streams: Vec<QueChannel<...>> (one per worker)

crates/node/src/manager.rs (spawn_node_networks()):

  • Loop for worker_id in 0..num_workers:
    • Create a QueChannel for this worker's events
    • Call ConsensusNetwork::new_for_worker() with:
      • The worker-specific NetworkInfo from NodeInfo.p2p_info.workers[worker_id]
      • Worker-specific topics: worker_batch_topic(worker_id), worker_txn_topic(worker_id)
    • Spawn the swarm as a critical task: "Worker Network {worker_id}"
    • Push the handle and event stream into the respective vecs

crates/network-libp2p (ConsensusNetwork::new_for_worker()):

  • Accept worker_id: WorkerId parameter
  • Subscribe to per-worker topics using worker_batch_topic(worker_id) and worker_txn_topic(worker_id)
  • Each swarm listens on a distinct address (from NodeInfo.p2p_info.workers[worker_id])
  • Each swarm uses its own network keypair

All callsites that access self.worker_network_handle:

  • Update to index into self.worker_network_handles[worker_id]
  • Primarily spawn_worker_node_components() and spawn_worker_network_for_epoch()

blocked by #554
