To set up op-conductor, configure the following environment variables on the op-node and op-conductor services (the `OP_NODE_*` variables belong to op-node, the rest to op-conductor):
OP_NODE_CONDUCTOR_ENABLED=true
OP_NODE_CONDUCTOR_RPC=<conductor-rpc-endpoint> # for example http://conductor:8545
# prefix for the server id, used to identify the server in the raft cluster
RAFT_SERVER_ID_PREFIX=<prefix-for-server-id> # for example, sequencer-1, sequencer-2, etc
OP_CONDUCTOR_RAFT_STORAGE_DIR=<raft-storage-dir>
OP_CONDUCTOR_RPC_ADDR=<rpc-address> # for example, 0.0.0.0
OP_CONDUCTOR_RPC_PORT=<rpc-port> # for example, 8545
OP_CONDUCTOR_METRICS_ENABLED=true/false
OP_CONDUCTOR_METRICS_ADDR=<metrics-address> # for example 0.0.0.0
OP_CONDUCTOR_METRICS_PORT=<metrics-port> # for example 7300
OP_CONDUCTOR_CONSENSUS_PORT=<consensus-port> # for example 50050
OP_CONDUCTOR_PAUSED=true # set to true to start conductor in paused state
OP_CONDUCTOR_NODE_RPC=<node-rpc-endpoint> # for example, http://op-node:8545
OP_CONDUCTOR_EXECUTION_RPC=<execution-rpc-endpoint> # for example, http://op-geth:8545
OP_CONDUCTOR_NETWORK=<network-name> # for example, base-mainnet, op-mainnet, etc, should be same as OP_NODE_NETWORK
OP_CONDUCTOR_HEALTHCHECK_INTERVAL=<healthcheck-interval> # in seconds
OP_CONDUCTOR_HEALTHCHECK_UNSAFE_INTERVAL=<unsafe-interval> # Interval allowed between unsafe head and now measured in seconds
OP_CONDUCTOR_HEALTHCHECK_MIN_PEER_COUNT=<min-peer-count> # minimum number of peers required to be considered healthy
OP_CONDUCTOR_RAFT_BOOTSTRAP=true/false # set to true if you want to bootstrap the raft cluster

In the usual case, you already have a running sequencer and want to turn it into an HA cluster. To do that:
- start a completely new sequencer with the configuration above and
  - `OP_CONDUCTOR_RAFT_BOOTSTRAP=true` set on op-conductor
  - `OP_CONDUCTOR_PAUSED=true` set on op-conductor
  - `OP_NODE_SEQUENCER_ENABLED=true` set on op-node
- wait for the new sequencer to start and sync up with the rest of the nodes
- once the new sequencer is synced up, stop sequencing on the old sequencer and start sequencing on the new sequencer, either manually or through automation
- resume the conductor on the new sequencer by calling the `conductor_resume` JSON-RPC method on op-conductor
- set `OP_CONDUCTOR_RAFT_BOOTSTRAP=false` on the sequencer so that it doesn't attempt to bootstrap the cluster during redeploy
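The `conductor_resume` call in the steps above can be issued with any JSON-RPC 2.0 client. The following is a minimal Python sketch using only the standard library. The helper names `build_rpc_request` and `call_rpc` are illustrative, the endpoint in the usage comment is a placeholder, and sending an empty `params` list is an assumption about the method's signature, not something this document confirms.

```python
import json
import urllib.request


def build_rpc_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request body as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": list(params or []),
    }


def call_rpc(url, method, params=None, timeout=10.0):
    """POST a JSON-RPC request to url and return the decoded result."""
    body = json.dumps(build_rpc_request(method, params)).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        reply = json.loads(resp.read().decode("utf-8"))
    # A JSON-RPC error arrives in the response body, not as an HTTP error.
    if "error" in reply:
        raise RuntimeError(f"{method} failed: {reply['error']}")
    return reply.get("result")


# Usage (placeholder endpoint, assumed to take no parameters):
#   call_rpc("http://conductor:8545", "conductor_resume")
```

Raising on the `error` field keeps a failed resume from passing silently, which matters when this call is wired into automation.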
Now you have a single HA sequencer which treats itself as the cluster leader! The next step is to add more sequencers to the cluster, depending on your needs. For example, if you want a 3-node cluster, follow the process below for each of the 2 additional sequencers:
- start a new sequencer with
  - `OP_CONDUCTOR_RAFT_BOOTSTRAP=false` set on op-conductor
  - `OP_CONDUCTOR_PAUSED=true` set on op-conductor
- wait for the new sequencer to start and sync up with the rest of the nodes
- once the new sequencer is synced up, add it to the cluster, either manually or through automation, by calling the `conductor_addServerAsVoter` JSON-RPC method on the leader sequencer
- call the `conductor_clusterMembership` JSON-RPC method on the leader sequencer to get the updated cluster membership
- resume the conductor on the new sequencer by calling the `conductor_resume` JSON-RPC method on op-conductor
Once finished, you should have a 3-node HA sequencer cluster!
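The flag combinations differ between the first (bootstrap) node and the nodes that join later, and they are easy to mix up. A small pre-start check can catch that. The sketch below is a hypothetical helper, not part of op-conductor. It assumes the op-node and op-conductor variables for one sequencer are available in a single dict and that the values are the literal strings `true` and `false`.

```python
def check_conductor_env(env, role):
    """Return a list of problems for a node that is about to start.

    role is "bootstrap" for the first node of the cluster, or "join"
    for a node that will be added to an existing cluster.
    """
    expected = {
        "bootstrap": {
            "OP_CONDUCTOR_RAFT_BOOTSTRAP": "true",
            "OP_CONDUCTOR_PAUSED": "true",
            "OP_NODE_SEQUENCER_ENABLED": "true",
        },
        "join": {
            "OP_CONDUCTOR_RAFT_BOOTSTRAP": "false",
            "OP_CONDUCTOR_PAUSED": "true",
        },
    }
    if role not in expected:
        raise ValueError(f"unknown role: {role!r}")
    problems = []
    for name, want in expected[role].items():
        got = env.get(name)
        if got != want:
            problems.append(f"{name} should be {want!r}, got {got!r}")
    return problems
```

For example, `check_conductor_env(dict(os.environ), "join")` (after `import os`) returns an empty list when the joining node is configured as described above.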
For every redeploy, depending on your underlying infrastructure, you need to make sure to:
- set `OP_CONDUCTOR_PAUSED=true` on op-conductor so that conductor doesn't attempt to control the sequencer while it's still syncing / redeploying
- make sure the sequencer is caught up with the rest of the nodes (this step isn't strictly necessary, as conductor could handle it, but from an HA perspective it does not make sense for a lagging sequencer to join the cluster and potentially become the leader)
- resume conductor after the sequencer has caught up with the rest of the nodes, so that conductor can start managing the sequencer
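The redeploy sequence above amounts to "start paused, wait for sync, then resume". A minimal sketch of the waiting step follows. `is_synced` and `resume` are hypothetical callables supplied by the caller: how you decide that a sequencer has caught up, and how you issue `conductor_resume`, depend on your infrastructure.

```python
import time


def resume_when_synced(is_synced, resume, poll_seconds=5.0, max_polls=120,
                       sleep=time.sleep):
    """Poll is_synced() and call resume() once it returns True.

    Returns True if resume() was called, or False if the node did not
    catch up within max_polls checks (the conductor is left paused).
    """
    for attempt in range(max_polls):
        if is_synced():
            resume()
            return True
        # Skip the final sleep: there is no further check after it.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    return False
```

Leaving the conductor paused on timeout is a deliberate choice in this sketch, since it keeps a lagging sequencer from becoming eligible for leadership.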
Whenever there is a disaster situation in which you see no route to having 2 healthy conductors in the cluster communicating with each other, you need to intervene manually to resume sequencing. The steps are as follows:
- call the `conductor_pause` JSON-RPC method on all conductors so that they don't attempt to start / stop the sequencer
- choose a sequencer that can be used to resume sequencing
- call the `conductor_overrideLeader` JSON-RPC method on the conductor to force it to treat itself as the leader
- if no conductor is functioning, call the `admin_overrideLeader` JSON-RPC method on the op-node to force it to treat itself as the leader
- manually start sequencing on the chosen sequencer
- go back to the bootstrap step to re-bootstrap the cluster
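During an incident it helps to have the order of these steps written down ahead of time. The sketch below turns the list above into an ordered plan that can be printed or reviewed before anything is executed. `recovery_plan` and the node names are hypothetical, and the function only describes the steps without performing any RPC call.

```python
def recovery_plan(conductors, chosen, conductor_functional=True):
    """Return the manual recovery steps as (component, node, action) tuples.

    conductors: names of all sequencer nodes running a conductor.
    chosen: the node picked to resume sequencing.
    conductor_functional: set to False when no conductor is functioning,
        in which case the leader override goes to op-node instead.
    """
    if chosen not in conductors:
        raise ValueError(f"{chosen!r} is not one of the known nodes")
    # Step 1: pause every conductor so none of them starts or stops a sequencer.
    steps = [("op-conductor", name, "conductor_pause") for name in conductors]
    # Steps 2-3: force the chosen node to treat itself as the leader.
    if conductor_functional:
        steps.append(("op-conductor", chosen, "conductor_overrideLeader"))
    else:
        steps.append(("op-node", chosen, "admin_overrideLeader"))
    # Step 4: start sequencing by hand, then rebuild the cluster.
    steps.append(("op-node", chosen, "start sequencing manually"))
    steps.append(("cluster", chosen, "re-bootstrap"))
    return steps
```

Calling `recovery_plan(["sequencer-1", "sequencer-2", "sequencer-3"], "sequencer-2")` yields three pause steps followed by the override, the manual start, and the re-bootstrap.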