QuePaxa is a novel crash-fault-tolerant, asynchronous consensus algorithm.
The main innovations of QuePaxa, compared to existing consensus algorithms such as Multi-Paxos, Raft, Rabia, and EPaxos, are threefold:

- QuePaxa employs a new asynchronous consensus core to tolerate adverse network conditions, while retaining a one-round-trip fast path under normal network conditions.
- QuePaxa employs hedging delays instead of traditional timeouts. Hedging delays can be configured arbitrarily in QuePaxa without affecting liveness, whereas Raft and Multi-Paxos require a conservatively high timeout to ensure liveness.
- QuePaxa dynamically tunes the protocol at runtime to maximize performance.
Our SOSP paper, *QuePaxa: Escaping the tyranny of timeouts in consensus*, describes QuePaxa's design and evaluation in detail.
Keywords: state-machine replication (SMR), consensus, asynchrony, randomization, tuning, hedging
- `client/`, `common/`, `configuration/`, `proto/`, `replica/`: provide the QuePaxa replica and submitter implementations in Go.
- `integration-test/`: contains the integration tests that check the correctness of QuePaxa in a single-machine deployment with 5 replicas and 5 clients under different configuration parameters.
- `experiments/`: contains the automated scripts for artifact evaluation and the compiled binaries of EPaxos, Multi-Paxos, Raft, Rabia, and QuePaxa.
- how to build, install, and run QuePaxa in a single VM
- how to read the QuePaxa codebase
In this section, we explain how to build, install, and run QuePaxa in a single VM.
- a VM with at least 4 cores, 8 GB memory, and 10 GB HDD
- Ubuntu 20.04 LTS with `sudo` access
- `python3` installed with `numpy` and `matplotlib`
- Install `pip3`, `matplotlib`, and Go 1.19.5:

  ```shell
  sudo apt update
  sudo apt install python3-pip
  pip3 install matplotlib
  rm -rf /usr/local/go
  wget https://go.dev/dl/go1.19.5.linux-amd64.tar.gz
  sudo tar -C /usr/local -xzf go1.19.5.linux-amd64.tar.gz
  export PATH=$PATH:/usr/local/go/bin
  ```

  Check that the installation succeeded by issuing the command `go version`, which should output `go1.19.5 linux/amd64`.

- This repository uses Protocol Buffers. Install the `protoc` compiler by following the official Protocol Buffers installation instructions.
- Clone QuePaxa from GitHub and build the project:

  ```shell
  git clone https://github.com/dedis/quepaxa
  cd quepaxa
  /bin/bash build.sh
  ```

  `build.sh` will produce the error `protoc: command not found` if you have not installed the `protoc` compiler correctly.

  NOTE: The outputs might show `raxos` and `QuePaxa` interchangeably. This is because the QuePaxa code was initially named Raxos and later renamed to QuePaxa.
- Run 5 replicas and 5 submitters in the same VM with a 50k cmd/sec aggregate arrival rate:

  ```shell
  /bin/bash integration-test/safety-test.sh 200000 0 0 1 5000 50 10000 100 0 0
  ```

  If the test was successful, the final outputs should look like the following. In the `logs/200000/0/0/1/5000/50/10000/100/0/0/` folder, you can see the output of the 5 replicas (`1-5.log`) and the 5 submitters (`21-25.log`). Submitter logs contain the median latency, throughput, and 99th percentile latency statistics.

  A sample client output at `logs/200000/0/0/1/5000/50/10000/100/0/0/21.log`:

  ```
  initialized client 21 with process id: 31567
  starting request client
  Finish sending requests
  Calculating stats for thread 0
  Total time := 60 seconds
  Throughput := 10026 requests per second
  Median Latency := 5201 micro seconds per request
  99 pecentile latency := 19635 micro seconds per request
  Error Rate := 0
  finishing request client
  ```

  A sample replica output at `logs/200000/0/0/1/5000/50/10000/100/0/0/1.log`:

  ```
  started QuePaxa Server
  Average number of steps per slot: 1.000000, total slots 10183, steps accumilated 10183
  ```

  The log consistency (SMR correctness) results can be found in the file `logs/200000/0/0/1/5000/50/10000/100/0/0/consensus.log`:

  ```
  60150596 entries match
  0 entries miss match
  TEST PASS
  ```
- Run the integration tests that check the SMR correctness of QuePaxa under different configuration parameters:

  ```shell
  python3 integration-test/python/integration-automation.py
  ```

  The integration suite consists of 9 tests, each of which exercises QuePaxa under different leader modes, timeouts, and network conditions, using a 5-replica, 5-submitter setup in a single VM.

  In the `logs/` folder you will find the log files corresponding to each test. Each subdirectory of `logs/` is indexed as `logs/leaderTimeout/serverMode/leaderMode/pipeline/batchTime/batchSize/arrivalRate/closeLoopWindow/requestPropagationTime/asynchronousSimulationTime/`. Cross-check with the `integration-test/python/integration-automation.py` file to see which parameters are used in each integration subtest; this will help you locate the corresponding output subfolder in `logs/`.
In the following section, we outline the QuePaxa implementation architecture and provide the package-level comments that will help you understand the QuePaxa code base.
The figure above depicts the QuePaxa architecture. QuePaxa contains two separate binaries: the client (called the submitter in the paper) and the replica.
The QuePaxa client is the program that generates a stream of client requests to be consumed by the replica nodes. It supports the following configuration parameters.
- `arrivalRate` (int): Poisson arrival rate in requests per second (default 10000)
- `batchSize` (int): client batch size (default 50)
- `batchTime` (int): client batch time in microseconds (default 50)
- `config` (string): QuePaxa configuration file, which contains the ip:port of each client and replica
- `debugLevel` (int): debug level; debug messages with an equal or higher debugLevel will be printed to the console
- `debugOn`: turn debugging on/off; turn it off when benchmarking
- `keyLen` (int): length of key in client requests (default 8)
- `logFilePath` (string): log file path (default "logs/")
- `name` (int): name of the client (default 21)
- `operationType` (int): type of operation for a status request: 1 for bootstrap server, 2 for print log, and 4 for average number of steps per slot (default 1)
- `requestType` (string): request type: [status, request] (default "status")
- `testDuration` (int): test duration in seconds (default 60)
- `valLen` (int): length of value in client requests (default 8)
- `window` (int): number of outstanding client batches sent by the client before receiving a response (default 1000)
The QuePaxa client connects to all the QuePaxa replicas using TCP connections.
The QuePaxa replica implements the QuePaxa consensus algorithm and the state-machine replication logic. It supports the following configuration parameters.
- `batchSize` (int): replica batch size (default 50)
- `batchTime` (int): replica batch time in microseconds (default 5000)
- `benchmark` (int): benchmark: 0 for KV store and 1 for Redis
- `config` (string): QuePaxa configuration file (default "configuration/local/configuration.yml")
- `debugLevel` (int): debug level (default 1010)
- `debugOn`: true / false
- `epochSize` (int): epoch size for the multi-armed bandit (MAB) (default 100)
- `isAsync`: true / false, to simulate consensus-level asynchrony
- `keyLen` (int): length of key (default 8)
- `leaderMode` (int): mode of leader change: 0 for fixed leader order, 1 for round-robin static partition, 2 for MAB based on commit times, 3 for asynchronous, 4 for last committed proposer
- `leaderTimeout` (int): leader timeout in microseconds (default 5000000)
- `logFilePath` (string): log file path (default "logs/")
- `name` (int): name of the replica (default 1)
- `pipelineLength` (int): pipeline length, the maximum number of outstanding proposals (default 1)
- `requestPropagationTime` (int): additional wait time in milliseconds for client batches, so that there is enough time for client-driven request propagation
- `serverMode` (int): 0 for non-LAN-optimized, 1 for LAN-optimized
- `timeEpochSize` (int): duration of a time epoch for the attacker, in milliseconds (default 500)
- `valLen` (int): length of value (default 8)
- `asyncTimeOut` (int): artificial async timeout in milliseconds (default 500)
The QuePaxa replica consists of three layers: the Proxy, the Proposer, and the Recorder.

- Proxy: maintains the state-machine replication logic. The Proxy receives client commands from the clients over TCP connections and forms replica batches to be consumed by the Proposer, then sends the replica batches to the Proposer. Upon reaching consensus, the Proposer sends the agreed-upon value back to the Proxy, which updates the state machine and then sends the response back to the client.
- Proposer: the implementation of the proposer component of QuePaxa, as described in our SOSP paper *QuePaxa: Escaping the tyranny of timeouts in consensus*. The Proposer communicates with the Recorders of all replicas using gRPC.
- Recorder: the implementation of the interval summary register (ISR) described in our paper. The Recorder responds to Proposer requests.
- `client/`: contains the client implementation. The request-sending logic is in `request.go` and the statistics calculation logic is in `stat.go`.
- `common/`: contains Go structs common to both the replica and the client.
- `configuration/`: contains the code to extract configuration data from a `.yaml` file into a `cfg` object.
- `experiments/`: contains the AWS deployment scripts for the artifact evaluation. More on this is available in the artifact evaluation document.
- `integration-test/`: contains the integration tests for QuePaxa.
- `proto/`: contains the proto definitions of the client messages.
- `replica/`: contains the QuePaxa replica logic. `proposer.go` and `recorder.go` contain the proposer and ISR logic, respectively.
- `build.sh`: contains the build script.
