^_^ :o)
o o
:-O ಠ_ಠ
o [B]-> o
:-) | >:(
o V o
x_x uwu
o o
:-) :-3
o o
The Algorand Byzantine Agreement protocol enables all nodes to consistently update the state of the system.
The agreement.Service establishes a consensus on the ordering of
Blocks. This ordering is defined by a Round number, which indexes
into the ordered log of Blocks.
Clients instantiate an agreement.Service by providing it several
parameters:
Ledgerrepresents a data store which supports the reading and writing of data stored withinBlocks.BlockFactoryproducesBlocks for a given round.BlockValidatorvalidatesBlocks for a given round.KeyManagerholds the participation keys necessary to participate in the protocol.Networkprovides an abstraction over the underlying network.timers.Clockprovides timekeeping services for timeouts.db.Accessorprovides persistent storage for internal state.
Blocks for which consensus is completed are written using
Ledger.EnsureBlock alongside Certificate objects, which are
cryptographic proofs that a Block was confirmed for a given
round.
If Ledger and db.Accessor provide crash-safe storage, agreement
will also recover safely after crashes.
The specification for the protocol implemented by this package is located here.
Optimizations from and other deviations from the spec will be noted throughout this file.
Certain terms in this implementation are used as shorthands for specific concepts:
- "Threshold" and "quorum" both refer to the total weight of votes needed to form a bundle for a given value.
- A "proposal-vote" refers to a vote whose step is "propose"=0.
- A "payload" refers to the body of a proposal which contains, among
other fields, a
Block. - "Freshness" generally refers to the relevance of some message or event. Message relay rules which refer to a round, period, or step may be referred to as freshness rules. Freshness may also be used to describe the relevance of bundles.
- The "frozen" value in a period p refers to the proposal-value in the proposal-vote which was observed by the state machine to have the lowest credential (i.e., mu(S, r, p)).
- The "staging" value in a period p refers to the proposal-value which received a quorum of soft votes (i.e., sigma(S, r, p)).
- Additional terminology is described in the agreement service doc.
At the top level, an agreement.Service encapsulates the parameters
and the goroutines which execute the protocol.
Our implementation divides its tasks into two components: a concurrent component, which communicates with the network, disk, and timeouts, and performs expensive CPU operations, and a serialized state machine, which executes protocol logic and makes decisions on what abstract actions to take.
These two components communicate with each other using the
abstractions of events and actions. event objects describe
communication from the concurrent component to the state machine,
encapsulating external input to the machine. For each input event,
the state machine emits an output []action, which lists a sequence
of control operations and operations which are potentially observable
by the outside universe. Communication occurs between these components
through a pair of Go channels in service.demuxLoop and
service.mainLoop.
events are also used for internal communication between components
of the state machine.
The concurrent and serialized components share a variety of static
data structures, such as vote, bundle, and proposal. These data
types have distinct unauthenticated versions, which allows routines to
specify that they accept untrusted input.
The demux object demultiplexes over a variety of channels which all
represent inputs to the system. Inputs include:
- Receiving a message from the
Network - Receiving a timeout from the
Clock - Notification to stop waiting for a block in the current
Round(Ledger.Wait) - Authenticated messages from the
cryptoVerifier
The cryptoVerifier parallelizes expensive cryptographic operations
such as authentication of votes, bundles, and proposal payloads so as
to maximize CPU core utilization.
A special case of the node behavior is encapsulated in the
pseudonode: to increase ease of testing and to minimize code
duplication, the state machine produces votes and proposals by
directing the pseudonode to create them from participation key
data. The pseudonode then directs the votes and proposals back to the
state machine, as if they arrived from some external source. The state
machine validates these messages in the same way it validates real
network messages and relays them back into the network as
appropriate.
Because signature verification is expected to be a computational bottleneck in the agreement code, it executes concurrently with respect to the state machine. As a result, the relay rules described in the specification of the protocol are altered slightly: for instance, properties of messages are checked twice when determining whether to ignore them: once before cryptographic verification, and once after cryptographic verification. Checking before cryptographic verification is not strictly necessary but is an optimization that reduces CPU costs in non-adversarial cases; e.g., duplicate votes are discarded before signature verification is attempted.
In the specification, each participant in the agreement protocol is
associated with a single set of keys. In practice, nodes may wish to
participate on behalf of many keys simultaneously. These keys are
encapsulated in the KeyManager, and the pseudonode allows the
agreement protocol to multiplex across these keys. For all intents
and purposes, these keys may be modelled as distinct participants
which all exhibit identical behavior, and whose messages are all
serialized through a single participant. For instance, if two keys
are both selected to propose a block, then this node may or may not
transmit the block belonging to the key with the lower-priority
credential.
This implementation thus extends the set of "external" events handled
by the state machine. In addition to handling timeouts and network
events, the state machine thus also handles concurrent writes to the
Ledger via Wait, and it handles the output of cryptographic
verification concurrently. Moreover, the implementation abstracts
over multiple keys by generating synthetic network events from the
pseudonode.
The logic of the agreement protocol is implemented as a state machine.
This state machine is composed of many smaller state machines, which
represent different subtasks required by the protocol. All state
machines communicate with each other by sending events to each other
and by receiving events as replies (except for player, which is at
the root of the state machine tree). This communication takes place
through the router, which relays messages between machines. After it
receives a message, a state machine will handle it, producing an
event in response and possibly updating its own state.
For the router to route information correctly, all instances of all
state machines must be uniquely identified for the router. Every
type of state machine corresponds to a unique stateMachineTag
(<machine>.T()). Certain state machines have many instances; for
instance, there is one machine which tracks votes for each step, and
there is one machine which tracks proposals for each period. These
instances are distinguished from each other by a
(round, period, step)-triplet. For a given type of state machine, the
less specific fields in the triplet are ignored: for example, to send
a message to the state machine handling proposals in round 100,
period 2, both (100, 2, 0) and (100, 2, 6) identify this instance,
since it handles messages for any step in period 2.
State machines are arranged hierarchically in the state machine
tree. At the top of the tree is the player, which has two
children: the root vote machine and the root proposal machine. The
vote machines and the proposal machines are both hierarchically
arranged first by round, then by period, and finally by step. Thus
the hierarchy is as follows:
- player
- vote
- vote round 0
- vote (round, period) (0, 0)
- vote (round, period, step) (0, 0, 0)
- vote (round, period, step) (0, 0, 1)
- ...
- vote (round, period) (0, 1)
- ...
- vote (round, period) (0, 0)
- vote round 1
- ...
- vote round 0
- proposal
- proposal round 0
- proposal (round, period) (0, 0)
- proposal (round, period, step) (0, 0, 0)
- proposal (round, period, step) (0, 0, 1)
- ...
- proposal (round, period) (0, 1)
- ...
- proposal (round, period) (0, 0)
- proposal round 1
- ...
A state machine in the hierarchy can deliver events and queries to any
of its children but not its parents. All state machines also receive
a read-only copy of the
playerstate when receiving any event.
- proposal round 0
- vote
State machines may be wrapped in Contracts which specify pre- and
post-conditions for events received and emitted by a state machine.
These contracts may be checked at runtime to find violations and
possible bugs.
The tracer records the path of messages as they travel through the
state machine for debugging, inspection, and post-mortem
functionality.
The root of the state machine tree is the player machine. This
machine holds the current round, period, and step of the node,
as well as some metadata to ensure correct propagation of received
messages. All events are first routed to the player, which may
forward them to other state machines. player is special in two ways:
first of all, it is an event actor, which means that its handle
method emits []action, and second, it is passed (by value) to all
children state machines so that they are aware of the current state of
the node.
The player consumes messageEvents which it forwards to the
appropriate state machines. bundles and non-proposal-votes (i.e.,
votes with step =/= 0) are forwarded to the vote threshold machines,
while proposalPayloads and proposal-votes (i.e. votes with step =
0) are forwarded to the proposal machines. Based on their outputs, the
player chooses to relay or ignore these messages.
The player consumes timeouts events. The player communicates
its next timeout by setting its Deadline and Napping fields.
The player produces actions which transmit messages to the
pseudonode (e.g., attest to votes, assemble Blocks,
repropose proposals) when appropriate. The player may issue
queries to determine how to vote; for instance, the player will
ask the proposal machine whether a Block is "committable" i.e.,
whether the entire Block has arrived along with a soft-threshold for
that digest.
The player changes the round and period of the node according to
thresholdEvents it receives from the vote threshold machine (and
also potentially upon receiving a roundInterruptionEvent). It
changes step according to timeout events it has received. On
conclusion of a round, the player queries the Block from the
proposal machine and then writes them along with the Certificate to
the Ledger.
The remaining state machines are subordinate to player and handle
two broad kinds of functonality: detecting when a threshold of votes
has been reached, and managing Block proposals. The voteMachines
create thresholdEvents and gives them to the player machine, while
the proposalMachines track proposals and reconstruct them from
network messages.
In the spec of the agreement protocol, messages are delivered on an ordered but best-effort basis. This means that the agreement protocol is resilient to message reordering in the network layer. However, good ordering improves the liveness of the agreement protocol and improves the rate at which the protocol converges.
One ordering constraint which impacts performance and also test reliability is the ordering of proposal-votes with respect to their matching proposal payloads. If the proposal payload is received before its corresponding proposal-vote, the agreement protocol will drop the payload and must recover into a new period.
The introduction of a concurrent cryptographic verification pool exacerbates this problem: a received proposal-vote will enter cryptographic verification before any state changes. If the corresponding payload arrives before cryptographic verification finishes, which is likely on a fast network or on a machine with a loaded CPU, the node will drop the payload and must again recover into a new period.
As a result, the implementation bundles together matching
proposal-votes and proposal payloads into a compoundMessage. Nodes
process a compoundMessage by first processing the proposal-vote (if
it exists) and then following by processing the payload afterwards.
To retain a handle on the proposal payload associated with a
proposal-vote which is sent into the cryptoVerifier, the player
maintains a proposalTable which associates outstanding proposal-vote
verification requests with their corresponding payload.
The vote threshold machines convert votes and bundles into events which signal that a quorum of valid votes has formed for some step and for some value.
At the root level, the voteAggregator receives raw and authenticated
network messages that hold votes and bundles. It performs basic
duplicate filtering and then forwards these messages to
voteTrackerRound machines.
In addition to forwarding these messages to voteTrackerPeriod
machines, voteTrackerRound machines also hold the freshest
threshold event which they have seen. This serves two purposes: on
arriving in a new period, the player must process any threshold
event which has been pipelined, and during partitions, these events
hold bundles which the node must propagate to neighbors.
voteTrackerPeriod machines respond to queries for the thresholds
which have been observed by the node in some period.
voteTracker machines implement the core vote counting logic for a
given step. Whenever the number of votes passes the threshold
required for that step, the machine generates the threshold event
exactly once and then returns the event up the state machine tree. It
also records duplicate votes received in a given period, up to sender
equivocation.
The proposal management machines track and store Block proposals
across periods as the node executes the Byzantine Agreement protocol.
The proposalManager receives raw and authenticated proposalPayload
messages. It also receives raw and authenticated votes for which
step = 0 (= propose). These special "proposal-votes" represent the
proposal messages Block proposers send at the beginning of a round
and propagate in the network independently from the Blocks
themselves, which are contained in proposalPayload messages.
The proposalManager performs duplicate filtering and then forwards
messages as appropriate. It also issues the control messages required
to change round and period.
The proposalStore tracks the set of proposal-values and payloads
which are relevant to a given round. It also maintains the correct
setting of the pinned proposal-value. When a new period arrives, it
garbage-collects old proposals and updates the pinned value as
necessary. When a new round arrives, it returns any enqueued payload
events as necessary.
The proposalTracker maintains two proposal-values: one corresponding
to the lowest proposal-credential seen in a period (the frozen
proposal-value), and one corresponding to the sole value for which a
quorum of soft-votes has been observed in the period (the staging
proposal-value). It also records duplicate proposal-votes received in
a given period.
The staging slot for a given period is important because its state is
the precursor to cert and next votes. Once both a soft threshold for a
value and the Block corresponding to this value has been observed by
the node, a proposalCommittableEvent is emitted, which indicates
that the node may cert or next-vote for the proposal.