Skip to content

Commit df735f7

Browse files
committed
doc: state-transition invariants
1 parent 65234e4 commit df735f7

File tree

2 files changed

+108
-0
lines changed

2 files changed

+108
-0
lines changed

block.go

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -40,6 +40,9 @@ func (vm *VM) AcceptBlock(ctx context.Context, b *blocks.Block) error {
4040
s.MarkSettled()
4141
}
4242

43+
// The documentation of invariants and ordering guarantees explicitly states
44+
// that these can happen in any order because they involve different blocks
45+
// entering different states.
4346
vm.last.settled.Store(b.LastSettled())
4447
vm.last.accepted.Store(b)
4548

docs/invariants.md

Lines changed: 105 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,105 @@
1+
# Invariants
2+
3+
## State transitions
4+
5+
This section deals with the in-memory state of a running node, the on-disk persistence of a chain, and how they relate to one another with respect to timing and data equivalence.
6+
The EVM-specific concept of state (e.g. accounts) is orthogonal to the topics in this document.
7+
8+
1. Timing details the ordering guarantees that exist between, for example, in-memory objects and their equivalents on disk.
9+
2. Data equivalence details how in-memory objects can be recovered from disk after a restart.
10+
11+
### Guiding principles
12+
13+
1. Wherever possible, reuse Ethereum-native concepts i.f.f. there is a clear rationale for equivalence.
14+
1. Conversely, introducing a new concept is preferred to ambiguous overloading of an existing one.
15+
2. Timing guarantees SHOULD be sufficient to avoid race conditions if relied upon and, if insufficient, MUST be documented as such.
16+
17+
### Height mapping
18+
19+
#### Background
20+
21+
The `rawdb` package allows for arbitrary blocks to be stored, only coupling height and hash for canonical blocks; i.e. those in the chain history.
22+
The canonical block with the greatest height is typically stored as the "head".
23+
Ethereum consensus has delayed finality and `rawdb` caters for storage of the last-finalized block.
24+
25+
#### Mapping
26+
27+
| SAE | `rawdb` |
28+
| -------- | --------- |
29+
| Accepted | Canonical |
30+
| Executed | Head |
31+
| Settled | Finalized |
32+
33+
#### Rationale
34+
35+
Accepted and canonical are effectively identical concepts, save for nuanced differences between the consensus mechanisms.
36+
37+
From an API perspective, we treat "latest" (i.e. the chain head) as the last-executed block.
38+
Mirroring this on disk allows for simple integration with the upstream API implementations.
39+
40+
Although we don't support the "finalized" block label via APIs (using "safe" for settled blocks), the `rawdb` support is otherwise unused in SAE chains.
41+
As (a) there is no existing, alternative database concept (e.g. `rawdb` support for the latest safe block), and (b) there will be no overloading of concepts (see *Guiding principles*), we reserve `rawdb` functionality intended for the finalized block for the last settled block.
42+
43+
> [!NOTE]
44+
> These also provide an unambigous inverse, allowing for recovery from disk.
45+
46+
### Ordering guarantees
47+
48+
The goal of these guarantees is to avoid race conditions.
49+
These guarantees are not concerned with low-level races typically protected against with locks and atomics, but with the sort that arise due to the interplay of system components like consensus, the execution stream, and APIs.
50+
51+
> [!NOTE]
52+
> This document is the source of truth and if the code differs there is a bug until proven otherwise.
53+
> If it is impossible to correct the bug then this document MUST be updated to reflect reality, and an audit of potential race conditions SHOULD be performed.
54+
55+
#### Definitions
56+
57+
| | |
58+
| ------ | -
59+
| $B_n$ | Block at height $n$
60+
| $A$ | Set of Accepted blocks
61+
| $E$ | Set of Executed blocks
62+
| $S$ | Set of Settled blocks
63+
| $C$ | Arbitrary condition; e.g. $B_n \in E$
64+
| $D(C)$ | Disk artefacts of some $C$
65+
| $M(C)$ | Memory artefacts of some $C$
66+
| $I(C)$ | Internal indicator of some $C$
67+
| $X(C)$ | eXternal indicator of some $C$
68+
| $G \implies P$ | Some condition $G$ guarantees another condition $P$
69+
70+
An internal indicator is any signal of state that can only be accessed by the SAE implementation.
71+
Examples include the last-accepted, -executed, and -settled block pointers.
72+
73+
An external indicator is any signal of state that can be accessed outside of the SAE implementation, even if in the same process.
74+
An example is a chain-head subscription, be it in the same binary or over a websocket API.
75+
76+
#### Guarantees
77+
78+
> [!TIP]
79+
> Guarantees should be thought of as "if I witness the guarantor $G$ then I can assume some prerequisite $P$".
80+
> This must not be confused with cause and effect as it is the *temporal inverse*: if $G$ guarantees $P$ then $t_P < t_G$ i.e. $P$ *happens before* $G$ in Golang terminology.
81+
82+
| Guarantor | Prerequisite | Notes |
83+
| ----------- | ------------------ | ----- |
84+
| $B_n \in S$ | $B_n \in E$ | Settlement after execution
85+
| $B_n \in E$ | $B_n \in A$ | Execution after settlement
86+
| $X(C)$ | $I(C)$ | External indicator after internal indicator
87+
| $I(C)$ | $M(C)$ | Internal indicator after memory
88+
| $M(C)$ | $D(C)$ | Memory after disk
89+
| $D(C)$ | $C$ | Disk after condition
90+
| $f(B_n)$ | $f(B_{n-1})$ | See below
91+
92+
> [!WARNING]
93+
> No atomic guarantees are provided with respect to different blocks entering different states.
94+
> Of note is the lack of atomicity when updating the last-accepted and last-settled internal indicators even though settlement of a block only occurs with acceptance of some later block.
95+
96+
Realisation of any condition $f(\cdot)$ of a block MUST occur after the *same* condition of the parent block.
97+
Importantly, this means that code MAY be batched in a way that interleaves state transitions of *different* types.
98+
99+
For example, all blocks settled by acceptance of another block MAY have their in-memory state updated *before* the last-settled block pointer is changed.
100+
In this case, $M(B_n \in S) \implies M(B_{n-1} \in S)$ (e.g. calls to `Block.MarkSettled()` are strictly ordered) but nothing can be inferred about $I(B_{n-1} \in S)$ until $I(B_n \in S)$ is witnessed (e.g. updating the last-settled pointer MAY occur after all the calls to `MarkSettled()`).
101+
102+
#### Examples
103+
104+
1. If `blocks.Block.Executed()` returns `true` then the block's receipts can be read from the database because $M(B_n \in E) \implies D(B_n \in E)$.
105+
2. If the atomic pointer to the last-executed block (an internal indicator) is at height $\ge n$ then the post-execution state of block $n$ can be opened because $I(B_{k \ge n} \in E) \implies M(B_k \in E) \implies M(B_n \in E)$.

0 commit comments

Comments
 (0)