
Commit f396c12

add doc about StorageKey encoding
1 parent 2b76ae0 commit f396c12

1 file changed (+154, -0 lines)


docs/design/state-trie.md

# State Trie

The state trie is used in blockchain networks to store the entire world state, and it is typically organized as a Merkle Patricia Trie (MPT). In Ethereum, every account's basic state (balance, nonce, code_hash, storage_root) is stored in a leaf node of the state trie, while each contract account's storage is kept in a separate MPT of its own.

Conflux's state storage differs from Ethereum's in several ways:

1. Account basic state and contract storage data are stored in a single MPT.
2. Core Space accounts and eSpace accounts are stored in that same MPT.
3. The MPT also stores each Core Space account's VoteList and DepositList data (both no longer used).
4. Contract code is stored in the MPT as well.

In summary, Conflux uses one large MPT to hold all global state data: account basic information, code, storage, VoteList, and DepositList.

## StorageKey

The core function of an MPT is key/value storage and retrieval. Conflux stores different types of data in the same MPT by encoding the key of each data type differently.
The `StorageKey` data type is defined as follows:

```rust
pub enum StorageKey<'a> {
    // Account basic state (nonce, balance, code_hash, ...).
    AccountKey(&'a [u8]),
    StorageRootKey(&'a [u8]),
    // One contract storage slot of an account.
    StorageKey {
        address_bytes: &'a [u8],
        storage_key: &'a [u8],
    },
    CodeRootKey(&'a [u8]),
    // Contract code, identified by its code hash.
    CodeKey {
        address_bytes: &'a [u8],
        code_hash_bytes: &'a [u8],
    },
    // Core Space only; both are no longer used.
    DepositListKey(&'a [u8]),
    VoteListKey(&'a [u8]),
}
```

The main component of each key above is the account address byte array. Each key type is encoded into a different MPT key and used to store different data:

- `AccountKey`: stores account basic information such as nonce, balance, and code_hash.
- `StorageRootKey`: stores StorageLayout information; it currently has no practical use.
- `StorageKey`: stores contract storage data.
- `CodeRootKey`: stores the contract account's code hash.
- `CodeKey`: stores the contract account's code.
- `DepositListKey`: stores a Core Space account's DepositList information (no longer used).
- `VoteListKey`: stores a Core Space account's VoteList information (no longer used).

### Encoding

Taking the account address `0x8fb79782e14c082bfbb91692bf071187866007d2` as an example, the different key types encode as follows (hex bytes, with `+` marking concatenation):

```sh
# AccountKey is the address itself
8fb79782e14c082bfbb91692bf071187866007d2

# StorageRootKey appends b"data" (64617461) to the address
8fb79782e14c082bfbb91692bf071187866007d2 + 64617461

# StorageKey appends b"data" to the address, followed by the contract storage key
# Assuming the storage key is 0000000000000000000000000000000000000000000000000000000000000008
8fb79782e14c082bfbb91692bf071187866007d2 + 64617461 + 0000000000000000000000000000000000000000000000000000000000000008

# CodeRootKey appends b"code" (636f6465) to the address
8fb79782e14c082bfbb91692bf071187866007d2 + 636f6465

# CodeKey appends b"code" to the address, followed by the code hash
# Assuming the code hash is 0x405787fa12a823e0f2b7631cc41b3ba8828b3321ca811111fa75cd3aa3bb5acf
8fb79782e14c082bfbb91692bf071187866007d2 + 636f6465 + 405787fa12a823e0f2b7631cc41b3ba8828b3321ca811111fa75cd3aa3bb5acf

# DepositListKey appends b"deposit" (6465706f736974) to the address
8fb79782e14c082bfbb91692bf071187866007d2 + 6465706f736974

# VoteListKey appends b"vote" (766f7465) to the address
8fb79782e14c082bfbb91692bf071187866007d2 + 766f7465
```
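The Core Space layouts above amount to "address, then an optional type tag, then an optional suffix". The sketch below reproduces exactly those byte layouts. It is an illustration, not Conflux's `to_key_bytes`: the `Key` enum, `core_key_bytes`, and the hex helpers are hypothetical names written for this document.

```rust
/// Hypothetical, simplified mirror of the `StorageKey` variants (Core Space only).
pub enum Key<'a> {
    Account(&'a [u8]),
    StorageRoot(&'a [u8]),
    Storage { address: &'a [u8], slot: &'a [u8] },
    CodeRoot(&'a [u8]),
    Code { address: &'a [u8], code_hash: &'a [u8] },
    DepositList(&'a [u8]),
    VoteList(&'a [u8]),
}

// Type tags appended after the address (ASCII bytes), plus an empty part.
const EMPTY: &[u8] = &[];
const DATA: &[u8] = b"data";
const CODE: &[u8] = b"code";
const DEPOSIT: &[u8] = b"deposit";
const VOTE: &[u8] = b"vote";

/// Core Space layout: address + type tag + suffix (storage key or code hash).
pub fn core_key_bytes(key: &Key<'_>) -> Vec<u8> {
    let parts: [&[u8]; 3] = match *key {
        Key::Account(a) => [a, EMPTY, EMPTY],
        Key::StorageRoot(a) => [a, DATA, EMPTY],
        Key::Storage { address, slot } => [address, DATA, slot],
        Key::CodeRoot(a) => [a, CODE, EMPTY],
        Key::Code { address, code_hash } => [address, CODE, code_hash],
        Key::DepositList(a) => [a, DEPOSIT, EMPTY],
        Key::VoteList(a) => [a, VOTE, EMPTY],
    };
    // Concatenate the three parts into one raw MPT key.
    parts.concat()
}

/// Decodes an even-length hex string (no 0x prefix) into bytes.
pub fn from_hex(s: &str) -> Vec<u8> {
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).expect("valid hex"))
        .collect()
}

/// Encodes bytes as lowercase hex.
pub fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```

Because every Core Space key starts with the raw address, all entries belonging to one account share the address as a common prefix in the trie.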

The encoding above applies to Core Space accounts. eSpace keys differ only in that the byte b"\x81" (81) is inserted right after the address bytes:

```sh
# AccountKey
8fb79782e14c082bfbb91692bf071187866007d2 + 81

# StorageRootKey
8fb79782e14c082bfbb91692bf071187866007d2 + 81 + 64617461

# StorageKey
# Assuming the storage key is 0000000000000000000000000000000000000000000000000000000000000008
8fb79782e14c082bfbb91692bf071187866007d2 + 81 + 64617461 + 0000000000000000000000000000000000000000000000000000000000000008

# CodeRootKey
8fb79782e14c082bfbb91692bf071187866007d2 + 81 + 636f6465

# CodeKey
# Assuming the code hash is 0x405787fa12a823e0f2b7631cc41b3ba8828b3321ca811111fa75cd3aa3bb5acf
8fb79782e14c082bfbb91692bf071187866007d2 + 81 + 636f6465 + 405787fa12a823e0f2b7631cc41b3ba8828b3321ca811111fa75cd3aa3bb5acf

# DepositListKey
8fb79782e14c082bfbb91692bf071187866007d2 + 81 + 6465706f736974

# VoteListKey
8fb79782e14c082bfbb91692bf071187866007d2 + 81 + 766f7465
```
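Since eSpace only adds one flag byte after the address, the space can be handled as a single branch in the key builder. A minimal sketch under that reading. `Space`, `EVM_SPACE_FLAG`, and `space_key_bytes` are hypothetical names, and the type tag and suffix are passed in as raw bytes instead of going through the full `StorageKey` enum.

```rust
/// Which space an account belongs to (hypothetical type for this sketch).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Space {
    Native,   // Core Space
    Ethereum, // eSpace
}

/// Flag byte inserted after the address for eSpace keys.
pub const EVM_SPACE_FLAG: u8 = 0x81;

/// address [+ 0x81 for eSpace] + type tag (e.g. b"data") + suffix.
pub fn space_key_bytes(space: Space, address: &[u8], tag: &[u8], suffix: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(address.len() + 1 + tag.len() + suffix.len());
    key.extend_from_slice(address);
    if space == Space::Ethereum {
        key.push(EVM_SPACE_FLAG);
    }
    key.extend_from_slice(tag);
    key.extend_from_slice(suffix);
    key
}

/// Decodes an even-length hex string (no 0x prefix) into bytes.
pub fn from_hex(s: &str) -> Vec<u8> {
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).expect("valid hex"))
        .collect()
}

/// Encodes bytes as lowercase hex.
pub fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```

For example, `space_key_bytes(Space::Ethereum, &address, b"vote", b"")` yields the eSpace `VoteListKey` listed above.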

For the exact encoding implementation, see the `StorageKeyWithSpace::to_key_bytes` method.

## DeltaMpt and IntermediaMpt

In terms of implementation, Conflux's state is made up of three parts:

1. DeltaMpt: an incremental Merkle Patricia Trie that stores the most recent state changes.
2. IntermediaMpt: an intermediate Merkle Patricia Trie that represents the state between snapshots.
3. Snapshot: a snapshot of the state data.

When a piece of state is read, the lookup goes through the layers in this order: DeltaMpt (current changes) → IntermediaMpt (intermediate state) → Snapshot (snapshot state).
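The lookup order can be illustrated with a toy model in which each layer is an in-memory map. This only demonstrates the fall-through order. `LayeredState` and `Layer` are hypothetical names, each layer is assumed to be a plain `HashMap`, and the sketch ignores how the real tries represent deleted keys or compute Merkle roots.

```rust
use std::collections::HashMap;

/// One state layer modeled as an in-memory key/value map (assumption of this sketch).
pub type Layer = HashMap<Vec<u8>, Vec<u8>>;

/// Hypothetical three-layer view of the state, newest layer first.
pub struct LayeredState {
    pub delta: Layer,        // DeltaMpt: most recent changes
    pub intermediate: Layer, // IntermediaMpt: intermediate state
    pub snapshot: Layer,     // Snapshot: snapshot state
}

impl LayeredState {
    /// Looks `key` up in DeltaMpt, then IntermediaMpt, then Snapshot,
    /// returning the value from the first layer that contains it.
    pub fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.delta
            .get(key)
            .or_else(|| self.intermediate.get(key))
            .or_else(|| self.snapshot.get(key))
            .map(|v| v.as_slice())
    }
}
```

A newer layer shadows the older ones: a value present in DeltaMpt hides any value for the same key in IntermediaMpt or the Snapshot.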

Keys in DeltaMpt and IntermediaMpt are encoded slightly differently from regular MPT keys: they go through an extra padding step.
In a regular MPT key the base part is the 20-byte account address, whereas in a delta MPT key the base part is 32 bytes long. It is computed as follows:

1. Start from a 32-byte padding value.
2. Concatenate the first 12 bytes of the padding with the 20-byte address to form 32 bytes.
3. Compute the keccak hash of the result of step 2.
4. Concatenate the first 12 bytes of the hash with the address to form the final 32-byte base key.
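The four steps can be written out directly. This is a sketch of the steps as described here, not the Conflux source: `delta_mpt_account_keypart` and the constant names are hypothetical, and the hash is passed in as a function so that the example needs no external crate. In real use it would be Keccak-256 (for example from a crate such as `tiny-keccak` or `sha3`). How the padding value itself is chosen is outside the scope of this sketch.

```rust
/// Bytes taken from the padding and from the hash (12).
pub const PADDING_BYTES: usize = 12;
/// Length of an account address (20).
pub const ADDRESS_BYTES: usize = 20;
/// Length of the delta-MPT base key: 12 + 20 = 32 bytes.
pub const KEYPART_BYTES: usize = PADDING_BYTES + ADDRESS_BYTES;

/// Builds the 32-byte base key for `address` in DeltaMpt/IntermediaMpt.
/// `hash` is expected to be Keccak-256; it is a parameter only to keep the
/// sketch dependency-free.
pub fn delta_mpt_account_keypart<H>(
    address: &[u8; ADDRESS_BYTES],
    padding: &[u8; 32],
    hash: H,
) -> [u8; KEYPART_BYTES]
where
    H: Fn(&[u8]) -> [u8; 32],
{
    // Steps 1-2: first 12 bytes of the padding followed by the address.
    let mut padded = [0u8; KEYPART_BYTES];
    padded[..PADDING_BYTES].copy_from_slice(&padding[..PADDING_BYTES]);
    padded[PADDING_BYTES..].copy_from_slice(&address[..]);

    // Step 3: hash the 32-byte padded value.
    let digest = hash(&padded[..]);

    // Step 4: first 12 bytes of the hash followed by the address.
    let mut keypart = [0u8; KEYPART_BYTES];
    keypart[..PADDING_BYTES].copy_from_slice(&digest[..PADDING_BYTES]);
    keypart[PADDING_BYTES..].copy_from_slice(&address[..]);
    keypart
}
```

The address itself stays readable in the last 20 bytes; only the 12-byte prefix depends on the padding and the hash.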

Extended keys are then built exactly as for regular keys, with this 32-byte base key in place of the 20-byte address. The examples below are eSpace keys (note the `81` flag) for the account `0x0888000000000000000000000000000000000004`; their first 12 bytes, `b41eca2cce25321f5ecf8554`, are the hash prefix from step 4:

```sh
# AccountKey
b41eca2cce25321f5ecf85540888000000000000000000000000000000000004 + 81

# StorageRootKey
b41eca2cce25321f5ecf85540888000000000000000000000000000000000004 + 81 + 64617461

# StorageKey
# Assuming the storage key is 0000000000000000000000000000000000000000000000000000000000000008
b41eca2cce25321f5ecf85540888000000000000000000000000000000000004 + 81 + 64617461 + 0000000000000000000000000000000000000000000000000000000000000008

# CodeRootKey
b41eca2cce25321f5ecf85540888000000000000000000000000000000000004 + 81 + 636f6465

# CodeKey
# Assuming the code hash is 0x405787fa12a823e0f2b7631cc41b3ba8828b3321ca811111fa75cd3aa3bb5acf
b41eca2cce25321f5ecf85540888000000000000000000000000000000000004 + 81 + 636f6465 + 405787fa12a823e0f2b7631cc41b3ba8828b3321ca811111fa75cd3aa3bb5acf

# DepositListKey
b41eca2cce25321f5ecf85540888000000000000000000000000000000000004 + 81 + 6465706f736974

# VoteListKey
b41eca2cce25321f5ecf85540888000000000000000000000000000000000004 + 81 + 766f7465
```
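Every component before the suffix has a fixed length, so one of these keys can be split back into its parts. The sketch below does this for eSpace keys in the delta layout. `DeltaKeyParts` and `split_espace_delta_key` are hypothetical helpers, and the lengths they use (12-byte hash prefix, 20-byte address, 1-byte space flag) are taken from the description above.

```rust
/// Components of an eSpace key in the DeltaMpt/IntermediaMpt layout.
/// Hypothetical helper type written for this document.
#[derive(Debug, PartialEq, Eq)]
pub struct DeltaKeyParts<'a> {
    pub hash_prefix: &'a [u8], // first 12 bytes of the keccak hash (step 4)
    pub address: &'a [u8],     // 20-byte account address
    pub rest: &'a [u8],        // type tag and optional suffix; empty for AccountKey
}

const PREFIX_LEN: usize = 12;
const BASE_LEN: usize = 32; // 12-byte hash prefix + 20-byte address
const EVM_SPACE_FLAG: u8 = 0x81;

/// Splits an eSpace delta-MPT key. Returns None if the key is too short
/// or byte 32 is not the 0x81 eSpace flag.
pub fn split_espace_delta_key(key: &[u8]) -> Option<DeltaKeyParts<'_>> {
    if key.len() <= BASE_LEN || key[BASE_LEN] != EVM_SPACE_FLAG {
        return None;
    }
    Some(DeltaKeyParts {
        hash_prefix: &key[..PREFIX_LEN],
        address: &key[PREFIX_LEN..BASE_LEN],
        rest: &key[BASE_LEN + 1..],
    })
}

/// Decodes an even-length hex string (no 0x prefix) into bytes.
pub fn from_hex(s: &str) -> Vec<u8> {
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).expect("valid hex"))
        .collect()
}
```

Applied to the `DepositListKey` above, the sketch recovers the hash prefix, the address `0888...0004`, and the tag `b"deposit"`.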

## Considerations

1. Following Ethereum's approach of storing account basic data and contract storage data in separate tries would make the trie that holds accounts much smaller and significantly speed up traversal of accounts.
2. If the space flag (0x81) were placed at the front of the key instead of after the address, a prefix search could be restricted to the data of a single space, which should also speed up searches.
3. Conflux currently searches state by prefix: given 0x01, it finds all addresses starting with 0x01. Geth and Reth instead position a cursor at an arbitrary address, such as 0x0888000000000000000000000000000000000004, and then iterate over all account addresses greater than that address.
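The two search styles in item 3 can be contrasted on an ordered key/value map. This sketch uses Rust's `BTreeMap` as a stand-in for the state store. It is not how Conflux, Geth, or Reth implement iteration, and `prefix_search` and `seek_after` are hypothetical names.

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{Excluded, Unbounded};

/// Prefix search: all keys that start with `prefix`, in ascending order.
pub fn prefix_search<'a, V>(map: &'a BTreeMap<Vec<u8>, V>, prefix: &[u8]) -> Vec<&'a Vec<u8>> {
    map.range(prefix.to_vec()..)
        // Keys sharing the prefix are contiguous, so stop at the first mismatch.
        .take_while(|(k, _)| k.starts_with(prefix))
        .map(|(k, _)| k)
        .collect()
}

/// Cursor-style iteration: up to `limit` keys strictly greater than `start`.
pub fn seek_after<'a, V>(map: &'a BTreeMap<Vec<u8>, V>, start: &[u8], limit: usize) -> Vec<&'a Vec<u8>> {
    map.range((Excluded(start.to_vec()), Unbounded))
        .take(limit)
        .map(|(k, _)| k)
        .collect()
}
```

With the layout suggested in item 2 (space flag first), `prefix_search(&map, &[0x81])` would return only eSpace entries, because all of them would sit in one contiguous key range.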
