Commit d65fbe4

committed
touching up
1 parent 1328dad commit d65fbe4

1 file changed

+44
-36
lines changed

statediff/doc.md

Lines changed: 44 additions & 36 deletions
@@ -1,10 +1,16 @@
1+
# Statediff
2+
13
This package provides an auxiliary service that asynchronously processes state diff objects from chain events,
2-
either relaying the state objects to rpc subscribers or writing them directly to Postgres.
4+
either relaying the state objects to RPC subscribers or writing them directly to Postgres as IPLD objects.
5+
6+
It also exposes RPC endpoints for fetching or writing to Postgres the state diff at a specific block height
7+
or for a specific block hash. This operates on historical block and state data and so depends on a complete state archive.
38

4-
It also exposes RPC endpoints for fetching or writing to Postgres the state diff `StateObject` at a specific block height
5-
or for a specific block hash, this operates on historic block and state data and so is dependent on having a complete state archive.
9+
Data is emitted in this differential format in order to make it feasible to IPLD-ize and index the *entire* Ethereum state
10+
(including intermediate state and storage trie nodes). If this state diff process is run continuously from genesis,
11+
the entire state at any block can be materialized from the cumulative differentials up to that point.
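
To illustrate how cumulative differentials yield the full state, here is a minimal sketch in Go. It is not the package's implementation: `nodeDiff`, `blockDiff`, and `materialize` are hypothetical names, trie paths are shown as plain strings, and the explicit `Removed` flag is an assumption about how deletions would be represented.

```go
package main

import (
	"fmt"
	"sort"
)

// nodeDiff is a simplified, hypothetical stand-in for a state diff node:
// it records the new value of the trie node at Path, or its removal.
type nodeDiff struct {
	Path    string // trie path, written as a string for readability
	Value   []byte // new node value; ignored when Removed is true
	Removed bool   // ASSUMPTION: removals are flagged explicitly
}

// blockDiff groups the node diffs produced by a single block.
type blockDiff struct {
	Number uint64
	Nodes  []nodeDiff
}

// materialize replays every diff with Number <= target, in block order,
// and returns the resulting path => value view of the state.
func materialize(diffs []blockDiff, target uint64) map[string][]byte {
	// Copy before sorting so the caller's slice is left untouched.
	ordered := append([]blockDiff(nil), diffs...)
	sort.Slice(ordered, func(i, j int) bool {
		return ordered[i].Number < ordered[j].Number
	})

	state := make(map[string][]byte)
	for _, d := range ordered {
		if d.Number > target {
			break
		}
		for _, n := range d.Nodes {
			if n.Removed {
				delete(state, n.Path)
				continue
			}
			state[n.Path] = n.Value
		}
	}
	return state
}

func main() {
	diffs := []blockDiff{
		{Number: 0, Nodes: []nodeDiff{{Path: "0a", Value: []byte{1}}, {Path: "0b", Value: []byte{2}}}},
		{Number: 1, Nodes: []nodeDiff{{Path: "0a", Value: []byte{3}}}},
		{Number: 2, Nodes: []nodeDiff{{Path: "0b", Removed: true}}},
	}
	// Number of nodes present in the state as of block 1 and as of block 2.
	fmt.Println(len(materialize(diffs, 1)), len(materialize(diffs, 2)))
}
```

The sketch only works if no block's diff is missing, which is why the process has to run continuously from genesis.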
612

7-
# Statediff Object
13+
## Statediff object
814
A state diff `StateObject` is the collection of all the state and storage trie nodes that have been updated in a given block.
915
For convenience, we also associate these nodes with the block number and hash, and optionally the set of code hashes and code for any
1016
contracts deployed in this block.
@@ -24,31 +30,31 @@ type StateObject struct {
2430

2531
// StateNode holds the data for a single state diff node
2632
type StateNode struct {
27-
NodeType NodeType `json:"nodeType" gencodec:"required"`
28-
Path []byte `json:"path" gencodec:"required"`
29-
NodeValue []byte `json:"value" gencodec:"required"`
30-
StorageNodes []StorageNode `json:"storage"`
31-
LeafKey []byte `json:"leafKey"`
33+
NodeType NodeType `json:"nodeType" gencodec:"required"`
34+
Path []byte `json:"path" gencodec:"required"`
35+
NodeValue []byte `json:"value" gencodec:"required"`
36+
StorageNodes []StorageNode `json:"storage"`
37+
LeafKey []byte `json:"leafKey"`
3238
}
3339

3440
// StorageNode holds the data for a single storage diff node
3541
type StorageNode struct {
36-
NodeType NodeType `json:"nodeType" gencodec:"required"`
37-
Path []byte `json:"path" gencodec:"required"`
38-
NodeValue []byte `json:"value" gencodec:"required"`
39-
LeafKey []byte `json:"leafKey"`
42+
NodeType NodeType `json:"nodeType" gencodec:"required"`
43+
Path []byte `json:"path" gencodec:"required"`
44+
NodeValue []byte `json:"value" gencodec:"required"`
45+
LeafKey []byte `json:"leafKey"`
4046
}
4147

4248
// CodeAndCodeHash struct for holding codehash => code mappings
4349
// we can't use an actual map because they are not rlp serializable
4450
type CodeAndCodeHash struct {
45-
Hash common.Hash `json:"codeHash"`
46-
Code []byte `json:"code"`
51+
Hash common.Hash `json:"codeHash"`
52+
Code []byte `json:"code"`
4753
}
4854
```
49-
These objects are packed into a `Payload` structure which additionally associates the StateObject
55+
These objects are packed into a `Payload` structure which can additionally associate the `StateObject`
5056
with the block (header, uncles, and transactions), receipts, and total difficulty.
51-
This `Payload` encapsulates all the block and state data at a given block, and allows us to index the entire Ethereum data structure
57+
This `Payload` encapsulates all of the differential data at a given block, and allows us to index the entire Ethereum data structure
5258
as hash-linked IPLD objects.
5359

5460
```go
@@ -64,11 +70,11 @@ type Payload struct {
6470
}
6571
```
6672

67-
# Usage
73+
## Usage
6874
This state diffing service runs as an auxiliary service concurrent to the regular syncing process of the geth node.
6975

7076

71-
## CLI configuration
77+
### CLI configuration
7278
This service introduces a CLI flag namespace `statediff`
7379

7480
`--statediff` flag is used to turn on the service
@@ -84,7 +90,7 @@ e.g.
8490
./build/bin/geth --syncmode=full --gcmode=archive --statediff --statediff.writing --statediff.db=postgres://localhost:5432/vulcanize_testing?sslmode=disable --statediff.dbnodeid={nodeId} --statediff.dbclientname={dbClientName}
8591
`
8692

87-
## RPC endpoints
93+
### RPC endpoints
8894
The state diffing service exposes both a WS subscription endpoint and a number of HTTP unary endpoints.
8995

9096
Each of these endpoints requires a set of parameters provided by the caller
@@ -105,11 +111,11 @@ type Params struct {
105111

106112
Using these params we can tell the service whether to include state and/or storage intermediate nodes; whether
107113
to include the associated block (header, uncles, and transactions); whether to include the associated receipts;
108-
whether to include the total difficult for this block; whether to include the set of code hashes and code for
114+
whether to include the total difficulty for this block; whether to include the set of code hashes and code for
109115
contracts deployed in this block; whether to limit the diffing process to a list of specific addresses; and/or
110116
whether to limit the diffing process to a list of specific storage slot keys.
111117

112-
### Subscription endpoint
118+
#### Subscription endpoint
113119
A websocket-supporting RPC endpoint is exposed for subscribing to state diff `StateObjects` that come off the head of the chain while the geth node syncs.
114120

115121
```go
@@ -154,7 +160,7 @@ for {
154160
}
155161
```
156162

157-
### Unary endpoints
163+
#### Unary endpoints
158164
The service also exposes unary RPC endpoints for retrieving the state diff `StateObject` for a specific block height/hash.
159165
```go
160166
// StateDiffAt returns a state diff payload at the specific blockheight
@@ -167,41 +173,43 @@ StateDiffFor(ctx context.Context, blockHash common.Hash, params Params) (*Payloa
167173
To expose this endpoint the node needs to have the HTTP server turned on (`--http`),
168174
and the `statediff` namespace exposed (`--http.api=statediff`).
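
As a rough sketch of calling a unary endpoint over HTTP, the Go program below builds a JSON-RPC 2.0 request using only the standard library. Several details are assumptions rather than confirmed API: the method name `statediff_stateDiffAt` is inferred from geth's usual `namespace_method` convention, the address `http://127.0.0.1:8545` is geth's default HTTP listener, and the `Params` argument is passed as an opaque map because its fields are not shown here. `rpcRequest` and `buildStateDiffAt` are hypothetical helper names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// rpcRequest is a standard JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string        `json:"jsonrpc"`
	ID      int           `json:"id"`
	Method  string        `json:"method"`
	Params  []interface{} `json:"params"`
}

// buildStateDiffAt encodes a request for the state diff at blockNumber.
// ASSUMPTION: the method name follows geth's namespace_method convention.
// params is passed through as-is; fill it in to match the service's Params type.
func buildStateDiffAt(blockNumber uint64, params map[string]interface{}) ([]byte, error) {
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		ID:      1,
		Method:  "statediff_stateDiffAt",
		Params:  []interface{}{blockNumber, params},
	})
}

func main() {
	body, err := buildStateDiffAt(1000000, map[string]interface{}{})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(string(body))

	// ASSUMPTION: a node started with --http and --http.api=statediff
	// is listening on geth's default HTTP address.
	resp, err := http.Post("http://127.0.0.1:8545", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

If the `statediff` namespace is not listed in `--http.api`, a node would be expected to reject the call with a JSON-RPC error rather than return a payload.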
169175

170-
## Direct indexing into Postgres
176+
### Direct indexing into Postgres
171177
If `--statediff.writing` is set, the service will convert the state diff `StateObject` data into IPLD objects, persist them directly to Postgres,
172178
and generate secondary indexes around the IPLD data.
173179

174180
The schema and migrations for this Postgres database are provided in `statediff/db/`.
175181

176-
### Postgres setup
182+
#### Postgres setup
177183
We use [pressly/goose](https://github.com/pressly/goose) as our Postgres migration manager.
178184
You can also load the Postgres schema directly into a database using
179185

180186
`psql database_name < schema.sql`
181187

182188
This will only work on a version 12.4 Postgres database.
183189

184-
### Schema overview
190+
#### Schema overview
185191
Our Postgres schemas are built around a single IPFS backing Postgres IPLD blockstore table (`public.blocks`) that conforms with [go-ds-sql](https://github.com/ipfs/go-ds-sql/blob/master/postgres/postgres.go).
186-
All IPLD objects are stored in this table, where `key` is blockstore-prefixed multihash key for the IPLD object and `data` contains
192+
All IPLD objects are stored in this table, where `key` is the blockstore-prefixed multihash key for the IPLD object and `data` contains
187193
the bytes for the IPLD block (in the case of all Ethereum IPLDs, this is the RLP byte encoding of the Ethereum object).
188194

189195
The IPLD objects in this table can be traversed using an IPLD DAG interface, but since this table only maps multihash to raw IPLD object
190-
it is not particularly useful for searching through the data or and does not allow us to look up Ethereum objects by their constituent fields
191-
(e.g. by block number, tx source/recipient, state/storage trie node path). To improve the accessibility of these Ethereum IPLD objects
192-
we generate secondary indexes on top of the raw IPLDs in other Postgres tables. This collection of tables encapsulates an Ethereum [advanced data layout](https://github.com/ipld/specs#schemas-and-advanced-data-layouts) (ADL).
196+
it is not particularly useful for searching through the data by looking up Ethereum objects by their constituent fields
197+
(e.g. by block number, tx source/recipient, state/storage trie node path). To improve the accessibility of these objects
198+
we create an Ethereum [advanced data layout](https://github.com/ipld/specs#schemas-and-advanced-data-layouts) (ADL) by generating secondary
199+
indexes on top of the raw IPLDs in other Postgres tables.
193200

194201
These secondary index tables fall under the `eth` schema and follow an `{objectType}_cids` naming convention.
195-
Each of these tables provides a view into the individual fields of the underlying Ethereum IPLD object and references the raw IPLD object stored in `public.blocks` by multihash foreign key.
196-
Additionally, these tables link up to their parent object tables. E.g. the `storage_cids` table contains a `state_id` foreign key which references the `id`
197-
for the `state_cids` entry that contains the state leaf node for the contract the storage node belongs to, and in turn that `state_cids` entry contains a `header_id`
198-
foreign key which references the `id` of the `header_cids` entry that contains the header for the block these state and storage nodes were updated (diffed).
202+
These tables provide a view into individual fields of the underlying Ethereum IPLD objects, allowing lookups on these fields, and reference the raw IPLD objects stored in `public.blocks`
203+
by foreign keys to their multihash keys.
204+
Additionally, these tables maintain the hash-linked nature of Ethereum objects to one another. E.g. a storage trie node entry in the `storage_cids`
205+
table contains a `state_id` foreign key which references the `id` for the `state_cids` entry that contains the state leaf node for the contract that storage node belongs to,
206+
and in turn that `state_cids` entry contains a `header_id` foreign key which references the `id` of the `header_cids` entry that contains the header for the block these state and storage nodes were updated (diffed).
199207

200-
## Optimization
208+
### Optimization
201209
On mainnet this process is extremely IO intensive and requires significant resources to allow it to keep up with the head of the chain.
202210
The state diff processing time for a specific block is dependent on the number and complexity of the state changes that occur in a block and
203211
the number of updated state nodes that are available in the in-memory cache versus those that must be retrieved from disk.
204212

205-
If memory permits, one means of improving the efficiency of this process is to increase the trie cache allocation.
213+
If memory permits, one means of improving the efficiency of this process is to increase the in-memory trie cache allocation.
206214
This can be done by increasing the overall `--cache` allocation and/or by increasing the % of the cache allocated to trie
207215
usage with `--cache.trie`.
