|
| 1 | +This package provides an auxiliary service that asynchronously processes state diff objects from chain events, |
| 2 | +either relaying the state objects to rpc subscribers or writing them directly to Postgres. |
| 3 | + |
| 4 | +It also exposes RPC endpoints for fetching or writing to Postgres the state diff `StateObject` at a specific block height |
| 5 | +or for a specific block hash, this operates on historic block and state data and so is dependent on having a complete state archive. |
| 6 | + |
| 7 | +# Statediff Object |
| 8 | +A state diff `StateObject` is the collection of all the state and storage trie nodes that have been updated in a given block. |
| 9 | +For convenience, we also associate these nodes with the block number and hash, and optionally the set of code hashes and code for any |
| 10 | +contracts deployed in this block. |
| 11 | + |
| 12 | +A complete state diff `StateObject` will include all state and storage intermediate nodes, which is necessary for generating proofs and for |
| 13 | +traversing the tries. |
| 14 | + |
| 15 | +```go |
| 16 | +// StateObject is a collection of state (and linked storage nodes) as well as the associated block number, block hash, |
| 17 | +// and a set of code hashes and their code |
| 18 | +type StateObject struct { |
| 19 | + BlockNumber *big.Int `json:"blockNumber" gencodec:"required"` |
| 20 | + BlockHash common.Hash `json:"blockHash" gencodec:"required"` |
| 21 | + Nodes []StateNode `json:"nodes" gencodec:"required"` |
| 22 | + CodeAndCodeHashes []CodeAndCodeHash `json:"codeMapping"` |
| 23 | +} |
| 24 | + |
| 25 | +// StateNode holds the data for a single state diff node |
| 26 | +type StateNode struct { |
| 27 | + NodeType NodeType `json:"nodeType" gencodec:"required"` |
| 28 | + Path []byte `json:"path" gencodec:"required"` |
| 29 | + NodeValue []byte `json:"value" gencodec:"required"` |
| 30 | + StorageNodes []StorageNode `json:"storage"` |
| 31 | + LeafKey []byte `json:"leafKey"` |
| 32 | +} |
| 33 | + |
| 34 | +// StorageNode holds the data for a single storage diff node |
| 35 | +type StorageNode struct { |
| 36 | + NodeType NodeType `json:"nodeType" gencodec:"required"` |
| 37 | + Path []byte `json:"path" gencodec:"required"` |
| 38 | + NodeValue []byte `json:"value" gencodec:"required"` |
| 39 | + LeafKey []byte `json:"leafKey"` |
| 40 | +} |
| 41 | + |
| 42 | +// CodeAndCodeHash struct for holding codehash => code mappings |
| 43 | +// we can't use an actual map because they are not rlp serializable |
| 44 | +type CodeAndCodeHash struct { |
| 45 | + Hash common.Hash `json:"codeHash"` |
| 46 | + Code []byte `json:"code"` |
| 47 | +} |
| 48 | +``` |
| 49 | +These objects are packed into a `Payload` structure which additionally associates the StateObject |
| 50 | +with the block (header, uncles, and transactions), receipts, and total difficulty. |
| 51 | +This `Payload` encapsulates all the block and state data at a given block, and allows us to index the entire Ethereum data structure |
| 52 | +as hash-linked IPLD objects. |
| 53 | + |
| 54 | +```go |
| 55 | +// Payload packages the data to send to state diff subscriptions |
| 56 | +type Payload struct { |
| 57 | + BlockRlp []byte `json:"blockRlp"` |
| 58 | + TotalDifficulty *big.Int `json:"totalDifficulty"` |
| 59 | + ReceiptsRlp []byte `json:"receiptsRlp"` |
| 60 | + StateObjectRlp []byte `json:"stateObjectRlp" gencodec:"required"` |
| 61 | + |
| 62 | + encoded []byte |
| 63 | + err error |
| 64 | +} |
| 65 | +``` |
| 66 | + |
| 67 | +# Usage |
| 68 | +This state diffing service runs as an auxiliary service concurrent to the regular syncing process of the geth node. |
| 69 | + |
| 70 | + |
| 71 | +## CLI configuration |
| 72 | +This service introduces a CLI flag namespace `statediff` |
| 73 | + |
| 74 | +`--statediff` flag is used to turn on the service |
| 75 | +`--statediff.writing` is used to tell the service to write state diff objects it produces from synced ChainEvents directly to a configured Postgres database |
| 76 | +`--statediff.db` is the connection string for the Postgres database to write to |
| 77 | +`--statediff.dbnodeid` is the node id to use in the Postgres database |
| 78 | +`--statediff.dbclientname` is the client name to use in the Postgres database |
| 79 | + |
| 80 | +The service can only operate in full sync mode (`--syncmode=full`), but only the historical RPC endpoints require an archive node (`--gcmode=archive`) |
| 81 | + |
| 82 | +e.g. |
| 83 | +` |
| 84 | +./build/bin/geth --syncmode=full --gcmode=archive --statediff --statediff.writing --statediff.db=postgres://localhost:5432/vulcanize_testing?sslmode=disable --statediff.dbnodeid={nodeId} --statediff.dbclientname={dbClientName} |
| 85 | +` |
| 86 | + |
| 87 | +## RPC endpoints |
| 88 | +The state diffing service exposes both a WS subscription endpoint, and a number of HTTP unary endpoints. |
| 89 | + |
| 90 | +Each of these endpoints requires a set of parameters provided by the caller |
| 91 | + |
| 92 | +```go |
| 93 | +// Params is used to carry in parameters from subscribing/requesting clients configuration |
| 94 | +type Params struct { |
| 95 | + IntermediateStateNodes bool |
| 96 | + IntermediateStorageNodes bool |
| 97 | + IncludeBlock bool |
| 98 | + IncludeReceipts bool |
| 99 | + IncludeTD bool |
| 100 | + IncludeCode bool |
| 101 | + WatchedAddresses []common.Address |
| 102 | + WatchedStorageSlots []common.Hash |
| 103 | +} |
| 104 | +``` |
| 105 | + |
| 106 | +Using these params we can tell the service whether to include state and/or storage intermediate nodes; whether |
| 107 | +to include the associated block (header, uncles, and transactions); whether to include the associated receipts; |
| 108 | +whether to include the total difficult for this block; whether to include the set of code hashes and code for |
| 109 | +contracts deployed in this block; whether to limit the diffing process to a list of specific addresses; and/or |
| 110 | +whether to limit the diffing process to a list of specific storage slot keys. |
| 111 | + |
| 112 | +### Subscription endpoint |
| 113 | +A websocket supporting RPC endpoint is exposed for subscribing to state diff `StateObjects` that come off the head of the chain while the geth node syncs. |
| 114 | + |
| 115 | +```go |
| 116 | +// Stream is a subscription endpoint that fires off state diff payloads as they are created |
| 117 | +Stream(ctx context.Context, params Params) (*rpc.Subscription, error) |
| 118 | +``` |
| 119 | + |
| 120 | +To expose this endpoint the node needs to have the websocket server turned on (`--ws`), |
| 121 | +and the `statediff` namespace exposed (`--ws.api=statediff`). |
| 122 | + |
| 123 | +Go code subscriptions to this endpoint can be created using the `rpc.Client.Subscribe()` method, |
| 124 | +with the "statediff" namespace, a `statediff.Payload` channel, and the name of the statediff api's rpc method: "stream". |
| 125 | + |
| 126 | +e.g. |
| 127 | + |
| 128 | +```go |
| 129 | + |
| 130 | +cli, err := rpc.Dial("ipcPathOrWsURL") |
| 131 | +if err != nil { |
| 132 | + // handle error |
| 133 | +} |
| 134 | +stateDiffPayloadChan := make(chan statediff.Payload, 20000) |
| 135 | +methodName := "stream" |
| 136 | +params := statediff.Params{ |
| 137 | + IncludeBlock: true, |
| 138 | + IncludeTD: true, |
| 139 | + IncludeReceipts: true, |
| 140 | + IntermediateStorageNodes: true, |
| 141 | + IntermediateStateNodes: true, |
| 142 | +} |
| 143 | +rpcSub, err := cli.Subscribe(context.Background(), statediff.APIName, stateDiffPayloadChan, methodName, params) |
| 144 | +if err != nil { |
| 145 | + // handle error |
| 146 | +} |
| 147 | +for { |
| 148 | + select { |
| 149 | + case stateDiffPayload := <- stateDiffPayloadChan: |
| 150 | + // process the payload |
| 151 | + case err := <- rpcSub.Err(): |
| 152 | + // handle rpc subscription error |
| 153 | + } |
| 154 | +} |
| 155 | +``` |
| 156 | + |
| 157 | +### Unary endpoints |
| 158 | +The service also exposes unary RPC endpoints for retrieving the state diff `StateObject` for a specific block height/hash. |
| 159 | +```go |
| 160 | +// StateDiffAt returns a state diff payload at the specific blockheight |
| 161 | +StateDiffAt(ctx context.Context, blockNumber uint64, params Params) (*Payload, error) |
| 162 | + |
| 163 | +// StateDiffFor returns a state diff payload for the specific blockhash |
| 164 | +StateDiffFor(ctx context.Context, blockHash common.Hash, params Params) (*Payload, error) |
| 165 | +``` |
| 166 | + |
| 167 | +To expose this endpoint the node needs to have the HTTP server turned on (`--http`), |
| 168 | +and the `statediff` namespace exposed (`--http.api=statediff`). |
| 169 | + |
| 170 | +## Direct indexing into Postgres |
| 171 | +If `--statediff.writing` is set, the service will convert the state diff `StateObject` data into IPLD objects, persist them directly to Postgres, |
| 172 | +and generate secondary indexes around the IPLD data. |
| 173 | + |
| 174 | +The schema and migrations for this Postgres database are provided in `statediff/db/`. |
| 175 | + |
| 176 | +### Postgres setup |
| 177 | +We use [pressly/goose](https://github.com/pressly/goose) as our Postgres migration manager. |
| 178 | +You can also load the Postgres schema directly into a database using |
| 179 | + |
| 180 | +`psql database_name < schema.sql` |
| 181 | + |
| 182 | +This will only work on a version 12.4 Postgres database. |
| 183 | + |
| 184 | +### Schema overview |
| 185 | +Our Postgres schemas are built around a single IPFS backing Postgres IPLD blockstore table (`public.blocks`) that conforms with [go-ds-sql](https://github.com/ipfs/go-ds-sql/blob/master/postgres/postgres.go). |
| 186 | +All IPLD objects are stored in this table, where `key` is blockstore-prefixed multihash key for the IPLD object and `data` contains |
| 187 | +the bytes for the IPLD block (in the case of all Ethereum IPLDs, this is the RLP byte encoding of the Ethereum object). |
| 188 | + |
| 189 | +The IPLD objects in this table can be traversed using an IPLD DAG interface, but since this table only maps multihash to raw IPLD object |
| 190 | +it is not particularly useful for searching through the data or and does not allow us to look up Ethereum objects by their constituent fields |
| 191 | +(e.g. by block number, tx source/recipient, state/storage trie node path). To improve the accessibility of these Ethereum IPLD objects |
| 192 | +we generate secondary indexes on top of the raw IPLDs in other Postgres tables. This collection of tables encapsulates an Ethereum [advanced data layout](https://github.com/ipld/specs#schemas-and-advanced-data-layouts) (ADL). |
| 193 | + |
| 194 | +These secondary index tables fall under the `eth` schema and follow an `{objectType}_cids` naming convention. |
| 195 | +Each of these tables provides a view into the individual fields of the underlying Ethereum IPLD object and references the raw IPLD object stored in `public.blocks` by multihash foreign key. |
| 196 | +Additionally, these tables link up to their parent object tables. E.g. the `storage_cids` table contains a `state_id` foreign key which references the `id` |
| 197 | +for the `state_cids` entry that contains the state leaf node for the contract the storage node belongs to, and in turn that `state_cids` entry contains a `header_id` |
| 198 | +foreign key which references the `id` of the `header_cids` entry that contains the header for the block these state and storage nodes were updated (diffed). |
| 199 | + |
| 200 | +## Optimization |
| 201 | +On mainnet this process is extremely IO intensive and requires significant resources to allow it to keep up with the head of the chain. |
| 202 | +The state diff processing time for a specific block is dependent on the number and complexity of the state changes that occur in a block and |
| 203 | +the number of updated state nodes that are available in the in-memory cache vs must be retrieved from disc. |
| 204 | + |
| 205 | +If memory permits, one means of improving the efficiency of this process is to increase the trie cache allocation. |
| 206 | +This can be done by increasing the overall `--cache` allocation and/or by increasing the % of the cache allocated to trie |
| 207 | +usage with `--cache.trie`. |
0 commit comments