Commit f84b908

elizabethengelman authored and i-norden committed
Statediffing geth
* Write state diff to CSV (#2)
* port statediff from https://github.com/jpmorganchase/quorum/blob/9b7fd9af8082795eeeb6863d9746f12b82dd5078/statediff/statediff.go; minor fixes
* integrating state diff extracting, building, and persisting into geth processes
* work towards persisting created statediffs in ipfs; based off github.com/vulcanize/eth-block-extractor
* Add a state diff service
* Remove diff extractor from blockchain
* Update imports
* Move statediff on/off check to geth cmd config
* Update starting state diff service
* Add debugging logs for creating diff
* Add statediff extractor and builder tests and small refactoring
* Start to write statediff to a CSV
* Restructure statediff directory
* Pull CSV publishing methods into their own file
* Reformatting due to go fmt
* Add gomega to vendor dir
* Remove testing focuses
* Update statediff tests to use golang test pkg instead of ginkgo
  - builder_test
  - extractor_test
  - publisher_test
* Use hexutil.Encode instead of deprecated common.ToHex
* Remove OldValue from DiffBigInt and DiffUint64 fields
* Update builder test
* Remove old storage value from updated accounts
* Remove old values from created/deleted accounts
* Update publisher to account for only storing current account values
* Update service loop and fetching previous block
* Update testing
  - remove statediff ginkgo test suite file
  - move mocks to their own dir
* Updates per go fmt
* Updates to tests
* Pass statediff mode and path in through cli
* Return filename from publisher
* Remove some duplication in builder
* Remove code field from state diff output; this is the contract byte code, and it can still be obtained by querying the db by the codeHash
* Consolidate acct diff structs for updated & updated/deleted accts
* Include block number in csv filename
* Clean up error logging
* Cleanup formatting, spelling, etc
* Address PR comments
* Add contract address and storage value to csv
* Refactor accumulating account row in csv publisher
* Add DiffStorage struct
* Add storage key to csv
* Address PR comments
* Fix publisher to include rows for accounts that don't have store updates
* Update builder test after merging in release/1.8
* Update test contract to include storage on contract initialization - so that we're able to test that storage diffing works for created and deleted accounts (not just updated accounts).
* Factor out a common trie iterator method in builder
* Apply goimports to statediff
* Apply gosimple changes to statediff
* Gracefully exit geth command (#4)
* Statediff for full node (#6)
* Open a trie from the in-memory database
* Use a node's LeafKey as an identifier instead of the address. It was proving difficult to look the address up from a given path with a full node (sometimes the value wouldn't exist in the disk db). So, instead, for now we are using the node's LeafKey, which is a Keccak256 hash of the address, so if we know the address we can figure out which LeafKey it matches up to.
* Make sure that statediff has been processed before pruning
* Use blockchain stateCache.OpenTrie for storage diffs
* Clean up log lines and remove unnecessary fields from builder
* Apply go fmt changes
* Add a sleep to the blockchain test
* Address PR comments
* Address PR comments
* refactoring/reorganizing packages
* refactoring statediff builder and types and adjusted to relay proofs and paths (still need to make this optional)
* refactoring state diff service and adding api which allows for streaming state diff payloads over an rpc websocket subscription
* make proofs and paths optional + compress service loop into single for loop (may be missing something here)
* option to process intermediate nodes
* make state diff rlp serializable
* cli parameter to limit statediffing to select account addresses + test
* review fixes and fixes for issues ran into in integration
* review fixes; proper method signature for api; adjust service so that statediff processing is halted/paused until there is at least one subscriber listening for the results
* adjust buffering to improve stability; doc.go; fix notifier err handling
* relay receipts with the rest of the data + review fixes/changes
* rpc method to get statediff at specific block; requires archival node or the block be within the pruning range
* review fixes
* fixes after rebase
* statediff version meta
* fix linter issues
* include total difficulty to the payload
* fix state diff builder: emit actual leaf nodes instead of value nodes; diff on the leaf not on the value; emit correct path for intermediate nodes
* adjust statediff builder tests to changes and extend to test intermediate nodes; golint
* add genesis block to test; handle block 0 in StateDiffAt
* rlp files for mainnet blocks 0-3, for tests
* builder test on mainnet blocks
* common.BytesToHash(path) => crypto.Keccak256(hash) in builder; BytesToHash produces same hash for e.g. []byte{} and []byte{\x00} - prefix \x00 steps are inconsequential to the hash result
* complete tests for early mainnet blocks
* diff type for representing deleted accounts
* fix builder so that we handle account deletions properly and properly diff storage when an account is moved to a new path; update params
* remove cli params; moving them to subscriber defined
* remove unneeded bc methods
* update service and api; statediffing params are now defined by user through api rather than by service provider by cli
* update top level tests
* add ability to watch specific storage slots (leaf keys) only
* comments; explain logic
* update mainnet blocks test
* update api_test.go
* storage leafkey filter test
* cleanup chain maker
* adjust chain maker for tests to add an empty account in block1 and switch to EIP-158 afterwards (now we just need to generate enough accounts until one causes the empty account to be touched and removed post-EIP-158 so we can simulate and test that process...); also added 2 new blocks where more contract storage is set and old slots are set to zero so they are removed so we can test that
* found an account whose creation causes the empty account to be moved to a new path; this should count as 'touching' the empty account and cause it to be removed according to eip-158... but it doesn't
* use new contract in unit tests that has self-destruct ability, so we can test eip-158 since simply moving an account to new path doesn't count as 'touching' it
* handle storage deletions
* tests for eip-158 account removal and storage value deletions; there is one edge case left to test where we remove 1 account when only two exist such that the remaining account is moved up and replaces the root branch node
* finish testing known edge cases
* add endpoint to fetch all state and storage nodes at a given blockheight; useful for generating a recent state cache/snapshot that we can diff forward from rather than needing to collect all diffs from genesis
* test for state trie builder
* minor changes/fixes
* update version meta
* if statediffing is on, lock tries in triedb until the statediffing service signals they are done using them
* update version meta
* fix mock blockchain; golint; bump patch
* increase maxRequestContentLength; bump patch
* log the sizes of the state objects we are sending
* CI build (#20)
* CI: run build on PR and on push to master
* CI: debug building geth
* CI: fix copying file
* CI: fix copying file v2
* CI: temporary upload file to release asset
* CI: get release upload_url by tag, upload asset to current release
* CI: fix tag name
* fix ci build on statediff_at_anyblock-1.9.11 branch
* fix publishing assets in release
* bump version meta
* use context deadline for timeout in eth_call
* collect and emit codehash=>code mappings for state objects
* subscription endpoint for retrieving all the codehash=>code mappings that exist at provided height
* bump version meta
* Implement WriteStateDiffAt
* Writes state diffs directly to postgres
* Adds CLI flags to configure PG
* Refactors builder output with callbacks
* Copies refactored postgres handling code from ipld-eth-indexer
* rename PostgresCIDWriter.{index->upsert}*
* less ambiguous
* go.mod update
* rm unused
* cleanup
* output code & codehash iteratively
* had to rf some types for this
* prometheus metrics output
* duplicate recent eth-indexer changes
* migrations and metrics...
* [wip] prom.Init() here? another CLI flag?
* cleanup
* tidy & DRY
* statediff WriteLoop service + CLI flag
* [wip] update test mocks
* todo - do something meaningful to test write loop
* logging
* use geth log
* port tests to go testing
* drop ginkgo/gomega
* fix and cleanup tests
* fail before defer statement
* delete vendor/ dir
* unused
* bump version meta
* fixes after rebase onto 1.9.23
* bump version meta
* fix API registration
* bump version meta
* use golang 1.15.5 version (#34)
* bump version meta; add 0.0.11 branch to actions
* bump version meta; update github actions workflows
* statediff: refactor metrics
* Remove redundant statediff/indexer/prom tooling and use existing prometheus integration.
* cleanup
* "indexer" namespace for metrics
* add reporting loop for db metrics
* doc
* metrics for statediff stats
* metrics namespace/subsystem = statediff/{indexer,service}
* statediff: use a worker pool (for direct writes)
* fix test
* fix chain event subscription
* log tweaks
* func name
* unused import
* intermediate chain event channel for metrics
* cleanup
* bump version meta
1 parent e787272 commit f84b908
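
For readers new to the idea, the commit message describes accounts in a state diff as created, updated, or deleted between two blocks. The following is a minimal sketch of that classification over two in-memory snapshots. It is not the builder from this commit (which, per the message, walks tries with an iterator); `Account`, `Diff`, and `diffState` are hypothetical names introduced only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Account is a hypothetical, simplified account record (not the geth type).
type Account struct {
	Nonce   uint64
	Balance uint64
}

// Diff groups leaf keys by how they changed between two snapshots.
type Diff struct {
	Created []string
	Updated []string
	Deleted []string
}

// diffState compares two snapshots keyed by leaf key and classifies each key.
func diffState(oldState, newState map[string]Account) Diff {
	var d Diff
	for key, newAcct := range newState {
		oldAcct, existed := oldState[key]
		switch {
		case !existed:
			d.Created = append(d.Created, key)
		case oldAcct != newAcct:
			d.Updated = append(d.Updated, key)
		}
	}
	for key := range oldState {
		if _, still := newState[key]; !still {
			d.Deleted = append(d.Deleted, key)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(d.Created)
	sort.Strings(d.Updated)
	sort.Strings(d.Deleted)
	return d
}

func main() {
	before := map[string]Account{
		"0xaa": {Nonce: 1, Balance: 100},
		"0xbb": {Nonce: 0, Balance: 50},
	}
	after := map[string]Account{
		"0xaa": {Nonce: 2, Balance: 90},
		"0xcc": {Nonce: 0, Balance: 10},
	}
	d := diffState(before, after)
	fmt.Println("created:", d.Created)
	fmt.Println("updated:", d.Updated)
	fmt.Println("deleted:", d.Deleted)
}
```

Only current values are recorded here, in line with the message's note that old values were removed from the diff output.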

74 files changed, 10967 additions and 35 deletions
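
The commit message notes that `common.BytesToHash(path)` was replaced with a Keccak256 digest in the builder because `BytesToHash` yields the same hash for `[]byte{}` and `[]byte{\x00}`. The sketch below illustrates why, using only the standard library. `padToHash` is a hypothetical stand-in that mimics the left-padding behaviour of `BytesToHash`, and SHA-256 is used purely as a stand-in for Keccak256, which is not part of Go's standard library.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// padToHash mimics left-padding a byte slice into a fixed 32-byte value:
// leading zero bytes in the input are indistinguishable from padding.
func padToHash(b []byte) [32]byte {
	var h [32]byte
	if len(b) > len(h) {
		b = b[len(b)-len(h):] // crop from the left, keeping the last 32 bytes
	}
	copy(h[len(h)-len(b):], b)
	return h
}

// digestPath hashes the path bytes, so the input length affects the result.
// SHA-256 is an assumption here; the commit uses Keccak256.
func digestPath(b []byte) [32]byte {
	return sha256.Sum256(b)
}

func main() {
	root := []byte{}      // path of the root node
	child := []byte{0x00} // path one step down the 0x00 branch

	fmt.Println("padded equal:", padToHash(root) == padToHash(child))   // true
	fmt.Println("digest equal:", digestPath(root) == digestPath(child)) // false
}
```

With padding, two distinct trie paths collapse to one identifier; with a digest, they stay distinct.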

.github/workflows/build.yml

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+name: Docker Build
+
+on: [pull_request]
+
+jobs:
+  build:
+    name: Run docker build
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - name: Run docker build
+        run: docker build -t vulcanize/go-ethereum -f Dockerfile.amd64 .

.github/workflows/on-master.yaml

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+name: Docker Build and publish to Github
+
+on:
+  push:
+    branches:
+      - v1.9.24-statediff
+      - v1.9.23-statediff
+      - v1.9.11-statediff
+
+jobs:
+  build:
+    name: Run docker build and publish
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - name: Run docker build
+        run: docker build -t vulcanize/go-ethereum -f Dockerfile.amd64 .
+      - name: Get the version
+        id: vars
+        run: echo ::set-output name=sha::$(echo ${GITHUB_SHA:0:7})
+      - name: Tag docker image
+        run: docker tag vulcanize/go-ethereum docker.pkg.github.com/vulcanize/go-ethereum/go-ethereum:${{steps.vars.outputs.sha}}
+      - name: Docker Login
+        run: echo ${{ secrets.GITHUB_TOKEN }} | docker login https://docker.pkg.github.com -u vulcanize --password-stdin
+      - name: Docker Push
+        run: docker push docker.pkg.github.com/vulcanize/go-ethereum/go-ethereum:${{steps.vars.outputs.sha}}
+

.github/workflows/publish.yaml

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+name: Publish geth to release
+on:
+  release:
+    types: [published]
+jobs:
+  push_to_registries:
+    name: Publish assets to Release
+    runs-on: ubuntu-latest
+    steps:
+      - name: Get the version
+        id: vars
+        run: |
+          echo ::set-output name=sha::$(echo ${GITHUB_SHA:0:7})
+      - name: Docker Login to Github Registry
+        run: echo ${{ secrets.GITHUB_TOKEN }} | docker login https://docker.pkg.github.com -u vulcanize --password-stdin
+      - name: Docker Pull
+        run: docker pull docker.pkg.github.com/vulcanize/go-ethereum/go-ethereum:${{steps.vars.outputs.sha}}
+      - name: Copy ethereum binary file
+        run: docker run --rm --entrypoint cat docker.pkg.github.com/vulcanize/go-ethereum/go-ethereum:${{steps.vars.outputs.sha}} /go-ethereum/build/bin/geth > geth-linux-amd64
+      - name: Get release
+        id: get_release
+        uses: bruceadams/[email protected]
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+      - name: Upload Release Asset
+        id: upload-release-asset
+        uses: actions/upload-release-asset@v1
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        with:
+          upload_url: ${{ steps.get_release.outputs.upload_url }}
+          asset_path: geth-linux-amd64
+          asset_name: geth-linux-amd64
+          asset_content_type: application/octet-stream

Dockerfile.amd64

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+# Build Geth in a stock Go builder container
+FROM golang:1.15.5 as builder
+
+#RUN apk add --no-cache make gcc musl-dev linux-headers git
+
+ADD . /go-ethereum
+RUN cd /go-ethereum && make geth

cmd/geth/config.go

Lines changed: 57 additions & 3 deletions
@@ -25,6 +25,8 @@ import (
 	"unicode"
 
 	"gopkg.in/urfave/cli.v1"
+	"github.com/ethereum/go-ethereum/eth/downloader"
+	"github.com/ethereum/go-ethereum/statediff"
 
 	"github.com/ethereum/go-ethereum/cmd/utils"
 	"github.com/ethereum/go-ethereum/eth"
@@ -145,6 +147,9 @@ func makeConfigNode(ctx *cli.Context) (*node.Node, gethConfig) {
 		cfg.Ethstats.URL = ctx.GlobalString(utils.EthStatsURLFlag.Name)
 	}
 	utils.SetShhConfig(ctx, stack)
+	if ctx.GlobalBool(utils.StateDiffFlag.Name) {
+		cfg.Eth.Diffing = true
+	}
 
 	return stack, cfg
 }
@@ -162,18 +167,67 @@ func checkWhisper(ctx *cli.Context) {
 func makeFullNode(ctx *cli.Context) (*node.Node, ethapi.Backend) {
 	stack, cfg := makeConfigNode(ctx)
 
+	if cfg.Eth.SyncMode == downloader.LightSync {
+		return makeLightNode(ctx, stack, cfg)
+	}
+
 	backend := utils.RegisterEthService(stack, &cfg.Eth)
 
 	checkWhisper(ctx)
+
+	if ctx.GlobalBool(utils.StateDiffFlag.Name) {
+		var dbParams *statediff.DBParams
+		if ctx.GlobalIsSet(utils.StateDiffDBFlag.Name) {
+			dbParams = new(statediff.DBParams)
+			dbParams.ConnectionURL = ctx.GlobalString(utils.StateDiffDBFlag.Name)
+			if ctx.GlobalIsSet(utils.StateDiffDBNodeIDFlag.Name) {
+				dbParams.ID = ctx.GlobalString(utils.StateDiffDBNodeIDFlag.Name)
+			} else {
+				utils.Fatalf("Must specify node ID for statediff DB output")
+			}
+			if ctx.GlobalIsSet(utils.StateDiffDBClientNameFlag.Name) {
+				dbParams.ClientName = ctx.GlobalString(utils.StateDiffDBClientNameFlag.Name)
+			} else {
+				utils.Fatalf("Must specify client name for statediff DB output")
+			}
+		} else {
+			if ctx.GlobalBool(utils.StateDiffWritingFlag.Name) {
+				utils.Fatalf("Must pass DB parameters if enabling statediff write loop")
+			}
+		}
+		params := statediff.ServiceParams{
+			DBParams:        dbParams,
+			EnableWriteLoop: ctx.GlobalBool(utils.StateDiffWritingFlag.Name),
+			NumWorkers:      ctx.GlobalUint(utils.StateDiffWorkersFlag.Name),
+		}
+		utils.RegisterStateDiffService(stack, backend, params)
+	}
+
+	// Configure GraphQL if requested
+	if ctx.GlobalIsSet(utils.GraphQLEnabledFlag.Name) {
+		utils.RegisterGraphQLService(stack, backend.APIBackend, cfg.Node)
+	}
+	// Add the Ethereum Stats daemon if requested.
+	if cfg.Ethstats.URL != "" {
+		utils.RegisterEthStatsService(stack, backend.APIBackend, cfg.Ethstats.URL)
+	}
+	return stack, backend.APIBackend
+}
+
+func makeLightNode(ctx *cli.Context, stack *node.Node, cfg gethConfig) (*node.Node, ethapi.Backend) {
+	backend := utils.RegisterLesEthService(stack, &cfg.Eth)
+
+	checkWhisper(ctx)
+
 	// Configure GraphQL if requested
 	if ctx.GlobalIsSet(utils.GraphQLEnabledFlag.Name) {
-		utils.RegisterGraphQLService(stack, backend, cfg.Node)
+		utils.RegisterGraphQLService(stack, backend.ApiBackend, cfg.Node)
 	}
 	// Add the Ethereum Stats daemon if requested.
 	if cfg.Ethstats.URL != "" {
-		utils.RegisterEthStatsService(stack, backend, cfg.Ethstats.URL)
+		utils.RegisterEthStatsService(stack, backend.ApiBackend, cfg.Ethstats.URL)
 	}
-	return stack, backend
+	return stack, backend.ApiBackend
 }
 
 // dumpConfig is the dumpconfig command.
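
The flag handling added to `makeFullNode` encodes three rules: a database URL requires both a node ID and a client name, and the write loop requires database parameters. The sketch below restates those rules as a pure function that returns an error instead of calling `utils.Fatalf`, which makes them easy to test in isolation. All types and names here (`cliInput`, `dbParams`, `serviceParams`, `buildParams`) are hypothetical stand-ins, and "flag is set" is approximated by "string is non-empty", which is a simplification of `ctx.GlobalIsSet`.

```go
package main

import (
	"errors"
	"fmt"
)

// cliInput is a hypothetical stand-in for the parsed statediff CLI flags.
type cliInput struct {
	DBURL      string // --statediff.db
	NodeID     string // --statediff.dbnodeid
	ClientName string // --statediff.dbclientname
	Writing    bool   // --statediff.writing
	Workers    uint   // --statediff.workers
}

// dbParams and serviceParams mirror the shape of the fields used in the diff.
type dbParams struct {
	ConnectionURL, ID, ClientName string
}

type serviceParams struct {
	DBParams        *dbParams
	EnableWriteLoop bool
	NumWorkers      uint
}

// buildParams applies the same checks as the diff, returning errors
// rather than terminating the process.
func buildParams(in cliInput) (serviceParams, error) {
	var db *dbParams
	if in.DBURL != "" {
		if in.NodeID == "" {
			return serviceParams{}, errors.New("must specify node ID for statediff DB output")
		}
		if in.ClientName == "" {
			return serviceParams{}, errors.New("must specify client name for statediff DB output")
		}
		db = &dbParams{ConnectionURL: in.DBURL, ID: in.NodeID, ClientName: in.ClientName}
	} else if in.Writing {
		return serviceParams{}, errors.New("must pass DB parameters if enabling statediff write loop")
	}
	return serviceParams{DBParams: db, EnableWriteLoop: in.Writing, NumWorkers: in.Workers}, nil
}

func main() {
	// Write loop requested without a database: rejected.
	if _, err := buildParams(cliInput{Writing: true}); err != nil {
		fmt.Println("error:", err)
	}
	// Fully specified database output: accepted.
	p, err := buildParams(cliInput{
		DBURL: "postgres://localhost/example", NodeID: "node-1",
		ClientName: "geth", Writing: true, Workers: 4,
	})
	fmt.Println(p.EnableWriteLoop, p.NumWorkers, err)
}
```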

cmd/geth/main.go

Lines changed: 6 additions & 0 deletions
@@ -158,6 +158,12 @@ var (
 		utils.GpoMaxGasPriceFlag,
 		utils.EWASMInterpreterFlag,
 		utils.EVMInterpreterFlag,
+		utils.StateDiffFlag,
+		utils.StateDiffDBFlag,
+		utils.StateDiffDBNodeIDFlag,
+		utils.StateDiffDBClientNameFlag,
+		utils.StateDiffWritingFlag,
+		utils.StateDiffWorkersFlag,
 		configFileFlag,
 	}
 

cmd/geth/usage.go

Lines changed: 11 additions & 0 deletions
@@ -235,6 +235,17 @@ var AppHelpFlagGroups = []flags.FlagGroup{
 			utils.LegacyGraphQLPortFlag,
 		}, debug.DeprecatedFlags...),
 	},
+	{
+		Name: "STATE DIFF",
+		Flags: []cli.Flag{
+			utils.StateDiffFlag,
+			utils.StateDiffDBFlag,
+			utils.StateDiffDBNodeIDFlag,
+			utils.StateDiffDBClientNameFlag,
+			utils.StateDiffWritingFlag,
+			utils.StateDiffWorkersFlag,
+		},
+	},
 	{
 		Name: "MISC",
 		Flags: []cli.Flag{

cmd/utils/flags.go

Lines changed: 49 additions & 9 deletions
@@ -63,6 +63,8 @@ import (
 	"github.com/ethereum/go-ethereum/p2p/nat"
 	"github.com/ethereum/go-ethereum/p2p/netutil"
 	"github.com/ethereum/go-ethereum/params"
+	"github.com/ethereum/go-ethereum/statediff"
+
 	pcsclite "github.com/gballet/go-libpcsclite"
 	"gopkg.in/urfave/cli.v1"
 )
@@ -726,6 +728,31 @@ var (
 		Usage: "External EVM configuration (default = built-in interpreter)",
 		Value: "",
 	}
+
+	StateDiffFlag = cli.BoolFlag{
+		Name:  "statediff",
+		Usage: "Enables the processing of state diffs between each block",
+	}
+	StateDiffDBFlag = cli.StringFlag{
+		Name:  "statediff.db",
+		Usage: "PostgreSQL database connection string for writing state diffs",
+	}
+	StateDiffDBNodeIDFlag = cli.StringFlag{
+		Name:  "statediff.dbnodeid",
+		Usage: "Node ID to use when writing state diffs to database",
+	}
+	StateDiffDBClientNameFlag = cli.StringFlag{
+		Name:  "statediff.dbclientname",
+		Usage: "Client name to use when writing state diffs to database",
+	}
+	StateDiffWritingFlag = cli.BoolFlag{
+		Name:  "statediff.writing",
+		Usage: "Activates progressive writing of state diffs to database as new block are synced",
+	}
+	StateDiffWorkersFlag = cli.UintFlag{
+		Name:  "statediff.workers",
+		Usage: "Number of concurrent workers to use during statediff processing (0 = 1)",
+	}
 )
 
 // MakeDataDir retrieves the currently requested data directory, terminating
@@ -991,6 +1018,9 @@ func setWS(ctx *cli.Context, cfg *node.Config) {
 	if ctx.GlobalIsSet(WSApiFlag.Name) {
 		cfg.WSModules = SplitAndTrim(ctx.GlobalString(WSApiFlag.Name))
 	}
+	if ctx.GlobalBool(StateDiffFlag.Name) {
+		cfg.WSModules = append(cfg.WSModules, "statediff")
+	}
 }
 
 // setIPC creates an IPC path configuration from the set command line flags,
@@ -1690,14 +1720,8 @@ func SetDNSDiscoveryDefaults(cfg *eth.Config, genesis common.Hash) {
 }
 
 // RegisterEthService adds an Ethereum client to the stack.
-func RegisterEthService(stack *node.Node, cfg *eth.Config) ethapi.Backend {
-	if cfg.SyncMode == downloader.LightSync {
-		backend, err := les.New(stack, cfg)
-		if err != nil {
-			Fatalf("Failed to register the Ethereum service: %v", err)
-		}
-		return backend.ApiBackend
-	}
+// RegisterEthService adds an Ethereum client to the stack.
+func RegisterEthService(stack *node.Node, cfg *eth.Config) *eth.Ethereum {
 	backend, err := eth.New(stack, cfg)
 	if err != nil {
 		Fatalf("Failed to register the Ethereum service: %v", err)
@@ -1708,7 +1732,16 @@ func RegisterEthService(stack *node.Node, cfg *eth.Config) ethapi.Backend {
 			Fatalf("Failed to create the LES server: %v", err)
 		}
 	}
-	return backend.APIBackend
+	return backend
+}
+
+// RegisterLesEthService adds an Ethereum les client to the stack.
+func RegisterLesEthService(stack *node.Node, cfg *eth.Config) *les.LightEthereum {
+	backend, err := les.New(stack, cfg)
+	if err != nil {
+		Fatalf("Failed to register the Ethereum service: %v", err)
+	}
+	return backend
 }
 
 // RegisterEthStatsService configures the Ethereum Stats daemon and adds it to
@@ -1726,6 +1759,13 @@ func RegisterGraphQLService(stack *node.Node, backend ethapi.Backend, cfg node.C
 	}
 }
 
+// RegisterStateDiffService configures and registers a service to stream state diff data over RPC
+func RegisterStateDiffService(stack *node.Node, ethServ *eth.Ethereum, params statediff.ServiceParams) {
+	if err := statediff.New(stack, ethServ, params); err != nil {
+		Fatalf("Failed to register the Statediff service: %v", err)
+	}
+}
+
 func SetupMetrics(ctx *cli.Context) {
 	if metrics.Enabled {
 		log.Info("Enabling metrics collection")
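
The `statediff.workers` usage string says "0 = 1", meaning an unset worker count still results in one worker. The service code that consumes this value is not shown on this page, so the following is only a sketch of how such a pool could behave: `runWorkers` is a hypothetical function that normalizes zero to one and fans jobs out to that many goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers processes every job with numWorkers goroutines.
// A count of 0 is treated as 1, matching the "(0 = 1)" flag description.
func runWorkers(numWorkers uint, jobs []uint64, process func(uint64) string) []string {
	if numWorkers == 0 {
		numWorkers = 1
	}
	indexes := make(chan int)
	results := make([]string, len(jobs))

	var wg sync.WaitGroup
	for w := uint(0); w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is received by exactly one worker, so writes
			// to results never target the same element twice.
			for i := range indexes {
				results[i] = process(jobs[i])
			}
		}()
	}
	for i := range jobs {
		indexes <- i
	}
	close(indexes)
	wg.Wait()
	return results
}

func main() {
	blocks := []uint64{100, 101, 102}
	out := runWorkers(0, blocks, func(n uint64) string {
		return fmt.Sprintf("diff for block %d", n)
	})
	for _, line := range out {
		fmt.Println(line)
	}
}
```

Results are stored by job index, so the output order matches the input order regardless of which worker finishes first.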
