Commit 81eb75d

Merge pull request #333 from input-output-hk/cet-snap-parser
Snapshot parser with TODO's for handling specific state modules. Thanks for the review and approval, @lowhung
2 parents 12b75f1 + 0b170bb commit 81eb75d

3 files changed

+339
-14
lines changed

common/src/messages.rs

Lines changed: 1 addition & 0 deletions
@@ -311,6 +311,7 @@ pub enum CardanoMessage {

 #[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
 pub enum SnapshotMessage {
+    Startup, // subscribers should listen for incremental snapshot data
     Bootstrap(SnapshotStateMessage),
     DumpRequest(SnapshotDumpMessage),
     Dump(SnapshotStateMessage),
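
Because `Startup` is a new variant, any exhaustive `match` on `SnapshotMessage` has to account for it. The following is a minimal sketch of how a subscriber might react to it, not code from this commit. The payload structs are placeholder stand-ins (their real fields are not shown in this diff), `Subscriber` is a hypothetical type, and treating `Bootstrap` as the carrier of incremental data is an assumption made only for illustration.

```rust
// Placeholder stand-ins: the real payload types live in common/src/messages.rs
// and their fields are not visible in this diff.
#[derive(Debug, Clone, PartialEq)]
pub struct SnapshotStateMessage(pub String);

#[derive(Debug, Clone, PartialEq)]
pub struct SnapshotDumpMessage(pub String);

// Mirrors the variant list from the diff (serde derives omitted to stay std-only).
#[derive(Debug, Clone, PartialEq)]
pub enum SnapshotMessage {
    Startup,
    Bootstrap(SnapshotStateMessage),
    DumpRequest(SnapshotDumpMessage),
    Dump(SnapshotStateMessage),
}

/// Hypothetical subscriber that ignores bootstrap data until `Startup` arrives.
#[derive(Debug, Default)]
pub struct Subscriber {
    pub listening: bool,
    pub bootstrap_chunks: Vec<SnapshotStateMessage>,
}

impl Subscriber {
    /// Returns true if the message changed the subscriber's state.
    pub fn handle(&mut self, msg: &SnapshotMessage) -> bool {
        match msg {
            SnapshotMessage::Startup => {
                let changed = !self.listening;
                self.listening = true;
                changed
            }
            // Assumption: incremental data arrives as `Bootstrap` messages.
            SnapshotMessage::Bootstrap(state) if self.listening => {
                self.bootstrap_chunks.push(state.clone());
                true
            }
            // Data that arrived before `Startup`, or messages this sketch ignores.
            _ => false,
        }
    }
}
```

In this sketch a `Bootstrap` message received before `Startup` is dropped rather than buffered; a real module may prefer to buffer or to treat that as an error.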

common/src/snapshot/NOTES.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
# Bootstrapping from a Snapshot file
We can boot an Acropolis node either from genesis, replaying all of the blocks up to
some point, or from a snapshot file. This module provides the components
needed to boot from a snapshot file. See [snapshot_bootstrapper](../../../modules/snapshot_bootstrapper/src/snapshot_bootstrapper.rs) for the process that uses these helpers.

Booting from a snapshot takes minutes instead of the hours it takes to boot from
genesis. It also allows booting from a given epoch, which makes it possible to write
tests that rely only on that epoch's data. We also skip some of the problematic
eras and will typically boot from Conway around epoch 305, 306, and 307. Three epochs
are needed to have enough context to calculate the rewards correctly.

The required data for bootstrapping are:
- snapshot files (each has an associated epoch number and point)
- nonces
- headers

## Snapshot Files
18+
The snapshots come from the Amaru project. In their words,
19+
"the snapshots we generated are different [from a Mithril snapshot]: they're
20+
the actual ledger state; i.e. the in-memory state that is constructed by iterating over each block up to a specific point. So, it's all the UTxOs, the set of pending governance actions, the account balance, etc.
21+
If you get this from a trusted source, you don't need to do any replay, you can just start up and load this from disk.
22+
The format of these is completely non-standard; we just forked the haskell node and spit out whatever we needed to in CBOR."
23+
24+
Snapshot files are referenced by their epoch number in the config.json file below.
25+
26+
See [Amaru snapshot format](../../../docs/amaru-snapshot-structure.md)
27+
28+
## Configuration files
29+
There is a path for each network bootstrap configuration file. Network Should
30+
be one of 'mainnet', 'preprod', 'preview' or 'testnet_<magic>' where
31+
`magic` is a 32-bits unsigned value denoting a particular testnet.
32+
33+
Data structure, e.g. as [Amaru mainnet](https://github.com/pragma-org/amaru/tree/main/data/mainnet)
34+
35+
The bootstrapper will be given a path to a directory that is expected to contain
36+
the following files: snapshots.json, nonces.json, and headers.json. The path will
37+
be used as a prefix to resolve per-network configuration files
38+
needed for bootstrapping. Given a source directory `data`, and a
39+
a network name of `preview`, the expected layout for configuration files would be:
40+
41+
* `data/preview/config.json`: a list of epochs to load.
42+
* `data/preview/snapshots.json`: a list of `Snapshot` values (epoch, point, url)
43+
* `data/preview/nonces.json`: a list of `InitialNonces` values,
44+
* `data/preview/headers.json`: a list of `Point`s.
45+
46+
These files are loaded by [snapshot_bootsrapper](../../../modules/snapshot_bootstrapper/src/snapshot_bootstrapper.rs) during bootup.
47+
48+
## Bootstrapping sequence
49+
50+
The bootstrapper will be started with an argument that specifies a network,
51+
e.g. "mainnet". From the network, it will build a path to the configuration
52+
and snapshot files as shown above, then load the data contained or described
53+
in those files. config.json holds a list of typically 3 epochs that can be
54+
used to index into snapshots.json to find the corresponding URLs and meta-data
55+
for each of the three snapshot files. Loading occurs in this order:
56+
57+
* publish `SnapshotMessage::Startup`
58+
* download the snapshots (on demand; may have already been done externally)
59+
* parse each snapshot and publish their data on the message bus
60+
* read nonces and publish
61+
* read headers and publish
62+
* publish `CardanoMessage::GenesisComplete(GenesisCompleteMessage {...})`
63+
64+
Modules in the system will have subscribed to the Startup message and also
65+
to individual structural data update messages before the
66+
boostrapper runs the above sequence. Upon receiving the `Startup` message,
67+
they will use data messages to populate their state, history (for BlockFrost),
68+
and any other state required to achieve readiness to operate on reception of
69+
the `GenesisCompleteMessage`.
70+
71+
## Data update messages
72+
73+
The bootstrapper will publish data as it parses the snapshot files, nonces, and
74+
headers. Snapshot parsing is done while streaming the data to keep the memory
75+
footprint lower. As elements of the file are parsed, callbacks provide the data
76+
to the boostrapper which publishes the data on the message bus.
77+
78+
There are TODO markers in [snapshot_bootsrapper](../../../modules/snapshot_bootstrapper/src/snapshot_bootstrapper.rs) that show where to add the
79+
publishing of the parsed snapshot data.
80+
81+
82+
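
The callback pattern described in this section might look roughly like the sketch below. It is not the parser from this commit: the real parser decodes CBOR, whereas this example consumes an iterator of already-decoded records to stay dependency-free. `Record`, `SnapshotCallbacks`, `stream_snapshot`, and `BusPublisher` are hypothetical names, and the record fields are invented for illustration.

```rust
/// Hypothetical decoded elements; the real parser reads these from CBOR.
#[derive(Debug, Clone, PartialEq)]
pub enum Record {
    Utxo { tx_ref: String, lovelace: u64 },
    Account { stake_key: String, balance: u64 },
}

/// Callbacks invoked once per parsed element.
pub trait SnapshotCallbacks {
    fn on_utxo(&mut self, tx_ref: &str, lovelace: u64);
    fn on_account(&mut self, stake_key: &str, balance: u64);
}

/// Walks the records one at a time and hands each to the callbacks,
/// so only the current record is held by the parser. Returns the count seen.
pub fn stream_snapshot<I, C>(records: I, callbacks: &mut C) -> usize
where
    I: IntoIterator<Item = Record>,
    C: SnapshotCallbacks,
{
    let mut seen = 0;
    for record in records {
        match record {
            Record::Utxo { tx_ref, lovelace } => callbacks.on_utxo(&tx_ref, lovelace),
            Record::Account { stake_key, balance } => callbacks.on_account(&stake_key, balance),
        }
        seen += 1;
    }
    seen
}

/// Stand-in for the bootstrapper side: records what it would publish on the bus.
#[derive(Debug, Default)]
pub struct BusPublisher {
    pub published: Vec<String>,
}

impl SnapshotCallbacks for BusPublisher {
    fn on_utxo(&mut self, tx_ref: &str, lovelace: u64) {
        self.published.push(format!("utxo {} {}", tx_ref, lovelace));
    }

    fn on_account(&mut self, stake_key: &str, balance: u64) {
        self.published.push(format!("account {} {}", stake_key, balance));
    }
}
```

The callback bodies in `BusPublisher` correspond to the places where the TODO markers ask for publishing to be added; here they only append a string so the flow can be observed.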
