# Storing the Cardano ledger state on disk: integration notes for high-performance backend

Authors: Joris Dral, Wolfgang Jeltsch

Date: May 2025

## Sessions

Creating new empty tables or opening tables from snapshots requires a `Session`.
The session can be created using `openSession`, which has to be done in the
consensus layer. The session should be shared between all tables. Sharing
between a table and its duplicates, which are created using `duplicate`, is
automatic. Once the session is created, it can be stored in the `LedgerDB`.
When the `LedgerDB` is closed, all tables and the session should be closed.
Closing the session automatically closes all tables, but this is only intended
as a backup: ideally, the user closes all tables manually.
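
The lifecycle described above can be sketched as follows. This is a hypothetical sketch, not the exact `lsm-tree` API: apart from `openSession` and `duplicate`, which are mentioned in these notes, the names and signatures are assumptions.

```haskell
-- Hypothetical sketch of the intended lifecycle; names other than
-- `openSession` and `duplicate` are assumptions, not the exact API.
import Control.Exception (bracket)

run :: IO ()
run =
  bracket (openSession "ledger-db") closeSession $ \session -> do
    -- One shared session for all tables, created in the consensus layer.
    table  <- newTableWith session
    table' <- duplicate table   -- duplicates share the session automatically
    -- ... use the tables, then close them manually; closing the session
    -- on exit is only a backup that also closes any remaining tables.
    closeTable table'
    closeTable table
```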

## The compact index

The compact index is a memory-efficient data structure that maintains serialised
keys. Rather than storing full keys, it only stores the first 64 bits of each
key.

The compact index only works properly if in most cases it can determine the
order of two serialised keys by looking at their 64-bit prefixes. This is the
case, for example, when the keys are hashes: the probability that two hashes
have the same 64-bit prefix is $1/2^{64}$ and thus very small. If the hashes
are 256 bits in size, then the compact index uses a quarter of the memory that
storing the full keys would require.

There is a backup mechanism in place for the case when the 64-bit prefixes of
keys are not sufficient to make a comparison. This backup mechanism is less
memory-efficient and less performant. That said, if the probability of prefix
clashes is very small, as in the example above, then in practice the backup
mechanism will never be used.
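
To put a number on "very small", here is a back-of-the-envelope, birthday-style estimate (our illustration, not from the original notes): among $n$ uniformly distributed keys there are $n(n-1)/2$ pairs, each sharing a 64-bit prefix with probability $1/2^{64}$.

```haskell
-- Birthday-style estimate: expected number of 64-bit prefix clashes
-- among n uniformly distributed keys. Each of the n*(n-1)/2 key pairs
-- clashes with probability 2^-64.
expectedClashes :: Integer -> Double
expectedClashes n =
  fromIntegral (n * (n - 1) `div` 2) / 2 ** 64
```

Even for a hundred million keys this yields an expectation on the order of $10^{-4}$ clashes, so the backup mechanism is effectively never exercised.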

UTXO keys are *almost* uniformly distributed. Each UTXO key consists of a
32-byte hash and a 2-byte index. While the distribution of hashes is uniform,
the distribution of indexes is not, as indexes are counters that always start
at 0. A typical transaction has two inputs and two outputs and thus requires
storing two UTXO keys that have the same hash part but not the same index part.
If we serialise UTXO keys naively, putting the hash part before the index part,
then the 64-bit prefixes will often not be sufficient to make comparisons
between keys. As a result, the backup mechanism will kick in far too often,
which will severely hamper performance.
47+
The solution is to change the serialisation of UTXO keys such that the first
48+
64 bits of a serialised key comprise the 2-byte index and just 48 bits of the
49+
hash. This way, comparisons of keys with equal hashes will succeed, as the
50+
indexes will be taken into account. On the other hand, it becomes more likely
51+
that the covered bits of hashes are not enough to distinguish between different
52+
hashes, but the propability of this should still be so low that the backup
53+
mechanism will not kick in in practice.
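
The reordering can be illustrated with a small sketch (our illustration; the actual serialisation code may differ, and the exact placement of the index bytes within the prefix is an assumption):

```haskell
import           Data.ByteString (ByteString)
import qualified Data.ByteString as BS

-- Illustrative layout (an assumption, not the actual implementation):
-- the 2-byte index comes first, followed by the 32-byte hash, so the
-- 64-bit prefix covers the index plus the first 48 bits of the hash.
serialiseUtxoKey :: ByteString -> ByteString -> ByteString
serialiseUtxoKey hash index = index <> hash

-- The 64-bit prefix that the compact index stores.
prefix64 :: ByteString -> ByteString
prefix64 = BS.take 8
```

Two keys with the same hash but indexes 0 and 1 now differ within their first 8 bytes, whereas under the naive hash-first layout their 64-bit prefixes would be identical.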

Importantly, range lookups and cursor reads return key–value pairs in the order
of their *serialised* keys. With the described change to UTXO key serialisation,
the ordering of serialised keys no longer matches the ordering of actual,
unserialised keys. This is fine for `lsm-tree`, for which any total ordering of
keys is as good as any other total ordering. However, the consensus layer will
face the situation where a range lookup or a cursor read returns key–value pairs
slightly out of order. Currently, we do not expect this to cause problems.
63+
## Snapshots
64+
65+
Snapshots currently require support for hard links. This means that on Windows
66+
the library only works when using NTFS. Support for other file systems could be
67+
added by providing an alternative snapshotting method, but such a method would
68+
likely involve copying file contents, which is slower than hard-linking.
69+
70+
Creating a snapshot outside the session directory while still using hard links
71+
should be possible as long as the directory for the snapshot is on the same disk
72+
volume as the session directory, but this feature is currently not implemented.
73+
Hard-linking across different volumes is generally not possible; therefore,
74+
placing a snapshot on a volume that does not hold the associated session
75+
directory requires a different snapshotting implementation, which would probably
76+
also rely on copying file contents.
77+
78+
A copying snapshotting implementation would probably kill two birds with one
79+
stone by removing the two current limitations just discussed.
80+
81+
Presumably, `cardano-node` will eventually be required to support storing
82+
snapshots on a different volume than where the session is placed, for example on
83+
a cheaper non-SSD drive. This feature was unfortunately not anticipated in the
84+
project specification and so is not currently included. As discussed above, it
85+
could be added with some additional work.

## Value resolving

When instantiating the `ResolveValue` class, it is usually advisable to
implement `resolveValue` such that it works directly on the serialised values.
This is typically cheaper than having `resolveValue` deserialise the values,
compose them, and then serialise the result. For example, when the resolve
function is intended to work like `(+)`, then `resolveValue` could add the raw
bytes of the serialised values and would likely achieve better performance this
way.
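
As a concrete example, suppose (hypothetically) that values are serialised as fixed-width 8-byte big-endian `Word64`s. A resolve function that behaves like `(+)` can then work on the raw bytes with two word reads and one write, avoiding a generic deserialise/compose/serialise round trip:

```haskell
import           Data.Bits (shiftL, shiftR)
import           Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import           Data.Word (Word64)

-- Hypothetical resolve function for values serialised as fixed-width
-- 8-byte big-endian Word64s; it adds the two values while staying close
-- to the raw bytes instead of using a generic (de)serialisation layer.
resolveAdd :: ByteString -> ByteString -> ByteString
resolveAdd v1 v2 = encodeBE (decodeBE v1 + decodeBE v2)

-- Read a big-endian Word64 from 8 bytes.
decodeBE :: ByteString -> Word64
decodeBE = BS.foldl' (\acc b -> acc `shiftL` 8 + fromIntegral b) 0

-- Write a Word64 as 8 big-endian bytes.
encodeBE :: Word64 -> ByteString
encodeBE w = BS.pack [fromIntegral (w `shiftR` s) | s <- [56, 48 .. 0]]
```

The fixed-width encoding is an assumption made for this sketch; the point is only that the resolve step never needs to go through a full decoding pipeline.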

## `io-classes` incompatibility

At the time of writing, various packages in the `cardano-node` stack depend on
`io-classes-1.5` and the 1.5 versions of its daughter packages, such as
`strict-stm`. For example, the build dependencies in `ouroboros-consensus.cabal`
contain the following:

* `io-classes ^>= 1.5`
* `strict-stm ^>= 1.5`

However, `lsm-tree` needs `io-classes-1.6` or `io-classes-1.7`, and this leads
to a dependency conflict. One would hope that a package could have loose enough
bounds that it could be built with `io-classes-1.5`, `io-classes-1.6`, and
`io-classes-1.7`. Unfortunately, this is not the case, because, starting with
the `io-classes-1.6` release, daughter packages like `strict-stm` are
sublibraries of `io-classes`. For example, the build dependencies in
`lsm-tree.cabal` contain the following:

* `io-classes ^>= 1.6 || ^>= 1.7`
* `io-classes:strict-stm`

Sadly, there is currently no way to express both sets of build dependencies
within a single `build-depends` field, as Cabal's support for conditional
expressions is not powerful enough for this.

We are aware that the `ouroboros-consensus` stack has not been updated to
`io-classes-1.7` due to a bug related to Nix. For more information, see
https://github.com/IntersectMBO/ouroboros-network/pull/4951. We would advise
fixing this Nix-related bug rather than downgrading `lsm-tree`'s dependency on
`io-classes` to version 1.5.
