Commit ef7a6c2

Add integration notes
1 parent 3ba7727 commit ef7a6c2

1 file changed: 120 additions, 0 deletions

# Storing the Cardano ledger state on disk: integration notes for high-performance backend

Authors: Joris Dral, Wolfgang Jeltsch
Date: May 2025

## Sessions

Creating new empty tables or opening tables from snapshots requires a `Session`.
The session can be created using `openSession`, which has to be done in the
consensus layer. The session should be shared between all tables. Sharing
between a table and its duplicates, which are created using `duplicate`, is
automatic. Once the session is created, it can be stored in the `LedgerDB`.
When the `LedgerDB` is closed, all tables and the session should be closed.
Closing the session will automatically close all tables, but this is only
intended as a fallback: ideally the user closes all tables manually.

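The recommended lifecycle (one shared session, tables closed explicitly before
the session) can be sketched with `bracket`. The sketch below deliberately does
not use the `lsm-tree` API: `StubSession`, `StubTable`, `withStubSession`, and
`withStubTable` are hypothetical stand-ins that only record the order of
operations, so that the nesting pattern is visible without assuming any real
signatures.

```haskell
import Control.Exception (bracket)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- | Event log used only to make the order of operations observable.
type Log = IORef [String]

record :: Log -> String -> IO ()
record logRef event = modifyIORef' logRef (++ [event])

-- | Hypothetical stand-ins for a session and a table; NOT the lsm-tree types.
newtype StubSession = StubSession Log
data StubTable = StubTable Log String

-- | Open a stub session, run the action, and always close the session.
withStubSession :: Log -> (StubSession -> IO a) -> IO a
withStubSession logRef =
  bracket (record logRef "open session" >> pure (StubSession logRef))
          (\_ -> record logRef "close session")

-- | Open a stub table in the given session and always close it afterwards.
withStubTable :: StubSession -> String -> (StubTable -> IO a) -> IO a
withStubTable (StubSession logRef) name =
  bracket (record logRef ("open " ++ name) >> pure (StubTable logRef name))
          (\_ -> record logRef ("close " ++ name))

-- | One session shared by two tables; both tables close before the session.
demo :: IO [String]
demo = do
  logRef <- newIORef []
  withStubSession logRef $ \session ->
    withStubTable session "utxo" $ \_ ->
      withStubTable session "other" $ \_ ->
        record logRef "use tables"
  readIORef logRef
```

Because the table brackets are nested inside the session bracket, the tables are
released first even if an exception is thrown, and the session is released last.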
## The compact index

The compact index is a memory-efficient data structure that maintains serialised
keys. Rather than storing full keys, it stores only the first 64 bits of each
key.

The compact index only works properly if in most cases it can determine the
order of two serialised keys by looking at their 64-bit prefixes. This is the
case, for example, when the keys are hashes: the probability that two hashes
have the same 64-bit prefix is $2^{-64}$ and thus very small. If the hashes are
256 bits in size, then the compact index needs only a quarter of the memory
that storing the full keys would require.

There is a backup mechanism in place for the case when the 64-bit prefixes of
keys are not sufficient to make a comparison. This backup mechanism is less
memory-efficient and less performant. That said, if the probability of prefix
clashes is very small, as in the example above, then in practice the backup
mechanism will never be used.

UTXO keys are *almost* uniformly distributed. Each UTXO key consists of a
32-byte hash and a 2-byte index. While the distribution of hashes is uniform,
the distribution of indexes is not, as indexes are counters that always start
at 0. A typical transaction has two inputs and two outputs and thus requires
storing two UTXO keys that have the same hash part, albeit not the same index
part. If we serialise UTXO keys naively, putting the hash part before the index
part, then the 64-bit prefixes will often not be sufficient to make comparisons
between keys. As a result, the backup mechanism will kick in far too often,
which will severely hamper performance.

The solution is to change the serialisation of UTXO keys such that the first
64 bits of a serialised key comprise the 2-byte index and just 48 bits of the
hash. This way, comparisons of keys with equal hashes will succeed, as the
indexes will be taken into account. On the other hand, it becomes more likely
that the covered bits of hashes are not enough to distinguish between different
hashes, but the probability of this should still be so low that the backup
mechanism will not kick in in practice.

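A minimal sketch of such a serialisation follows. The text above only fixes
*which* bits end up in the first 64 bits, not their order, so the layout used
here is an assumption: the first 6 bytes of the hash, then the index as 2
big-endian bytes, then the remaining 26 bytes of the hash. The function names
`serialiseUtxoKey` and `deserialiseUtxoKey` are hypothetical and not taken from
any existing code base.

```haskell
import           Data.Bits (shiftL, shiftR, (.|.))
import qualified Data.ByteString as BS
import           Data.Word (Word16)

-- | Assumed layout of the 34-byte serialised key:
--
--   * bytes 0-5:  first 6 bytes (48 bits) of the hash
--   * bytes 6-7:  index, big-endian
--   * bytes 8-33: remaining 26 bytes of the hash
--
-- Returns 'Nothing' if the hash is not exactly 32 bytes long.
serialiseUtxoKey :: BS.ByteString -> Word16 -> Maybe BS.ByteString
serialiseUtxoKey hash ix
  | BS.length hash /= 32 = Nothing
  | otherwise            = Just (hashHead <> ixBytes <> hashTail)
  where
    (hashHead, hashTail) = BS.splitAt 6 hash
    ixBytes = BS.pack [fromIntegral (ix `shiftR` 8), fromIntegral ix]

-- | Inverse of 'serialiseUtxoKey': recover the original hash and index.
deserialiseUtxoKey :: BS.ByteString -> Maybe (BS.ByteString, Word16)
deserialiseUtxoKey bs
  | BS.length bs /= 34 = Nothing
  | otherwise          = Just (hashHead <> hashTail, ix)
  where
    (hashHead, rest)    = BS.splitAt 6 bs
    (ixBytes, hashTail) = BS.splitAt 2 rest
    ix = case BS.unpack ixBytes of
      [hi, lo] -> (fromIntegral hi `shiftL` 8) .|. fromIntegral lo
      _        -> 0 -- unreachable: ixBytes always has length 2 here
```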
Importantly, range lookups and cursor reads return key–value pairs in the order
of their *serialised* keys. With the described change to UTXO key serialisation,
the ordering of serialised keys no longer matches the ordering of actual,
unserialised keys. This is fine for `lsm-tree`, for which any total ordering of
keys is as good as any other total ordering. However, the consensus layer will
face the situation where a range lookup or a cursor read returns key–value pairs
slightly out of order. Currently, we do not expect this to cause problems.

## Snapshots

Snapshots currently require support for hard links. This means that on Windows
the library only works when using NTFS. Support for other file systems could be
added by providing an alternative snapshotting method, but such a method would
likely involve copying file contents, which is slower than hard-linking.

Creating a snapshot outside the session directory while still using hard links
should be possible as long as the directory for the snapshot is on the same disk
volume as the session directory, but this feature is currently not implemented.
Hard-linking across different volumes is generally not possible; therefore,
placing a snapshot on a volume that does not hold the associated session
directory requires a different snapshotting implementation, which would probably
also rely on copying file contents.

A copying snapshotting implementation would therefore probably remove both of
the limitations just discussed.

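To illustrate the trade-off, here is a minimal sketch of a "hard link if
possible, otherwise copy" step for a single file. It is not how `lsm-tree`
creates snapshots; `linkOrCopy` and `LinkResult` are hypothetical names, and the
sketch assumes a POSIX system, since it uses `createLink` from the `unix`
package together with `copyFile` from the `directory` package.

```haskell
import Control.Exception (IOException, try)
import System.Directory (copyFile)
import System.Posix.Files (createLink)

-- | How the destination file was produced.
data LinkResult = HardLinked | Copied
  deriving (Eq, Show)

-- | Try to hard-link @src@ to @dst@. If that fails for any I/O reason
-- (for example because the two paths are on different volumes, or the
-- file system has no hard links), fall back to copying the contents.
-- If the copy fails as well, its exception propagates to the caller.
linkOrCopy :: FilePath -> FilePath -> IO LinkResult
linkOrCopy src dst = do
  result <- try (createLink src dst) :: IO (Either IOException ())
  case result of
    Right () -> pure HardLinked
    Left _   -> copyFile src dst >> pure Copied
```

The hard-link path takes constant time per file, while the fallback is
proportional to the file size, which is the slowdown mentioned above.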
## Value resolving

When instantiating the `ResolveValue` class, it is usually advisable to
implement `resolveValue` such that it works directly on the serialised values.
This is typically cheaper than having `resolveValue` deserialise the values,
compose them, and then serialise the result. For example, when the resolve
function is intended to work like `(+)`, then `resolveValue` could add the raw
bytes of the serialised values and would likely achieve better performance this
way.

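As an illustration of adding raw bytes, the following sketch sums two values
without deserialising them into numbers first. It assumes that values are
serialised as fixed-width, little-endian unsigned integers of equal length and
that overflow should wrap around; neither assumption comes from the text above.
The name `resolveAdd` is hypothetical, and the function is deliberately not
written as a `ResolveValue` instance, because the exact method signature is not
given here. It goes through lists for readability; an implementation tuned for
performance would operate on the underlying byte array directly.

```haskell
import qualified Data.ByteString as BS
import           Data.List (mapAccumL)
import           Data.Word (Word8)

-- | Add two little-endian, fixed-width unsigned integers byte by byte,
-- propagating the carry from the least significant byte upwards.
-- The final carry is dropped, so the sum wraps around on overflow.
-- If the inputs differ in length, the result is truncated to the shorter one.
resolveAdd :: BS.ByteString -> BS.ByteString -> BS.ByteString
resolveAdd x y = BS.pack (snd (mapAccumL step 0 (BS.zip x y)))
  where
    step :: Int -> (Word8, Word8) -> (Int, Word8)
    step carry (a, b) =
      let s = fromIntegral a + fromIntegral b + carry
      in  (s `div` 256, fromIntegral (s `mod` 256))
```

For example, adding the 8-byte encodings of 255 and 1 yields the bytes
`[0,1,0,0,0,0,0,0]`, which is the little-endian encoding of 256.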
## `io-classes` incompatibility

At the time of writing, various packages in the `cardano-node` stack depend on
`io-classes-1.5` and the 1.5 versions of its daughter packages, like
`strict-stm`. For example, the build dependencies in `ouroboros-consensus.cabal`
contain the following:

* `io-classes ^>= 1.5`
* `strict-stm ^>= 1.5`

However, `lsm-tree` needs `io-classes-1.6` or `io-classes-1.7`, and this leads
to a dependency conflict. One would hope that a package could have loose enough
bounds that it could be built with `io-classes-1.5`, `io-classes-1.6`, and
`io-classes-1.7`. Unfortunately, this is not the case, because, starting with
the `io-classes-1.6` release, daughter packages like `strict-stm` are
sublibraries of `io-classes`. For example, the build dependencies in
`lsm-tree.cabal` contain the following:

* `io-classes ^>= 1.6 || ^>= 1.7`
* `io-classes:strict-stm`

Sadly, there is currently no way to express both sets of build dependencies
within a single `build-depends` field, as Cabal’s support for conditional
expressions is not powerful enough for this.

We are aware that the `ouroboros-consensus` stack has not been updated to
`io-classes-1.7` due to a bug related to Nix. For more information, see
https://github.com/IntersectMBO/ouroboros-network/pull/4951. We would advise
fixing this Nix-related bug rather than downgrading `lsm-tree`’s dependency on
`io-classes` to version 1.5.
