Commit 56b6b0d
# Use a new schema for the data storage in Linera. (#4814)
## Motivation
Databases commonly use a `partition_key`, which corresponds to `root_key` in our code. The partition key is hashed in order to spread the workload over nodes.
An unfortunate feature of the existing schema is that Blobs, BlobStates, Events, and certificates all live in the same partition (the one corresponding to `&[]`), which causes performance problems. A common recommendation for schema design is to spread out the partition key so that no single bin receives too much data.
Fixes #4807
## Proposal
The following proposal is implemented:
* For all base keys except Event, the root key is derived from the serialization of the key.
* For Events, we want to access several events at once, so the base key is serialized using only the `ChainId` and `StreamId`. This led to the introduction of a `fn root_key(&self)` for the `BaseKey` type. The function is infallible, since serializing the types in question (BlobId, CryptoHash, ChainId) cannot return an error.
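The per-variant root-key derivation can be sketched as follows. This is a hedged simplification, not the actual Linera types or serialization: the real code uses BCS and 32-byte identifiers, while here fixed-size stand-in arrays and a tag byte illustrate why the function can be infallible and why Events drop their index.

```rust
// Stand-ins for the real identifier types (illustrative only).
type ChainId = [u8; 4];
type StreamId = [u8; 4];

// Simplified BaseKey: the real enum has more variants.
enum BaseKey {
    Blob([u8; 4]),                 // keyed by a BlobId
    Certificate([u8; 4]),          // keyed by a CryptoHash
    Event(ChainId, StreamId, u64), // (chain, stream, index)
}

impl BaseKey {
    // Infallible: every variant's identifying bytes are fixed-size,
    // so no serialization error can occur.
    fn root_key(&self) -> Vec<u8> {
        match self {
            BaseKey::Blob(id) => {
                let mut v = vec![0u8];
                v.extend_from_slice(id);
                v
            }
            BaseKey::Certificate(hash) => {
                let mut v = vec![1u8];
                v.extend_from_slice(hash);
                v
            }
            // Events omit the index: all events of one (chain, stream)
            // share a partition so they can be read together.
            BaseKey::Event(chain, stream, _index) => {
                let mut v = vec![2u8];
                v.extend_from_slice(chain);
                v.extend_from_slice(stream);
                v
            }
        }
    }
}

fn main() {
    let a = BaseKey::Event([1; 4], [2; 4], 0);
    let b = BaseKey::Event([1; 4], [2; 4], 7);
    // Two events of the same stream land in the same partition.
    assert_eq!(a.root_key(), b.root_key());
    // A blob never shares a partition with an event.
    let c = BaseKey::Blob([1; 4]);
    assert_ne!(a.root_key(), c.root_key());
    println!("ok");
}
```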
Why this change is the right one:
* Databases impose no limit on the number of partition keys, but they do limit the amount of data stored under a single partition key. Concentrating all data under one partition key therefore creates potential problems above 100M and may fail completely at 2G.
* We already form a root key from the `ChainId` for the application states, so we have already accepted having a large number of partition keys.
The `Batch` of `linera-storage` is replaced by a `MultiPartitionBatch`. It is unfortunate that this name collides with the `Batch` of `linera-views`.
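The idea behind a multi-partition batch can be sketched as below. This is an assumption-laden toy, not the actual `linera-storage` API: each write is tagged with its root key, and the batch can then be regrouped so that the write path issues one underlying batch per partition instead of one future per entry.

```rust
use std::collections::BTreeMap;

// Illustrative sketch of a batch whose entries span several partitions.
// Names and shapes are hypothetical, not the real MultiPartitionBatch.
#[derive(Default)]
struct MultiPartitionBatch {
    // (root_key, key, value) triples, in insertion order.
    puts: Vec<(Vec<u8>, Vec<u8>, Vec<u8>)>,
}

impl MultiPartitionBatch {
    fn put(&mut self, root_key: Vec<u8>, key: Vec<u8>, value: Vec<u8>) {
        self.puts.push((root_key, key, value));
    }

    // Group entries by root key: a backend can then write each
    // partition's entries as one batch, rather than entry by entry.
    fn by_partition(self) -> BTreeMap<Vec<u8>, Vec<(Vec<u8>, Vec<u8>)>> {
        let mut map: BTreeMap<Vec<u8>, Vec<(Vec<u8>, Vec<u8>)>> = BTreeMap::new();
        for (root, key, value) in self.puts {
            map.entry(root).or_default().push((key, value));
        }
        map
    }
}

fn main() {
    let mut batch = MultiPartitionBatch::default();
    batch.put(b"p1".to_vec(), b"k1".to_vec(), b"v1".to_vec());
    batch.put(b"p2".to_vec(), b"k2".to_vec(), b"v2".to_vec());
    batch.put(b"p1".to_vec(), b"k3".to_vec(), b"v3".to_vec());
    let grouped = batch.by_partition();
    // Three entries collapse into two per-partition write groups.
    assert_eq!(grouped.len(), 2);
    assert_eq!(grouped[&b"p1".to_vec()].len(), 2);
    println!("ok");
}
```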
This PR does the requested job of changing only `linera-storage`. However, it loses some parallelization for the `read_multi_values`/`contains_keys` operations. This is not irremediable:
* We can add a function `read_multi_root_values(_, root_keys: Vec<Vec<u8>>, key: Vec<u8>)` to the `KeyValueDatabase`. It is possible to implement this feature efficiently in ScyllaDB, which is our main database target.
* We can add a `write_multi_partition_batch` to the `KeyValueDatabase`. Note that the existing `write_batch` in `db_storage.rs` creates many futures; the right solution is likely to group the entries. Of course, batch size is an issue, but it should be addressed by measuring, not by spreading the writes over all partitions.
* It is somewhat unclear how those features could be implemented in combinators like `LruCaching`, `ValueSplitting`, and so on.
## Test Plan
The CI.
## Release Plan
Hopefully, merge into main.
It is possible to write a migration tool that takes the existing storage of TestNet Conway and converts it to the new schema, but only if we really want to do that.
Before that, it would be good to check that scalability behaves as expected in ScyllaDB runs.
## Links
None.
1 file changed (`linera-storage/src`): +187 −179 lines.