-
Notifications
You must be signed in to change notification settings - Fork 64
Description
Introduce a new column type: MultiTree intended for more efficient storage of blockchain state tries.
Design goals:
Allow storing arbitrary trie structures in the database. Each tree is identified by a root key. There may be arbitrary number of roots. The column should optimize disk layout for tree traversal.
The API.
type NodeAddress = u64;
type Children = Vec<NodeAddress>;NodeAddress is an opaque ID that allows for quick node retrieval. These IDs are defined by the database and will most likely be just Table::Address
Operation enum is extended with two additional tree operations.
pub enum Operation<Key, Value> {
InsertTree { root: Key, root: NewNode },
RemoveTree { root: Key },
//...
}
pub enum NodeRef {
New(NewNode),
Existing(NodeAddress),
}
struct NewNode {
data: Vec<u8>,
children: Vec<NodeRef>, // May reference the node in the same commit or an existing node.
}Querying a root:
fn get_root(col: ColId, key: Key) -> Result<Option<(Vec<u8>, Children)>>;Querying non-root nodes:
fn get_node(col: ColId, node_address: NodeAddress) -> Result<Option<(Vec<u8>, Children)>>;Internal structure.
Hash index will be used for root nodes only. Nodes are stored in value tables. For Each node we store
- A list of up to
Npairs of(NodeKey, NodeAddress).Nis a compile time constant = 16. - User data.
Removing trees.
There are two possible approches to implementing tree removal
- Journaling. For each root the database records all added value
NodeAddresses. On deletion these are dereferenced and the journal record is deleted as well. These records may be stored in a separate value table.
This should be faster overall, but potentially waste space for large tree sets. - Reference counting. For each node we store a counter of all the parent nodes that reference it directly. When a node is deleted, we read all direct children and dereference them. The downside here is that inserting or removing a node would require updating up to 16 children nodes, potentially blowing up the IO.
Background optimization.
The database will periodically resettle nodes within value tables to optimize on-disk tree layout. The idea here that iterating a recent trie root should result in sequential IO.
Potential issues:
Data duplicaton. The design as proposed here has no node index. If the same value node is inserted into 2 different paths of the same tree, it will be stored twice. If this turns out to be an issue we'd need to add each node to the index and use reference counting.
We need to collect statistics for existing chains to estimate the impact for this.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status