Skip to content

MerkleTree-specific Type Aliases#251

Merged
aszepieniec merged 19 commits intomasterfrom
asz/merkle-tree-index-types
May 19, 2025
Merged

MerkleTree-specific Type Aliases#251
aszepieniec merged 19 commits intomasterfrom
asz/merkle-tree-index-types

Conversation

@aszepieniec
Copy link
Copy Markdown
Collaborator

This PR introduces type aliases:

  • MerkleTreeNodeIndex for u64
  • MerkleTreeLeafIndex for u64
  • MerkleTreeHeight for u32.

It also refactors MerkleTree to avoid using usize where-ever
possible, particularly in public functions. Instead, these types
aliases are used.

MerkleTree internally stores a Vec<Digest>, which induces two
limitations:

  1. The index type is usize, which complicates matters when
    targeting compilation on 32-bit machines (including 32-bit WASM).
  2. This vector can store at most 2^25 nodes, whereas we would like
    functions associated with Merkle trees to work for Merkle trees
    that are far larger (without needing to store all the internal
    nodes explicitly).

The second limitation persists. However, the first limitation is fixed
by this PR.

Also:

  • Public functions nodes() and leafs() now return iterators
    instead of slices. As these functions were public, this change is
    breaking.
  • Private function authentication_structure_node_indices is
    marked pub as it is needed downstream.

Displaces #250.

This commit introduces type aliases:
 - `MerkleTreeNodeIndex` for `u64`
 - `MerkleTreeLeafIndex` for `u64`
 - `MerkleTreeHeight` for `u32`.

It also refactors `MerkleTree` to avoid using `usize` where-ever
possible, particularly in public functions. Instead, these types
aliases are used.

`MerkleTree` internally stores a `Vec<Digest>`, which induces two
limitations:
 1. The index type is `usize`, which complicates matters when
    targeting compilation on 32-bit machines (including 32-bit WASM).
 2. This vector can store at most 2^25 nodes, whereas we would like
    functions associated with Merkle trees to work for Merkle trees
    that are far larger (without needing to store all the internal
    nodes explicitly).

The second limitation persists. However, the first limitation is fixed
by this commit.

Also, public functions `nodes()` and `leafs()` now return iterators
instead of slices. As these functions were public, this change is
breaking.

Addresses #250.
Needed downstream, for compressing `RemovalRecord` lists.
@coveralls
Copy link
Copy Markdown

coveralls commented Apr 13, 2025

Coverage Status

coverage: 97.908% (+0.06%) from 97.852%
when pulling 6d3eb9b on asz/merkle-tree-index-types
into ff0ec3a on master.

@aszepieniec aszepieniec force-pushed the asz/merkle-tree-index-types branch from 7a13836 to 185df8e Compare April 13, 2025 17:13
@jan-ferdinand
Copy link
Copy Markdown
Member

I'm not sure type aliases are the way to go; as aliases, they are not distinct types. If we really want to go that way, I suggest to use the newtype pattern instead.

pub struct MerkleTreeNodeIndex(u64);

impl MerkleTreeNodeIndex {
    const MAX: u64 = 1 << 63;
}

// similar for these two:
pub struct MerkleTreeLeafIndex(u64);
pub struct MerkleTreeHeight(u8);

// maybe implement std::ops::Deref

I understand the primary goal is to enable compilation on 32-bit targets. I think this can be achieved in a simpler way:

const MAX_NUM_NODES: usize = {
    #[cfg(target_pointer_width = "16")]
    { compile_error!("target pointer width 16 is not supported") }
    #[cfg(target_pointer_width = "32")]
    { 1 << 25 } // comment as to why
    #[cfg(target_pointer_width = "64")]
    { 1 << 32 }
};

However, I'm unsure we even want to have this constant. Its documentation states that it “[e]nforces that all compilation targets have a consistent MAX_TREE_HEIGHT” – but why is that even important?1 Unfortunately, the comment doesn't say. If we really want to have a consistent maximum tree height across compilation targets, and if that maximum tree height should be larger than 24, we need a different approach to our backing storage instead.

Also note that the second limitation you outline, namely that a vector can store at most $2^{25}$ nodes, only applies on 32-bit targets. For 64-bit architectures, the limit is a tree height of 56, or $2^{57}$ nodes.

Footnotes

  1. It also enforced this in an unintended way, more akin to #[cfg(not(target_pointer_width = "64"))] compile_error!("whoups");.

@aszepieniec
Copy link
Copy Markdown
Collaborator Author

I'm not sure type aliases are the way to go; as aliases, they are not distinct types. If we really want to go that way, I suggest to use the newtype pattern instead.

I am familiar with the difference between type aliases and newtypes. I do not see the link between type aliases not being distinct types and type aliases being a poor strategy in this particular case or in general. I think you might sneaking in objectives to the discussion that I did not have going in.

Which is fair. The dialectic provoked by different objectives usually exposes superior architectures. But let's be wary of letting the perfect be the enemy of the good.

I realize I was not explicit about my objectives going in, so let me correct that. My goals were:

  • expose authentication_node_indices as it is needed downstream
  • enable compilation on 32-bit platforms
  • synchronize node index types across Merkle tree and MMR , particularly by refactoring Merkle tree's usize-index into MMR's u64-index (I forget the details but this was an irritation in a past episode where I had to use MMR function calls to do Merkle logic because I needed u64s)
  • clarify whenever a u64 is a {Merkle,MMR}-{node,leaf}-index to benefit readability within twenty-first and for dependencies
  • avoid unnecessary downstream refactoring due to dependency-upgrades

Non-goals were:

  • structurally change the code to make it more idiomatic rust
  • fix broken logic
  • make the logic more robust against potential errors

In light of that statement of objectives, I'm not sure why newtypes would be preferable over type aliases.

However, I'm unsure we even want to have this constant.

Good point! My first thought after reading it stated so succinctly is, "let's kill it". Let me sleep on it to see if other thoughts follow up.

Also note that the second limitation you outline, namely that a vector can store at most $2^{25}$ nodes, only applies on 32-bit targets. For 64-bit architectures, the limit is a tree height of 56, or $2^{57}$ nodes.

Thanks for the clarification. I'll change the code accordingly.

@jan-ferdinand
Copy link
Copy Markdown
Member

I can't see any downside from removing the various MAX_* constants, and adapting the code is easy enough. Perhaps @Sword-Smith remembers something we're overlooking?

Copy link
Copy Markdown
Member

@jan-ferdinand jan-ferdinand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think removing the various MAX_* constants is the way to go.

pub fn authentication_structure_node_indices(
num_leafs: MerkleTreeLeafIndex,
leaf_indices: &[MerkleTreeLeafIndex],
) -> Result<impl ExactSizeIterator<Item = MerkleTreeLeafIndex> + use<>> {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the reason I'm not a fan of creating aliases for simple types: the function name and the documentation imply MerkleTreeNodeIndex for the iterator's item type, yet it is MerkleTreeLeafIndex.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! This should absolutely be MerkleTreeNodeIndex.

I understand the drawback of not having the compiler's help to identify inconsistencies. It leads to wrong aliases slipping through the cracks, like here. But if that's a compelling argument against using type aliases, then it's also a compelling argument against comments in the code describing what's going on. Do you have a similar policy on comments (not including doctests or docstrings)?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To a certain degree. If the function's signature (including name) fully explains what the function does, then I don't add the noise that a comment would now be. However, since the comment has to follow fewer rules than the name of the function, it's easy to add information not captured in the signature, which makes adding a comment worthwhile often enough.

All that said, I might misunderstand what you're getting at, since you are excluding docstrings from consideration. The type aliases are public, as should the description of a function's “what” be, meaning it should go in the docstring.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type aliases can have docstrings, and these particular docstrings can be used to document index conventions. This solution strikes me as better than the alternative, which is to document the relevant bits of the index conventions in the functions' docstrings. This alternative leads to a scattered convention documentation.

@jan-ferdinand
Copy link
Copy Markdown
Member

jan-ferdinand commented Apr 23, 2025

We should probably add a CI workflow (or modify an existing one) to check that compilation on 32-bit architectures works. The following is a starting point:

rustup target add i686-unknown-linux-gnu
cargo clippy --target i686-unknown-linux-gnu

I'm not familiar with all the pitfalls of cross compilation, but this seems like a starting point. Here's a list of all targets. Out of the Tier 1 targets, there are two 32-bit architectures: i686-pc-windows-msvc and i686-unknown-linux-gnu. There are additional 32-bit targets in Tier 2.

@aszepieniec aszepieniec force-pushed the asz/merkle-tree-index-types branch from bffd0c6 to 24e555b Compare April 24, 2025 14:57
Co-authored-by: Jan Ferdinand Sauer <ferdinand@neptune.cash>
@aszepieniec aszepieniec force-pushed the asz/merkle-tree-index-types branch from 24e555b to 53b3ee2 Compare April 24, 2025 14:59
aszepieniec and others added 2 commits April 24, 2025 19:23
 - chore: Use correct type alias
 - style: Use standard strings for primitive type conversion error
 - docs: Explain impossible source of error

Co-authored-by: Ferdinand Sauer <ferdinand@neptune.cash>
Specifically: kill `nodes()` and `num_nodes()`.

If you want to get a specific node, use `merkle_tree.node(i)` instead
of `merkle_tree.nodes()[i]`.
aszepieniec and others added 2 commits May 5, 2025 15:11
Drop the `const`s `MAX_NUM_NODES`, `MAX_NUM_LEAFS`, and `MAX_TREE_HEIGHT`.
The consts supposedly guarantee cross-platform support. However:
 - It is unclear if any platform supports trees with 2^64 nodes, so the
   support guarantee is kind of vacuous.
 - Transferring Merkle trees between different machines is not a supported
   use case.

Cf. #251.

Co-authored-by: Ferdinand Sauer <ferdinand@neptune.cash>
@aszepieniec aszepieniec force-pushed the asz/merkle-tree-index-types branch from a99850e to 63fa287 Compare May 5, 2025 13:11
Co-authored-by: Ferdinand Sauer <ferdinand@neptune.cash>
@aszepieniec aszepieniec force-pushed the asz/merkle-tree-index-types branch from 459f305 to 5027e0a Compare May 5, 2025 13:14
Co-authored-by: Perplexity.ai
Commits in question give trouble when using `twenty-first` as
a dependency. Reason unclear.

1. Revert "ci: Fix 32-bit workflow"
   This reverts commit 4469810.
2. Revert "ci: Verify build on 32-bit targets"
   This reverts commit 5027e0a.
aszepieniec and others added 3 commits May 9, 2025 10:58
Co-authored-by: Jan Ferdinand Sauer <ferdinand@neptune.cash>
Co-authored-by: Jan Ferdinand Sauer <ferdinand@neptune.cash>
Copy link
Copy Markdown
Member

@jan-ferdinand jan-ferdinand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few nits inline. Generally 👍

return Err(MerkleTreeError::TreeTooHigh);
}
Ok(1 << self.tree_height)
fn num_leafs(&self) -> Result<MerkleTreeLeafIndex> {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a bit weird to use a type that is (in name) explicitly ordinal as a cardinal. Maybe usize is actually more appropriate here? Not totally sure.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. Please elaborate.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, num_x implies the result be a cardinal number, but x_index sounds like an ordinal number. usize doesn't claim to be either, and so to me it feels like it can be used for both without triggering this feeling of dissonance.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is "index" ordinal? O_o

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be more a language thing than a math thing, but since this PR is primarily about names, that feels in scope. “In linguistics, ordinal numerals […] are words representing position or rank in a sequential order.” (source)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so I learn. Thanks for the reference.

In my personal linguistic taste (which appears to be contrary to conventions well-established enough to grace wikipedia), an "index" is a string of data that enables you to locate some other data object according to some canonical lookup procedure, and so it is not necessarily either cardinal or ordinal in my book. That said, the dissonance persists: a number of leafs has little to do with instructions about how to find them, unless one assumes a contiguous storage in RAM. That assumption might be the reason why rust's usize gets to benefit from both ordinal and cardinal interpretations.

Making abstraction of that assumption is not an objective here. In other words, the knowledge that a MerkleTree contains nodes that are laid out contiguously in memory is implicit in every endpoint in the interface and that's okay. And so there is no need to communicate that abstraction through adequate naming and no potential for confusion (about that abstraction) through inadequate naming. So I do think it is linguistic dissonance only, and not a smell of something more profound.

Off the top of my head I can think of the following resolutions:

  • Use usize instead. Asymmetry eww. More importantly: if we decide that MerkleTreeLeafIndex should be an alias for u64 instead (or some other type) then we probably want to modify the return type of num_leafs in lock-step, whereas this option suggests it is okay to break that link.
  • Add symmetric type alias MerkleTreeLeafCount. Overkill.
  • Find a word or phrase that captures both a number of leafs and their indices and use that instead. I'm happy to entertain suggestions but I'm afraid any such phrase will increase confusion because unfamiliar.
  • Keep MerkleTreeLeafIndex and suffer the dissonance.

I'll leave this PR open for a day or two more in case you or anyone else wants to have a stab at the penultimate bullet point (or even convince me that one of the first two aren't so bad). Barring that, I'll proceed with the last and merge.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Querying types like HashMap or str for their length results in a usize, even though they (generally) cannot be indexed with that type. To me, this indicates that (1) the first option is consistent with prominent rust uses like the standard library and (2) changing the index type does not imply that the return type for num_*() functions would change.

In any case, options 1, 3, and 4 all seem (to varying degrees) fine to me.

pub enum MerkleTreeError {
#[error("All leaf indices must be valid, i.e., less than {num_leafs}.")]
LeafIndexInvalid { num_leafs: usize },
LeafIndexInvalid { num_leafs: MerkleTreeLeafIndex },
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same ordinal / cardinal thing.

Update twenty-first/src/util_types/merkle_tree.rs

Co-authored-by: Jan Ferdinand Sauer <ferdinand@neptune.cash>

Update twenty-first/src/util_types/merkle_tree.rs

Co-authored-by: Jan Ferdinand Sauer <ferdinand@neptune.cash>

Update twenty-first/src/util_types/merkle_tree.rs

Co-authored-by: Jan Ferdinand Sauer <ferdinand@neptune.cash>

Update twenty-first/src/util_types/merkle_tree.rs

Co-authored-by: Jan Ferdinand Sauer <ferdinand@neptune.cash>

Update twenty-first/src/util_types/merkle_tree.rs

Co-authored-by: Jan Ferdinand Sauer <ferdinand@neptune.cash>

Update twenty-first/src/util_types/merkle_tree.rs

Co-authored-by: Jan Ferdinand Sauer <ferdinand@neptune.cash>
@aszepieniec aszepieniec force-pushed the asz/merkle-tree-index-types branch from f862aaa to 6220473 Compare May 13, 2025 11:14
aszepieniec and others added 2 commits May 13, 2025 13:15
Co-authored-by: Ferdinand Sauer <ferdinand@neptune.cash>
@Sword-Smith Sword-Smith force-pushed the master branch 2 times, most recently from 55e5cf7 to 8c942b4 Compare May 14, 2025 07:55
@aszepieniec aszepieniec merged commit fb6bb69 into master May 19, 2025
5 checks passed
@aszepieniec aszepieniec deleted the asz/merkle-tree-index-types branch May 19, 2025 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants