Commit 1c1764c

Merge pull request #16 from olivmath/claude/issue-11-duplicate-leaves-01VwesaAX93Kwmeq6bLVcZvR
docs: investigate and document Issue #11 - duplicate leaves behavior
2 parents 1b6cb66 + 6427259 commit 1c1764c

2 files changed: +392 -0 lines changed

ISSUE_11_FINDINGS.md

Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
# Issue #11: Duplicate Leaves Investigation

## Summary

This document describes the behavior of the Merkle Tree implementation when duplicate leaves are present in the tree.

## Question

What happens when there are duplicate leaves in the leaf list?
- How are the proofs generated?
- How is the root generated?

Example: `["m","e","r","k","l","e","t","r","e","e","r","s"]`

## Findings

### 1. Root Generation ✅

**The root is generated correctly even with duplicate leaves.**

- The Merkle root calculation works as expected
- Duplicate leaves are treated as independent nodes at their respective positions
- The tree structure is built correctly regardless of duplicate values

Example:
```rust
let data = ["a", "b", "a", "c"]; // "a" appears twice
// Hash each entry into a 32-byte leaf, preserving order
let hash = |d: &&str| { let mut b = [0u8; 32]; hash_it(d.as_bytes(), &mut b); b };
let leaves: Vec<[u8; 32]> = data.iter().map(hash).collect();
let tree = MerkleTree::new(leaves);
// Root is calculated successfully
```
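
To illustrate why duplicate values still act as independent nodes, here is a minimal, self-contained sketch of a toy tree. It is not the library's implementation: `toy_hash`, `toy_parent`, and `toy_root` are hypothetical names, the hash is a non-cryptographic 64-bit stand-in from the standard library, and the sketch assumes pairs are hashed in order (left, then right) and that the leaf count is a power of two.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy 64-bit hash standing in for the library's 32-byte hash (NOT cryptographic).
fn toy_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Hash a pair of child nodes, in order, into their parent.
fn toy_parent(left: u64, right: u64) -> u64 {
    let mut bytes = Vec::with_capacity(16);
    bytes.extend_from_slice(&left.to_be_bytes());
    bytes.extend_from_slice(&right.to_be_bytes());
    toy_hash(&bytes)
}

/// Root of a toy tree; the leaf count must be a non-zero power of two.
fn toy_root(data: &[&str]) -> u64 {
    assert!(data.len().is_power_of_two());
    let mut level: Vec<u64> = data.iter().map(|d| toy_hash(d.as_bytes())).collect();
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| toy_parent(pair[0], pair[1]))
            .collect();
    }
    level[0]
}

fn main() {
    // Both occurrences of "a" hash to the same leaf value...
    assert_eq!(toy_hash(b"a"), toy_hash(b"a"));
    // ...but each occupies its own position, so moving one changes the toy root.
    assert_ne!(
        toy_root(&["a", "b", "a", "c"]),
        toy_root(&["a", "b", "c", "a"])
    );
    println!("toy root = {:x}", toy_root(&["a", "b", "a", "c"]));
}
```

If a tree implementation sorts each pair before hashing, swapping two siblings would not change the root, so the second assertion only holds under the ordered-pair assumption stated above.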

### 2. Proof Generation ⚠️

**Proofs are always generated for the FIRST occurrence of a duplicate leaf.**

The implementation uses `iter().position()`, which returns the index of the **first matching element**.

Example:
```rust
let data = ["a", "b", "c", "a", "d", "a"]; // "a" at indices 0, 3, 5
// `tree` is built from the hashes of `data`, in order

let mut leaf_a = [0u8; 32];
hash_it("a".as_bytes(), &mut leaf_a);
let proof = tree.make_proof(leaf_a);
// This proof will ALWAYS be for the "a" at index 0
```

**Important implications:**
- You cannot generate proofs for the 2nd, 3rd, etc. occurrences of a duplicate leaf
- The library has no way to distinguish between different positions of the same value
- All proofs for a duplicate value will prove the first occurrence
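
The first-match behavior can be reproduced with the standard library alone. The sketch below uses plain strings in place of hashed leaves; `first_index` and `all_indices` are hypothetical helpers for illustration, not part of the library.

```rust
/// Index of the first leaf equal to `target`, mirroring a lookup via `iter().position()`.
fn first_index(leaves: &[&str], target: &str) -> Option<usize> {
    leaves.iter().position(|leaf| *leaf == target)
}

/// Indices of every leaf equal to `target`, shown for comparison.
fn all_indices(leaves: &[&str], target: &str) -> Vec<usize> {
    leaves
        .iter()
        .enumerate()
        .filter(|(_, leaf)| **leaf == target)
        .map(|(index, _)| index)
        .collect()
}

fn main() {
    let leaves = ["a", "b", "c", "a", "d", "a"];
    // `position` stops at the first match, so indices 3 and 5 are never reached.
    assert_eq!(first_index(&leaves, "a"), Some(0));
    assert_eq!(all_indices(&leaves, "a"), vec![0, 3, 5]);
    // A value that is absent yields `None`.
    assert_eq!(first_index(&leaves, "z"), None);
}
```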

### 3. Proof Verification ✅

**Proof verification works correctly.**

- The generated proof can be successfully verified
- `check_proof()` correctly reconstructs the root
- The proof is mathematically valid for the first occurrence

### 4. Test Results

Tested with the exact example from Issue #11:
```
["m","e","r","k","l","e","t","r","e","e","r","s"]
      ^   ^           ^       ^   ^   ^   ^  (duplicates: "e" x4, "r" x3)
```

Results:
- **Root generated**: ✅ Success
- **Proofs for 'm'**: ✅ Valid (4 nodes)
- **Proofs for 'e'**: ✅ Valid (4 nodes) - proves FIRST 'e'
- **Proofs for 'r'**: ✅ Valid (4 nodes) - proves FIRST 'r'
- **Proofs for 'k'**: ✅ Valid (4 nodes)
- **Proofs for 'l'**: ✅ Valid (4 nodes)
- **Proofs for 't'**: ✅ Valid (4 nodes)
- **Proofs for 's'**: ✅ Valid (3 nodes)
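
The shorter proof for 's' is what one would expect if an unpaired node at the end of a level is carried up unchanged. The source does not state how the library treats unpaired nodes, so the sketch below is only an assumption that happens to reproduce the counts above; `proof_len` is a hypothetical helper.

```rust
/// Number of sibling hashes in a proof for `index` in a tree of `leaf_count` leaves,
/// ASSUMING an unpaired node at the end of a level is promoted unchanged.
fn proof_len(mut index: usize, leaf_count: usize) -> usize {
    let mut width = leaf_count;
    let mut len = 0;
    while width > 1 {
        // A sibling exists only if the paired index falls inside this level.
        if (index ^ 1) < width {
            len += 1;
        }
        index /= 2;
        width = (width + 1) / 2;
    }
    len
}

fn main() {
    // First-occurrence index of each distinct letter in the Issue #11 example.
    let first = [("m", 0), ("e", 1), ("r", 2), ("k", 3), ("l", 4), ("t", 6), ("s", 11)];
    for (letter, index) in first {
        println!("{letter}: {} nodes", proof_len(index, 12));
    }
}
```

Under this assumption, the 12 leaves shrink to levels of 6, 3, and 2 nodes; the third node of the 3-node level (covering leaves 8 to 11) has no sibling, which removes one step from the proof for index 11.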

## Recommendations

### For Library Users

1. **Avoid duplicate leaves if possible**
   - Use unique identifiers or add position/index information to values
   - Consider hashing `value + position` instead of just `value`

2. **If duplicates are necessary**
   - Be aware that proofs will always reference the first occurrence
   - Document this behavior in your application
   - Consider adding metadata to distinguish duplicate values

3. **Example workaround**:
   ```rust
   // Instead of this:
   let leaves = ["a", "b", "a", "c"];

   // Do this:
   let leaves_with_index = [
       ("a", 0),
       ("b", 1),
       ("a", 2), // Now unique!
       ("c", 3),
   ];
   ```
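
One way to turn the `(value, index)` pairs above into distinct hash inputs is to prepend a fixed-width index to the value bytes before hashing. This is a sketch under assumptions: `indexed_preimage` is a hypothetical helper, and the 8-byte big-endian encoding is an arbitrary choice made so that the index and value cannot be confused with each other.

```rust
/// Build a position-tagged preimage: 8-byte big-endian index followed by the value bytes.
fn indexed_preimage(index: usize, value: &str) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(8 + value.len());
    bytes.extend_from_slice(&(index as u64).to_be_bytes());
    bytes.extend_from_slice(value.as_bytes());
    bytes
}

fn main() {
    let values = ["a", "b", "a", "c"];
    let preimages: Vec<Vec<u8>> = values
        .iter()
        .enumerate()
        .map(|(index, value)| indexed_preimage(index, value))
        .collect();

    // The two "a" entries now produce different bytes, so their hashes will differ too.
    assert_ne!(preimages[0], preimages[2]);
    println!("{:?}", preimages[2]);
}
```

Each preimage would then be hashed into a leaf the same way the raw value was before. The verifier has to rebuild the same preimage, so it needs to know the index as well as the value.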

### For Library Maintainers

#### Option 1: Keep Current Behavior (Recommended)
- Document the "first occurrence" behavior clearly
- Add a warning to the documentation about duplicate leaves
- This is the simplest and most performant approach

#### Option 2: Add Position-based Proof API
Add a new method to specify which occurrence:
```rust
// Current API (keeps first occurrence behavior)
pub fn make_proof(&self, leaf: Leaf) -> Vec<Node>

// New API (specify position)
pub fn make_proof_at(&self, leaf: Leaf, position: usize) -> Result<Vec<Node>, Error>
```
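
A position-based API would likely need to check that the requested position is in range and actually holds the given leaf before building a proof. The sketch below shows only that validation step; `resolve_position` and `ProofError` are hypothetical names, and `Leaf` is assumed to be a 32-byte array as in the test file.

```rust
/// Assumed leaf type: a 32-byte hash, as used in the tests.
type Leaf = [u8; 32];

/// Hypothetical error type for a position-based proof request.
#[derive(Debug, PartialEq)]
enum ProofError {
    OutOfBounds,
    LeafMismatch,
}

/// Confirm that `leaf` really sits at `position` before a proof is built for it.
fn resolve_position(leaves: &[Leaf], leaf: Leaf, position: usize) -> Result<usize, ProofError> {
    match leaves.get(position) {
        None => Err(ProofError::OutOfBounds),
        Some(found) if *found != leaf => Err(ProofError::LeafMismatch),
        Some(_) => Ok(position),
    }
}

fn main() {
    // Leaves 0 and 2 are duplicates.
    let leaves: Vec<Leaf> = vec![[1u8; 32], [2u8; 32], [1u8; 32]];

    // The second occurrence can now be addressed explicitly.
    assert_eq!(resolve_position(&leaves, [1u8; 32], 2), Ok(2));
    // A wrong leaf/position pairing is rejected instead of silently proving index 0.
    assert_eq!(resolve_position(&leaves, [2u8; 32], 2), Err(ProofError::LeafMismatch));
    assert_eq!(resolve_position(&leaves, [1u8; 32], 9), Err(ProofError::OutOfBounds));
}
```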

#### Option 3: Reject Duplicates
Add validation to reject duplicate leaves:
```rust
use std::collections::HashSet;

impl MerkleTree {
    pub fn new(leaves: Vec<Leaf>) -> Result<Self, Error> {
        // Check for duplicates
        let unique: HashSet<_> = leaves.iter().collect();
        if unique.len() != leaves.len() {
            return Err(Error::DuplicateLeaves);
        }
        // ... rest of implementation
    }
}
```
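
If rejecting duplicates is chosen, reporting which positions collide may make the error more useful than a bare length comparison. The following standalone sketch shows one way to find the first repeated leaf; `find_duplicate` is a hypothetical helper and `Leaf` is assumed to be a 32-byte array.

```rust
use std::collections::HashMap;

/// Assumed leaf type: a 32-byte hash, as used in the tests.
type Leaf = [u8; 32];

/// Return `(first_index, duplicate_index)` for the first repeated leaf, if any.
fn find_duplicate(leaves: &[Leaf]) -> Option<(usize, usize)> {
    let mut seen: HashMap<Leaf, usize> = HashMap::with_capacity(leaves.len());
    for (index, leaf) in leaves.iter().enumerate() {
        if let Some(&first) = seen.get(leaf) {
            return Some((first, index));
        }
        seen.insert(*leaf, index);
    }
    None
}

fn main() {
    let with_duplicate: Vec<Leaf> = vec![[1u8; 32], [2u8; 32], [1u8; 32], [3u8; 32]];
    let unique: Vec<Leaf> = vec![[1u8; 32], [2u8; 32], [3u8; 32]];

    // Index 2 repeats the leaf first seen at index 0.
    assert_eq!(find_duplicate(&with_duplicate), Some((0, 2)));
    assert_eq!(find_duplicate(&unique), None);
}
```

A constructor could map `Some(..)` to an error variant carrying both indices, at the cost of one hash-map pass over the leaves.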

## Conclusion

The current implementation handles duplicate leaves **gracefully but with limitations**:

**Pros:**
- No crashes or panics
- Root generation works correctly
- Proofs are mathematically valid
- Performance is not impacted

⚠️ **Cons:**
- Cannot generate proofs for non-first occurrences of duplicates
- No way to specify which duplicate to prove
- Behavior may be surprising to users

**Recommendation**: Document the current behavior clearly and advise users to avoid duplicates or add position information to their values.

## Test Coverage

All tests pass successfully:
- `test_duplicate_leaves_root_generation`
- `test_duplicate_leaves_proof_generation`
- `test_duplicate_leaves_multiple_proofs`
- `test_unique_leaves_vs_duplicate_leaves`
- `test_issue_11_exact_example`

See `tests/test_issue_11_duplicates.rs` for complete test implementation.

tests/test_issue_11_duplicates.rs

Lines changed: 229 additions & 0 deletions
@@ -0,0 +1,229 @@
//! Investigation of Issue #11: Duplicate leaves behavior
//!
//! This test investigates what happens when there are duplicate leaves in the tree.
//! Questions to answer:
//! 1. How are the proofs generated?
//! 2. How is the root generated?
//! 3. What happens when we try to make a proof for a duplicate leaf?

use merkletreers::tree::MerkleTree;
use merkletreers::utils::hash_it;

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_duplicate_leaves_root_generation() {
        // Example from issue #11: ["m","e","r","k","l","e","t","r","e","e","r","s"]
        // Note: "e" appears 4 times, "r" appears 3 times
        let data = ["m", "e", "r", "k", "l", "e", "t", "r", "e", "e", "r", "s"];

        let leaves = data
            .iter()
            .map(|d| {
                let mut buffer = [0u8; 32];
                hash_it(d.as_bytes(), &mut buffer);
                buffer
            })
            .collect::<Vec<[u8; 32]>>();

        // Create tree with duplicate leaves
        let tree = MerkleTree::new(leaves.clone());

        // Root should be generated successfully
        println!("Root with duplicates: {:?}", tree.root);
        assert_ne!(tree.root, [0u8; 32], "Root should not be zero");

        // Let's also test with a smaller example for clarity
        let simple_duplicates = ["a", "b", "a", "c"];
        let simple_leaves = simple_duplicates
            .iter()
            .map(|d| {
                let mut buffer = [0u8; 32];
                hash_it(d.as_bytes(), &mut buffer);
                buffer
            })
            .collect::<Vec<[u8; 32]>>();

        let simple_tree = MerkleTree::new(simple_leaves);
        println!("Simple root with duplicates: {:?}", simple_tree.root);
        assert_ne!(simple_tree.root, [0u8; 32], "Simple root should not be zero");
    }

    #[test]
    fn test_duplicate_leaves_proof_generation() {
        // When we have duplicates, the proof is generated for the FIRST occurrence
        let data = ["a", "b", "a", "c"];

        let leaves = data
            .iter()
            .map(|d| {
                let mut buffer = [0u8; 32];
                hash_it(d.as_bytes(), &mut buffer);
                buffer
            })
            .collect::<Vec<[u8; 32]>>();

        let tree = MerkleTree::new(leaves.clone());

        // Hash of "a"
        let mut leaf_a = [0u8; 32];
        hash_it("a".as_bytes(), &mut leaf_a);

        // Make proof for "a" - this will find the FIRST occurrence at index 0
        let proof = tree.make_proof(leaf_a);

        println!("Proof for duplicate 'a': {:?}", proof);

        // Save root before check_proof consumes tree
        let expected_root = tree.root;

        // Verify the proof
        let computed_root = tree.check_proof(proof, leaf_a);
        println!("Computed root: {:?}", computed_root);
        println!("Expected root: {:?}", expected_root);

        // The proof is valid because it proves the FIRST occurrence
        assert_eq!(
            computed_root, expected_root,
            "Proof verification should succeed for first occurrence"
        );

        // Important note: We cannot distinguish between different positions of the same leaf value
        // The proof will always be for the FIRST occurrence found by iter().position()
    }

    #[test]
    fn test_duplicate_leaves_multiple_proofs() {
        // Test what happens when we try to make proofs for all instances of a duplicate
        let data = ["a", "b", "c", "a", "d", "a"];

        let leaves = data
            .iter()
            .map(|d| {
                let mut buffer = [0u8; 32];
                hash_it(d.as_bytes(), &mut buffer);
                buffer
            })
            .collect::<Vec<[u8; 32]>>();

        let tree = MerkleTree::new(leaves.clone());

        // Hash of "a"
        let mut leaf_a = [0u8; 32];
        hash_it("a".as_bytes(), &mut leaf_a);

        // Try to make proof for "a"
        // The current implementation will find the FIRST occurrence at index 0
        let proof = tree.make_proof(leaf_a);

        println!("Number of proof nodes: {}", proof.len());
        println!("Proof for first 'a': {:?}", proof);

        // Save root before check_proof consumes tree
        let expected_root = tree.root;

        // Verify the proof
        let computed_root = tree.check_proof(proof.clone(), leaf_a);
        println!("Computed root: {:?}", computed_root);
        println!("Expected root: {:?}", expected_root);

        assert_eq!(
            computed_root, expected_root,
            "Proof verification should succeed for first occurrence"
        );

        // The issue is: we cannot distinguish between different positions of the same leaf value
        // The proof will always be for the FIRST occurrence
    }

    #[test]
    fn test_unique_leaves_vs_duplicate_leaves() {
        // Compare behavior with unique vs duplicate leaves

        // Unique leaves
        let unique_data = ["a", "b", "c", "d"];
        let unique_leaves = unique_data
            .iter()
            .map(|d| {
                let mut buffer = [0u8; 32];
                hash_it(d.as_bytes(), &mut buffer);
                buffer
            })
            .collect::<Vec<[u8; 32]>>();

        let unique_tree = MerkleTree::new(unique_leaves.clone());

        // Duplicate leaves (same data, but "c" is replaced by a second "a")
        let duplicate_data = ["a", "b", "a", "d"];
        let duplicate_leaves = duplicate_data
            .iter()
            .map(|d| {
                let mut buffer = [0u8; 32];
                hash_it(d.as_bytes(), &mut buffer);
                buffer
            })
            .collect::<Vec<[u8; 32]>>();

        let duplicate_tree = MerkleTree::new(duplicate_leaves.clone());

        // Roots should differ because the leaf at index 2 differs ("c" vs. "a");
        // both trees have the same shape (4 leaves)
        assert_ne!(
            unique_tree.root, duplicate_tree.root,
            "Different leaf arrangements should produce different roots"
        );

        println!("Unique tree root: {:?}", unique_tree.root);
        println!("Duplicate tree root: {:?}", duplicate_tree.root);
    }

    #[test]
    fn test_issue_11_exact_example() {
        // Exact example from issue #11
        let data = ["m", "e", "r", "k", "l", "e", "t", "r", "e", "e", "r", "s"];

        let leaves = data
            .iter()
            .map(|d| {
                let mut buffer = [0u8; 32];
                hash_it(d.as_bytes(), &mut buffer);
                buffer
            })
            .collect::<Vec<[u8; 32]>>();

        println!("Number of leaves: {}", leaves.len());

        let tree = MerkleTree::new(leaves.clone());

        println!("Root: {:?}", tree.root);
        let expected_root = tree.root;

        // Try to make proof for each unique letter
        let unique_letters = ["m", "e", "r", "k", "l", "t", "s"];

        for letter in unique_letters.iter() {
            let mut leaf = [0u8; 32];
            hash_it(letter.as_bytes(), &mut leaf);

            let tree_for_proof = MerkleTree::new(leaves.clone());
            let proof = tree_for_proof.make_proof(leaf);

            let tree_for_check = MerkleTree::new(leaves.clone());
            let computed_root = tree_for_check.check_proof(proof.clone(), leaf);

            println!(
                "Letter '{}': proof length = {}, verification = {}",
                letter,
                proof.len(),
                computed_root == expected_root
            );

            assert_eq!(
                computed_root, expected_root,
                "Proof verification should succeed for '{}'",
                letter
            );
        }
    }
}
