
Make trait TMerkleTreeNode dyn compatible (fka object safe)#406

Open
malcolmgreaves wants to merge 7 commits into main from mg/1_TMerkleTreeNode_object_safety

@malcolmgreaves
Collaborator

First in a series of PRs to abstract the Merkle tree store so that we can provide backing
implementations other than the existing one, which is based on a custom file format.

There's no change in external behavior in this PR: it is a refactor.

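This PR drops the + Serialize constraint from trait TMerkleTreeNode to make it
dyn compatible (this was formerly known as object safe). Serialize has a generic method,
serialize<S: Serializer>, and generic methods cannot be dispatched through a virtual table
lookup, so Serialize is not dyn compatible. A trait is only dyn compatible if all of its
supertraits are, so requiring Serialize made TMerkleTreeNode not dyn compatible either. [1]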

The change here drops the + Serialize constraint from the trait and instead adds a new
method to_msgpack_bytes() -> Result<Vec<u8>, ...>, which performs the serialization.
The trait also introduces a new SerializationError associated type, which is the error type
returned by to_msgpack_bytes(). A blanket implementation implements this updated trait
for all existing concrete node implementations: it uses their Serialize impl to provide a
generic implementation of to_msgpack_bytes, with a shared serialization error of
rmp_serde::decode::Error.
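A minimal, std-only sketch of the pattern described above, under stated assumptions. Since serde is not available in a self-contained snippet, a hypothetical Encode trait stands in for Serialize (like Serialize, it has a generic method, so it is not dyn compatible). NodeLike, DemoNode, EncodeError, and serialize_all are illustrative names, not the PR's actual code. The point is that the object-facing trait only has non-generic methods plus an associated error type, so it can sit behind dyn once that associated type is named.

```rust
/// Hypothetical stand-in for an encoding error type.
#[derive(Debug, PartialEq)]
pub struct EncodeError(pub String);

/// Hypothetical stand-in for serde::Serialize. The generic method means this
/// trait cannot be used as `dyn Encode`.
pub trait Encode {
    fn encode_into<E: Extend<u8>>(&self, out: &mut E) -> Result<(), EncodeError>;
}

/// Dyn-compatible node trait: no generic methods, and the error type is an
/// associated type instead of a non-dyn-compatible supertrait bound.
pub trait NodeLike {
    type SerializationError;
    fn to_msgpack_bytes(&self) -> Result<Vec<u8>, Self::SerializationError>;
}

/// Blanket impl: every (sized) type that implements Encode gets
/// `to_msgpack_bytes` for free, all sharing one error type.
impl<T: Encode> NodeLike for T {
    type SerializationError = EncodeError;

    fn to_msgpack_bytes(&self) -> Result<Vec<u8>, EncodeError> {
        let mut out = Vec::new();
        self.encode_into(&mut out)?;
        Ok(out)
    }
}

/// A toy concrete node whose "encoding" is just its id byte.
pub struct DemoNode {
    pub id: u8,
}

impl Encode for DemoNode {
    fn encode_into<E: Extend<u8>>(&self, out: &mut E) -> Result<(), EncodeError> {
        out.extend(std::iter::once(self.id));
        Ok(())
    }
}

/// Works on trait objects; the associated type must be spelled out in `dyn`.
pub fn serialize_all(
    nodes: &[Box<dyn NodeLike<SerializationError = EncodeError>>],
) -> Result<Vec<Vec<u8>>, EncodeError> {
    nodes.iter().map(|n| n.to_msgpack_bytes()).collect()
}
```

Note that `dyn NodeLike` alone is not enough: a trait object must fix every associated type, which is why the sketch writes `dyn NodeLike<SerializationError = EncodeError>`.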

Sites that were manually calling serialize on the node now call to_msgpack_bytes().
There's a new conversion of decode::Error into an OxenError wrapper.
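A conversion like this is what lets call sites use `?` on a library error inside a function that returns the application error. A minimal sketch, assuming hypothetical stand-ins: DecodeError plays the role of rmp_serde::decode::Error, and OxenLikeError is a trimmed-down, illustrative version of OxenError (the real enum has many more variants).

```rust
/// Hypothetical stand-in for rmp_serde::decode::Error.
#[derive(Debug, PartialEq)]
pub struct DecodeError(pub String);

/// Hypothetical, trimmed-down stand-in for OxenError.
#[derive(Debug, PartialEq)]
pub enum OxenLikeError {
    RmpDecodeError(DecodeError),
}

/// This impl is what allows `?` to turn a DecodeError into OxenLikeError.
impl From<DecodeError> for OxenLikeError {
    fn from(err: DecodeError) -> Self {
        OxenLikeError::RmpDecodeError(err)
    }
}

/// Hypothetical low-level decoder that only knows about DecodeError.
fn decode_tag(bytes: &[u8]) -> Result<u8, DecodeError> {
    bytes
        .first()
        .copied()
        .ok_or_else(|| DecodeError("empty input".to_string()))
}

/// Higher-level function that returns the application error type.
pub fn read_node_tag(bytes: &[u8]) -> Result<u8, OxenLikeError> {
    let tag = decode_tag(bytes)?; // DecodeError -> OxenLikeError via From
    Ok(tag)
}
```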

Functions that use the TMerkleTreeNode trait have been updated with new constraints
on the generic type that implements the trait. Specifically, there are constraints that
ensure the SerializationError can be converted into an OxenError where needed. And
p_write_tree has been updated to ensure that the FileNode, DirNode, and VNode concrete
implementations all have TMerkleTreeNode impls with the same SerializationError. This
works because all of them use decode::Error.
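The two kinds of constraints described above can be sketched as follows. This is an illustration, not the PR's code: AppError stands in for OxenError, and append_node and write_three are hypothetical analogues of add_child and p_write_tree. The first function only requires that the node's error converts into AppError; the second additionally pins three node types to one shared error type E.

```rust
#[derive(Debug, PartialEq)]
pub struct EncodeError(pub String);

/// Hypothetical stand-in for OxenError.
#[derive(Debug, PartialEq)]
pub enum AppError {
    Encode(EncodeError),
}

impl From<EncodeError> for AppError {
    fn from(err: EncodeError) -> Self {
        AppError::Encode(err)
    }
}

pub trait NodeLike {
    type SerializationError;
    fn to_msgpack_bytes(&self) -> Result<Vec<u8>, Self::SerializationError>;
}

/// Accepts any node whose serialization error converts into AppError.
pub fn append_node<N>(out: &mut Vec<u8>, node: &N) -> Result<(), AppError>
where
    N: NodeLike,
    AppError: From<N::SerializationError>,
{
    out.extend(node.to_msgpack_bytes()?);
    Ok(())
}

/// Three node types that must all share the same error type E.
pub fn write_three<F, D, V, E>(file: &F, dir: &D, vnode: &V) -> Result<Vec<u8>, AppError>
where
    F: NodeLike<SerializationError = E>,
    D: NodeLike<SerializationError = E>,
    V: NodeLike<SerializationError = E>,
    AppError: From<E>,
{
    let mut out = Vec::new();
    append_node(&mut out, file)?;
    append_node(&mut out, dir)?;
    append_node(&mut out, vnode)?;
    Ok(out)
}

/// Toy nodes: one that always encodes, one that always fails.
pub struct Tagged(pub u8);
pub struct Broken;

impl NodeLike for Tagged {
    type SerializationError = EncodeError;
    fn to_msgpack_bytes(&self) -> Result<Vec<u8>, EncodeError> {
        Ok(vec![self.0])
    }
}

impl NodeLike for Broken {
    type SerializationError = EncodeError;
    fn to_msgpack_bytes(&self) -> Result<Vec<u8>, EncodeError> {
        Err(EncodeError("boom".to_string()))
    }
}
```

With the equality bounds in place, the compiler can normalize each `N::SerializationError` to E, so the single `AppError: From<E>` bound satisfies every inner call.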

[1] https://doc.rust-lang.org/reference/items/traits.html#r-items.traits.dyn-compatible

- modified trait to use associated type for error
- to_msgpack_bytes returns result w/ that error or vec<u8>
- blanket impl: for anything that also implements Serialize, use serialize as the to_msgpack_bytes implementation
- made `from_u8` return error & added it to the OxenError hierarchy

WIP propagate changes
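A minimal sketch of the fallible from_u8 alongside a panicking from_u8_unwrap, as listed above. The type and method names follow the PR, but the variant list shown here and every byte value are assumptions made for illustration; the real enum's discriminants may differ.

```rust
use std::fmt;

/// Variant names mirror node kinds mentioned in the PR; the byte values
/// assigned below are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MerkleTreeNodeType {
    Commit,
    Dir,
    VNode,
    File,
    FileChunk,
}

/// Error for a byte that does not name any known node type.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidMerkleTreeNodeType(pub u8);

impl fmt::Display for InvalidMerkleTreeNodeType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid merkle tree node type byte: {}", self.0)
    }
}

impl std::error::Error for InvalidMerkleTreeNodeType {}

impl MerkleTreeNodeType {
    /// Fallible conversion: unknown bytes become an Err instead of a panic.
    pub fn from_u8(value: u8) -> Result<Self, InvalidMerkleTreeNodeType> {
        match value {
            0 => Ok(Self::Commit),
            1 => Ok(Self::Dir),
            2 => Ok(Self::VNode),
            3 => Ok(Self::File),
            4 => Ok(Self::FileChunk),
            other => Err(InvalidMerkleTreeNodeType(other)),
        }
    }

    /// Legacy behavior: panics on unknown bytes.
    pub fn from_u8_unwrap(value: u8) -> Self {
        match Self::from_u8(value) {
            Ok(node_type) => node_type,
            Err(e) => panic!("{}", e),
        }
    }
}
```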
@coderabbitai
Contributor

coderabbitai bot commented Apr 1, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Refactor

    • Restructured merkle tree node handling with improved error handling architecture
    • Refactored trait implementation patterns for consistent error type propagation throughout serialization operations
  • Bug Fixes

    • Improved reliability of serialization operations with proper error propagation mechanisms

Walkthrough

Refactors the Merkle tree node serialization and error handling infrastructure: replaces OxenError with specific rmp_serde error types for deserialization, reworks TMerkleTreeNode trait to include an associated SerializationError type and to_msgpack_bytes method, removes explicit trait implementations from node types in favor of a blanket implementation, and updates parsing to return Result instead of panicking.

Changes

  • Error Type Additions (crates/lib/src/error.rs): Added OxenError::MerkleTreeError for unknown node-type markers and OxenError::RmpEncodeError for MessagePack encoding failures.
  • Merkle Tree Node Type Refactoring (crates/lib/src/model/merkle_tree/node_type.rs): Introduced InvalidMerkleTreeNodeType error type; replaced panicking from_u8() with a fallible Result variant; added from_u8_unwrap() for legacy panic behavior; reworked TMerkleTreeNode trait to be object-safe with a SerializationError associated type and to_msgpack_bytes() method; added a blanket implementation for all Serialize types.
  • Node Deserialization Updates (crates/lib/src/model/merkle_tree/node/{commit_node.rs, dir_node.rs, file_chunk_node.rs, file_node.rs, vnode.rs, merkle_tree_node.rs}): Changed all deserialize() methods to return rmp_serde::decode::Error instead of OxenError; removed TMerkleTreeNode trait implementations from individual node types; updated MerkleTreeNode::deserialize_id() to propagate the new error type.
  • Database Layer Updates (crates/lib/src/core/db/merkle_node/merkle_node_db.rs): Updated to_node() to return rmp_serde::decode::Error; replaced MerkleTreeNodeType::from_u8() with from_u8_unwrap() in decode paths; generalized open_read_write*() and add_child() with named generics N: TMerkleTreeNode; replaced panicking .serialize() calls with to_msgpack_bytes()? error propagation; added where OxenError: From<N::SerializationError> bounds.
  • Tree Writing Updates (crates/lib/src/repositories/tree.rs): Refactored p_write_tree() from a trait-object parameter to explicit generic type parameters <N, S> with where clauses constraining TMerkleTreeNode and error conversion; enables the same generic N to flow through recursive tree-write operations.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • PR #145: Modifies Merkle tree node type handling, serialization surface, and messagepack error types in parallel with this refactoring.
  • PR #176: Refactors repositories::tree and CommitMerkleTree interfaces that interact with the updated Merkle node APIs and TMerkleTreeNode trait.

Suggested reviewers

  • rpschoenburg
  • gschoeni
  • jcelliott

Poem

🐰 The Merkle tree now hops with grace,
Where errors find their rightful place—
From OxenError's heavy load,
To rmp_serde's lighter road.
With traits reborn and generics bright,
The serialization flows just right! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check: ✅ Passed. The PR title accurately summarizes the main change: making the TMerkleTreeNode trait dyn-compatible by removing the Serialize constraint, which is the core refactor objective.
  • Description check: ✅ Passed. The PR description is directly related to the changeset, providing detailed context about why the Serialize constraint was removed, how the new trait design works, and explaining the blanket implementation approach.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
crates/lib/src/error.rs (1)

315-317: Consider adding RmpEncodeError to the internal error hints.

The RmpDecodeError is included in the hint() method (Line 381) as an internal error, but the new RmpEncodeError is not. For consistency, encoding errors should likely receive the same hint.

🔧 Suggested fix
             DB(_) | ArrowError(_) | BinCodeError(_) | RedisError(_) | R2D2Error(_)
-            | RmpDecodeError(_) => {
+            | RmpDecodeError(_) | RmpEncodeError(_) => {
                 "This is an internal error. Run with RUST_LOG=debug for more details."
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/lib/src/error.rs` around lines 315 - 317, Add RmpEncodeError to the
internal error hints returned by the hint() method so encoding errors get the
same diagnostic guidance as decoding errors; locate the hint() implementation
and where it currently matches RmpDecodeError and include RmpEncodeError in that
same arm (or add a separate arm that returns the same internal error hint) so
both rmp_serde::encode::Error and rmp_serde::decode::Error map to the internal
error hint.
crates/lib/src/repositories/tree.rs (1)

1126-1128: Consider returning an error instead of panicking.

The panic! on unexpected node types could be replaced with an Err return for more graceful error handling, though this represents a programming error rather than a runtime condition.

🔧 Optional fix
             EMerkleTreeNode::File(file_node) => {
                 db.add_child(file_node)?;
             }
-            node => {
-                panic!("p_write_tree Unexpected node type: {node:?}");
-            }
+            EMerkleTreeNode::Commit(_) | EMerkleTreeNode::FileChunk(_) => {
+                return Err(OxenError::basic_str(format!(
+                    "p_write_tree unexpected node type: {:?}",
+                    child.node.node_type()
+                )));
+            }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/lib/src/repositories/tree.rs` around lines 1126 - 1128, Replace the
panic in the catch-all match arm inside p_write_tree with a propagated error
return: instead of panic!("p_write_tree Unexpected node type: {node:?}"), return
an Err variant (using the crate's repository error type or anyhow::Error) that
includes a descriptive message and the debug of node so callers can handle it;
update the function signature to return Result if needed and adjust call sites
to propagate the error via ? or map_err so this programming-error case is
reported without aborting the process.
crates/lib/src/core/db/merkle_node/merkle_node_db.rs (1)

257-264: ?Sized bounds are needed only if trait objects are intended for these entry points.

The current code's implicit Sized bounds on N would reject &dyn TMerkleTreeNode<...> at these call sites. However, all current callers in p_write_tree pass concrete types (vnode, dir_node, file_node), so this is not a current issue.

If the design goal is to allow trait-object dispatch through these helpers, add ?Sized:

Suggested changes (if needed)
-    pub fn open_read_write_if_not_exists<N: TMerkleTreeNode>(
+    pub fn open_read_write_if_not_exists<N: TMerkleTreeNode + ?Sized>(
-    pub fn open_read_write<N: TMerkleTreeNode>(
+    pub fn open_read_write<N: TMerkleTreeNode + ?Sized>(
-    fn write_node<N: TMerkleTreeNode>(
+    fn write_node<N: TMerkleTreeNode + ?Sized>(
-    pub fn add_child<N: TMerkleTreeNode>(&mut self, item: &N) -> Result<(), OxenError>
+    pub fn add_child<N: TMerkleTreeNode + ?Sized>(&mut self, item: &N) -> Result<(), OxenError>

Applies to: lines 257–264, 277–284, 373–380, 424–427.
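To make the ?Sized point concrete, here is a small sketch with hypothetical names (NodeLike, Leaf, encoded_len are illustrative, not the repository's code). Without `+ ?Sized`, the second call below, which passes a trait object, would be rejected because generic parameters are implicitly Sized.

```rust
#[derive(Debug, PartialEq)]
pub struct EncodeError;

pub trait NodeLike {
    type SerializationError;
    fn to_msgpack_bytes(&self) -> Result<Vec<u8>, Self::SerializationError>;
}

/// Toy concrete node.
pub struct Leaf(pub u8);

impl NodeLike for Leaf {
    type SerializationError = EncodeError;
    fn to_msgpack_bytes(&self) -> Result<Vec<u8>, EncodeError> {
        Ok(vec![self.0])
    }
}

/// `?Sized` relaxes the implicit `N: Sized` bound, so N may be either a
/// concrete node or `dyn NodeLike<SerializationError = ...>`.
pub fn encoded_len<N: NodeLike + ?Sized>(node: &N) -> Result<usize, N::SerializationError> {
    Ok(node.to_msgpack_bytes()?.len())
}
```

Because the trait object names its associated type, `N::SerializationError` still resolves for the unsized case, which matches the reviewer's note that the existing where clauses can stay as they are.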

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/lib/src/core/db/merkle_node/merkle_node_db.rs` around lines 257 - 264,
Update the generic bounds to allow trait-object dispatch by adding ?Sized to the
N type parameter where needed: change the signatures of
open_read_write_if_not_exists, (and the other helper functions flagged in the
review at the same pattern) so the generic bound becomes N: TMerkleTreeNode +
?Sized and keep the parameter as node: &N; leave the existing where clauses
(e.g., OxenError: From<N::SerializationError>) intact so SerializationError
still resolves for unsized types.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 823320ab-2b3e-4dfe-81e9-95cc6d71f313

📥 Commits

Reviewing files that changed from the base of the PR and between 60389ea and ce6c05f.

📒 Files selected for processing (10)
  • crates/lib/src/core/db/merkle_node/merkle_node_db.rs
  • crates/lib/src/error.rs
  • crates/lib/src/model/merkle_tree/node/commit_node.rs
  • crates/lib/src/model/merkle_tree/node/dir_node.rs
  • crates/lib/src/model/merkle_tree/node/file_chunk_node.rs
  • crates/lib/src/model/merkle_tree/node/file_node.rs
  • crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs
  • crates/lib/src/model/merkle_tree/node/vnode.rs
  • crates/lib/src/model/merkle_tree/node_type.rs
  • crates/lib/src/repositories/tree.rs

Comment on lines 331 to 334
         let dtype = lookup
             .as_ref()
-            .map(|l| MerkleTreeNodeType::from_u8(l.data_type))
+            .map(|l| MerkleTreeNodeType::from_u8_unwrap(l.data_type))
             .unwrap_or(MerkleTreeNodeType::Commit);
Contributor


⚠️ Potential issue | 🟠 Major

Return invalid node-type bytes as Err, not panics.

These bytes come straight from the node files, so from_u8_unwrap() means one corrupt record—or a repo written with a newer node variant—can abort open()/map(). This PR already introduced a fallible from_u8() for exactly this case, and these methods already return Result, so the read path should propagate the error instead of crashing.

💡 Suggested fix
-        let dtype = lookup
-            .as_ref()
-            .map(|l| MerkleTreeNodeType::from_u8_unwrap(l.data_type))
-            .unwrap_or(MerkleTreeNodeType::Commit);
+        let dtype = lookup
+            .as_ref()
+            .map(|l| MerkleTreeNodeType::from_u8(l.data_type))
+            .transpose()?
+            .unwrap_or(MerkleTreeNodeType::Commit);
-        let data_type = MerkleTreeNodeType::from_u8_unwrap(lookup.data_type);
+        let data_type = MerkleTreeNodeType::from_u8(lookup.data_type)?;
-            let dtype = MerkleTreeNodeType::from_u8_unwrap(*dtype);
+            let dtype = MerkleTreeNodeType::from_u8(*dtype)?;

Also applies to: 510-510, 529-529
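The key step in the suggested fix is `Option::transpose`, which turns an `Option<Result<T, E>>` into a `Result<Option<T>, E>` so that `?` can propagate an invalid byte while a missing lookup still falls back to Commit. A minimal sketch of that shape, with a hypothetical two-variant enum and made-up byte values:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeType {
    Commit,
    Dir,
}

#[derive(Debug, PartialEq, Eq)]
pub struct InvalidNodeType(pub u8);

/// Hypothetical fallible decoder; the byte values are made up for this sketch.
fn from_u8(value: u8) -> Result<NodeType, InvalidNodeType> {
    match value {
        0 => Ok(NodeType::Commit),
        1 => Ok(NodeType::Dir),
        other => Err(InvalidNodeType(other)),
    }
}

/// A missing lookup falls back to Commit, but a present-yet-invalid byte is
/// returned as an error instead of panicking.
pub fn dtype_or_commit(lookup: Option<u8>) -> Result<NodeType, InvalidNodeType> {
    let dtype = lookup
        .map(from_u8) // Option<Result<NodeType, InvalidNodeType>>
        .transpose()? // Result<Option<NodeType>, _>; `?` returns early on Err
        .unwrap_or(NodeType::Commit);
    Ok(dtype)
}
```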

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/lib/src/core/db/merkle_node/merkle_node_db.rs` around lines 331 - 334,
The code currently uses MerkleTreeNodeType::from_u8_unwrap on bytes read from
disk (e.g., in the lookup mapping shown) which will panic on invalid bytes;
change these sites (the occurrence in the lookup mapping and the similar uses at
the other noted locations) to call the fallible MerkleTreeNodeType::from_u8 and
propagate the error instead of unwrapping so open()/map() return Err for invalid
node-type bytes; update the surrounding logic in the functions handling disk
reads (e.g., open(), map()) to propagate the Result from from_u8 rather than
assuming a valid value.


Labels

None yet

Projects

None yet


1 participant