Make trait TMerkleTreeNode dyn compatible (fka object safe) #406
malcolmgreaves wants to merge 7 commits into main
- modified trait to use an associated type for the error
- `to_msgpack_bytes` returns a `Result` with that error or a `Vec<u8>`
- blanket impl uses anything that also has `Serialize` as the `to_msgpack_bytes` implementation
- made `from_u8` return an error & added it to the OxenError hierarchy

WIP: propagate changes
📝 Walkthrough (summary by CodeRabbit)
Refactors the Merkle tree node serialization and error handling infrastructure.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Actionable comments posted: 1
🧹 Nitpick comments (3)
crates/lib/src/error.rs (1)
315-317: Consider adding `RmpEncodeError` to the internal error hints.
`RmpDecodeError` is included in the `hint()` method (line 381) as an internal error, but the new `RmpEncodeError` is not. For consistency, encoding errors should likely receive the same hint.
🔧 Suggested fix
  DB(_) | ArrowError(_) | BinCodeError(_) | RedisError(_) | R2D2Error(_)
- | RmpDecodeError(_) => {
+ | RmpDecodeError(_) | RmpEncodeError(_) => {
  "This is an internal error. Run with RUST_LOG=debug for more details."
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/lib/src/error.rs` around lines 315 - 317, Add RmpEncodeError to the internal error hints returned by the hint() method so encoding errors get the same diagnostic guidance as decoding errors; locate the hint() implementation and where it currently matches RmpDecodeError and include RmpEncodeError in that same arm (or add a separate arm that returns the same internal error hint) so both rmp_serde::encode::Error and rmp_serde::decode::Error map to the internal error hint.
crates/lib/src/repositories/tree.rs (1)
1126-1128: Consider returning an error instead of panicking.
The `panic!` on unexpected node types could be replaced with an `Err` return for more graceful error handling, though this represents a programming error rather than a runtime condition.
🔧 Optional fix
  EMerkleTreeNode::File(file_node) => {
  db.add_child(file_node)?;
  }
- node => {
- panic!("p_write_tree Unexpected node type: {node:?}");
- }
+ EMerkleTreeNode::Commit(_) | EMerkleTreeNode::FileChunk(_) => {
+ return Err(OxenError::basic_str(format!(
+ "p_write_tree unexpected node type: {:?}",
+ child.node.node_type()
+ )));
+ }
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/lib/src/repositories/tree.rs` around lines 1126 - 1128, Replace the panic in the catch-all match arm inside p_write_tree with a propagated error return: instead of panic!("p_write_tree Unexpected node type: {node:?}"), return an Err variant (using the crate's repository error type or anyhow::Error) that includes a descriptive message and the debug of node so callers can handle it; update the function signature to return Result if needed and adjust call sites to propagate the error via ? or map_err so this programming-error case is reported without aborting the process.
crates/lib/src/core/db/merkle_node/merkle_node_db.rs (1)
257-264: `?Sized` bounds needed only if trait objects are intended for these entry points.
The current code's implicit `Sized` bounds on `N` would reject `&dyn TMerkleTreeNode<...>` at these call sites. However, all current callers in `p_write_tree` pass concrete types (`vnode`, `dir_node`, `file_node`), so this is not a current issue.
If the design goal is to allow trait-object dispatch through these helpers, add `?Sized`:
Suggested changes (if needed)
- pub fn open_read_write_if_not_exists<N: TMerkleTreeNode>(
+ pub fn open_read_write_if_not_exists<N: TMerkleTreeNode + ?Sized>(

- pub fn open_read_write<N: TMerkleTreeNode>(
+ pub fn open_read_write<N: TMerkleTreeNode + ?Sized>(

- fn write_node<N: TMerkleTreeNode>(
+ fn write_node<N: TMerkleTreeNode + ?Sized>(

- pub fn add_child<N: TMerkleTreeNode>(&mut self, item: &N) -> Result<(), OxenError>
+ pub fn add_child<N: TMerkleTreeNode + ?Sized>(&mut self, item: &N) -> Result<(), OxenError>

Applies to: lines 257–264, 277–284, 373–380, 424–427.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/lib/src/core/db/merkle_node/merkle_node_db.rs` around lines 257 - 264, Update the generic bounds to allow trait-object dispatch by adding ?Sized to the N type parameter where needed: change the signatures of open_read_write_if_not_exists (and the other helper functions flagged in the review at the same pattern) so the generic bound becomes N: TMerkleTreeNode + ?Sized and keep the parameter as node: &N; leave the existing where clauses (e.g., OxenError: From<N::SerializationError>) intact so SerializationError still resolves for unsized types.
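The effect of the `?Sized` bound discussed in this comment can be seen in a small std-only sketch. Everything here is a hypothetical stand-in: `Node`, `FileNode`, the helper names, and the value `3` are not taken from the PR, and the real `TMerkleTreeNode` trait carries more methods plus the `SerializationError` associated type.

```rust
use std::fmt::Debug;

// Hypothetical stand-in for TMerkleTreeNode (assumption, not the real trait).
trait Node: Debug {
    fn node_type(&self) -> u8;
}

#[derive(Debug)]
struct FileNode;

impl Node for FileNode {
    fn node_type(&self) -> u8 {
        3 // arbitrary illustrative value
    }
}

// Generic parameters are implicitly `Sized`, so `N` can never be `dyn Node` here.
fn type_of_sized<N: Node>(node: &N) -> u8 {
    node.node_type()
}

// `?Sized` lifts the implicit bound, so `N` may be a concrete type or `dyn Node`.
fn type_of_maybe_unsized<N: Node + ?Sized>(node: &N) -> u8 {
    node.node_type()
}

fn demo() -> (u8, u8, u8) {
    let file = FileNode;
    let as_dyn: &dyn Node = &file;
    // `type_of_sized(as_dyn)` would be rejected by the compiler,
    // because `dyn Node` does not have a size known at compile time.
    (
        type_of_sized(&file),
        type_of_maybe_unsized(&file),
        type_of_maybe_unsized(as_dyn),
    )
}
```

Since concrete types satisfy both signatures, adding `?Sized` should not affect existing callers; it only widens what the helpers accept.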
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/lib/src/core/db/merkle_node/merkle_node_db.rs`:
- Around line 331-334: The code currently uses
MerkleTreeNodeType::from_u8_unwrap on bytes read from disk (e.g., in the lookup
mapping shown) which will panic on invalid bytes; change these sites (the
occurrence in the lookup mapping and the similar uses at the other noted
locations) to call the fallible MerkleTreeNodeType::from_u8 and propagate the
error instead of unwrapping so open()/map() return Err for invalid node-type
bytes; update the surrounding logic in the functions handling disk reads (e.g.,
open(), map()) to propagate the Result from from_u8 rather than assuming a valid
value.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 823320ab-2b3e-4dfe-81e9-95cc6d71f313
📒 Files selected for processing (10)
crates/lib/src/core/db/merkle_node/merkle_node_db.rs
crates/lib/src/error.rs
crates/lib/src/model/merkle_tree/node/commit_node.rs
crates/lib/src/model/merkle_tree/node/dir_node.rs
crates/lib/src/model/merkle_tree/node/file_chunk_node.rs
crates/lib/src/model/merkle_tree/node/file_node.rs
crates/lib/src/model/merkle_tree/node/merkle_tree_node.rs
crates/lib/src/model/merkle_tree/node/vnode.rs
crates/lib/src/model/merkle_tree/node_type.rs
crates/lib/src/repositories/tree.rs
  let dtype = lookup
  .as_ref()
- .map(|l| MerkleTreeNodeType::from_u8(l.data_type))
+ .map(|l| MerkleTreeNodeType::from_u8_unwrap(l.data_type))
  .unwrap_or(MerkleTreeNodeType::Commit);
Return invalid node-type bytes as Err, not panics.
These bytes come straight from the node files, so from_u8_unwrap() means one corrupt record—or a repo written with a newer node variant—can abort open()/map(). This PR already introduced a fallible from_u8() for exactly this case, and these methods already return Result, so the read path should propagate the error instead of crashing.
💡 Suggested fix
- let dtype = lookup
- .as_ref()
- .map(|l| MerkleTreeNodeType::from_u8_unwrap(l.data_type))
- .unwrap_or(MerkleTreeNodeType::Commit);
+ let dtype = lookup
+ .as_ref()
+ .map(|l| MerkleTreeNodeType::from_u8(l.data_type))
+ .transpose()?
+ .unwrap_or(MerkleTreeNodeType::Commit);

- let data_type = MerkleTreeNodeType::from_u8_unwrap(lookup.data_type);
+ let data_type = MerkleTreeNodeType::from_u8(lookup.data_type)?;

- let dtype = MerkleTreeNodeType::from_u8_unwrap(*dtype);
+ let dtype = MerkleTreeNodeType::from_u8(*dtype)?;

Also applies to: 510-510, 529-529
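The `.map(...).transpose()?` pattern in the suggested fix can be tried in isolation with a minimal sketch. The types below are hypothetical stand-ins: `NodeType`, `InvalidNodeType`, `Lookup`, `resolve`, and the byte values 0 to 2 are assumptions for illustration, not the crate's actual `MerkleTreeNodeType` encoding or error type.

```rust
// Hypothetical stand-in for MerkleTreeNodeType (byte values are illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
enum NodeType {
    Commit,
    Dir,
    File,
}

// Hypothetical error carrying the offending byte.
#[derive(Debug, PartialEq)]
struct InvalidNodeType(u8);

impl NodeType {
    // Fallible conversion: unknown bytes become an error instead of a panic.
    fn from_u8(byte: u8) -> Result<Self, InvalidNodeType> {
        match byte {
            0 => Ok(NodeType::Commit),
            1 => Ok(NodeType::Dir),
            2 => Ok(NodeType::File),
            other => Err(InvalidNodeType(other)),
        }
    }
}

// Hypothetical stand-in for the on-disk lookup record.
struct Lookup {
    data_type: u8,
}

fn resolve(lookup: Option<&Lookup>) -> Result<NodeType, InvalidNodeType> {
    let dtype = lookup
        .map(|l| NodeType::from_u8(l.data_type)) // Option<Result<NodeType, _>>
        .transpose()? // Result<Option<NodeType>, _>, then propagate the error
        .unwrap_or(NodeType::Commit); // the default applies only when there is no lookup
    Ok(dtype)
}
```

Note that `unwrap_or` supplies the `Commit` default only for a missing lookup; an invalid byte is returned as `Err` rather than silently replaced by the default.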
First in a series of PRs to abstract the Merkle tree store so that we can provide backing
implementations other than the existing one, which is based on a custom file format.
There's no change in external behavior in this PR: it is a refactor.
This PR drops the `+ Serialize` constraint from `trait TMerkleTreeNode` to make it `dyn` compatible (this was formerly known as object safe). `Serialize` has a generic method, `serialize<S: Serializer>`, which cannot be dispatched through a virtual table lookup, hence it's not `dyn` compatible. [1]

In place of the `+ Serialize` constraint, the trait gains a new method, `to_msgpack_bytes() -> Result<Vec<u8>, ...>`, which performs the serialization. The trait also introduces a new `SerializationError` associated type, which is the error type returned by `to_msgpack_bytes()`. There's a blanket implementation that implements this updated trait for all existing concrete node implementations: it uses their `Serialize` impl to provide a generic implementation for `to_msgpack_bytes` with a shared serialization error of `rmp_serde::encode::Error`.

Sites that were manually calling `serialize` on the node now call `to_msgpack_bytes()`. There's a new conversion of `encode::Error` into an `OxenError` wrapper.

Functions that used the `TMerkleTreeNode` trait have been updated with new type constraints on the generic type that implements the trait. Specifically, there are constraints to ensure that the `SerializationError` can be converted into an `OxenError` when appropriate. And `p_write_tree` has been updated to ensure that the `FileNode`, `DirNode`, and `VNode` concrete implementations all have `TMerkleTreeNode` impls that have the same `SerializationError`. This works because all use `encode::Error`.

[1] https://doc.rust-lang.org/reference/items/traits.html#r-items.traits.dyn-compatible
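The trait shape described above can be approximated with a std-only sketch. It is an illustration under stated assumptions, not the PR's code: `Encode` stands in for serde's `Serialize` (its generic method plays the role of `serialize<S: Serializer>`), `Node` and `to_bytes` stand in for `TMerkleTreeNode` and `to_msgpack_bytes`, and `EncodeError`, the node structs, and the byte layouts are all hypothetical.

```rust
use std::fmt::Debug;

// Hypothetical shared serialization error (stand-in for the msgpack encode error).
#[derive(Debug, PartialEq)]
struct EncodeError;

// Stand-in for `Serialize`: the generic method makes this trait not dyn compatible.
trait Encode {
    fn encode<W: Extend<u8>>(&self, out: &mut W) -> Result<(), EncodeError>;
}

// Dyn-compatible node trait: no generic methods, error exposed as an associated type.
trait Node {
    type SerializationError: Debug;
    fn to_bytes(&self) -> Result<Vec<u8>, Self::SerializationError>;
}

// Blanket impl: every `Encode` type gets `Node` with one shared error type.
impl<T: Encode> Node for T {
    type SerializationError = EncodeError;

    fn to_bytes(&self) -> Result<Vec<u8>, EncodeError> {
        let mut buf = Vec::new();
        self.encode(&mut buf)?;
        Ok(buf)
    }
}

struct FileNode {
    size: u8,
}

struct DirNode {
    entries: u8,
}

impl Encode for FileNode {
    fn encode<W: Extend<u8>>(&self, out: &mut W) -> Result<(), EncodeError> {
        out.extend([b'F', self.size]); // made-up two-byte layout
        Ok(())
    }
}

impl Encode for DirNode {
    fn encode<W: Extend<u8>>(&self, out: &mut W) -> Result<(), EncodeError> {
        out.extend([b'D', self.entries]); // made-up two-byte layout
        Ok(())
    }
}

// Trait objects must name the associated type, which is why all node impls
// need to agree on the same error type to share one collection.
type DynNode = dyn Node<SerializationError = EncodeError>;

fn serialize_all(nodes: &[Box<DynNode>]) -> Result<Vec<Vec<u8>>, EncodeError> {
    nodes.iter().map(|n| n.to_bytes()).collect()
}

fn demo() -> Result<Vec<Vec<u8>>, EncodeError> {
    let nodes: Vec<Box<DynNode>> = vec![
        Box::new(FileNode { size: 7 }),
        Box::new(DirNode { entries: 2 }),
    ];
    serialize_all(&nodes)
}
```

In this sketch, `dyn Encode` would not compile as a type, while `dyn Node<SerializationError = EncodeError>` does, which mirrors the motivation given for moving serialization behind a non-generic method.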