refactor(perf): calculate AreaIndex in as_bytes methods#1675
refactor(perf): calculate AreaIndex in as_bytes methods#1675
Conversation
Refactored Node::as_bytes and added FreeArea::as_bytes to automatically calculate and return the AreaIndex instead of requiring callers to provide it. This eliminates the previous double-encoding pattern where callers had to encode twice to determine the correct area size. Changes: - Node::as_bytes and FreeArea::as_bytes now return a Result<AreaIndex, Error> and calculates the area index from encoded size automatically - Added NodeAllocator::io_error helper method for error conversion - Removed double-encoding logic from serialize_node_to_bump Performance improvements: - Pre-reserves exact buffer size in FreeArea::as_bytes - Uses AsRef<[u8]> and IndexMut trait bounds for flexibility - Single-pass encoding for all nodes All 465 tests pass with no clippy warnings.
…_bump Changed error context from 'serialize_node_to_bump' to 'allocate_node' when wrapping AreaIndex::from_size errors. This ensures the error context matches what test_slow_giant_node expects and correctly reflects that the error occurs during node allocation, not during serialization.
| let area_index = AreaIndex::from_size(encoded.as_ref().len() as u64)?; | ||
|
|
||
| // Update the first byte with the correct area size index | ||
| encoded[area_size_index_position] = area_index.get(); |
There was a problem hiding this comment.
Given that an AreaIndex is computed from the encoded length, do we need to check that it can fit within a byte?
There was a problem hiding this comment.
Not sure i follow. AreaIndex is a NewType that only allows valid indexes into the AREA_INDEX array. This array can't have more than 255 entries.
I don't know that we enforce that anywhere but lots of things would break if that isn't followed. I can't imagine someone wanting that many area sizes though. We probably already have too many.
There was a problem hiding this comment.
Yeah that's pretty much impossible. The newtype itself is a wrapper around a u8, so it can't be larger than 255. The get method on an AreaIndex returns a u8, which can never be larger than a byte.
storage/src/node/mod.rs
Outdated
| } | ||
|
|
||
| // Calculate the area index from the encoded length | ||
| let area_index = AreaIndex::from_size(encoded.as_ref().len() as u64)?; |
There was a problem hiding this comment.
| let area_index = AreaIndex::from_size(encoded.as_ref().len() as u64)?; | |
| let area_index = AreaIndex::from_size(area_size_index_position as u64)?; |
nit/feel-free-to-ignore
There was a problem hiding this comment.
Actually that's incorrect, but it does call out a possible issue. encoded has changed since it was calculated earlier, so these should be different. Arguably we should be subtracting area_size_index_position from the current len() but we don't do that, which could result in a larger size than we need.
| fn to_bytes(input: &Node) -> Vec<u8> { | ||
| let mut bytes = Vec::new(); | ||
| input.as_bytes(firewood_storage::AreaIndex::MIN, &mut bytes); | ||
| let _area_index = input.as_bytes(&mut bytes).expect("to serialize node"); |
There was a problem hiding this comment.
| let _area_index = input.as_bytes(&mut bytes).expect("to serialize node"); | |
| let _ = input.as_bytes(&mut bytes).expect("to serialize node"); |
nit/feel-free-to-ignore
There was a problem hiding this comment.
I like the idea that it's documenting what we're throwing away, so I'm going to leave this one alone.
|
|
||
| let mut serialized = Vec::new(); | ||
| node.as_bytes(AreaIndex::MIN, &mut serialized); | ||
| let _area_index = node.as_bytes(&mut serialized).unwrap(); |
There was a problem hiding this comment.
| let _area_index = node.as_bytes(&mut serialized).unwrap(); | |
| let _ = node.as_bytes(&mut serialized).unwrap(); |
nit/feel-free-to-ignore
Nodes were going to be allocated to slices that are too big otherwise.
Buffer boundaries are hard, added some tests
| .checked_sub(area_size_index_position) | ||
| .expect("area_size_index_position should always be <= encoded length"); |
There was a problem hiding this comment.
this is logically impossible... why not use a wrapping sub?
Why this should be merged
Refactored Node::as_bytes and added FreeArea::as_bytes to automatically calculate and return the AreaIndex instead of requiring callers to provide it. This eliminates the previous double-encoding pattern where callers had to encode twice to determine the correct area size.
How this works
How this was tested
TIP == test in production! Let's see if our perf numbers change after merging. The unit test benchmarks won't change, and in fact may be slightly worse since each call is doing more work, but we do avoid calling it twice in the happy path.
Resolves #1114