refactor(perf): calculate AreaIndex in as_bytes methods #1675

Open
rkuris wants to merge 9 commits into main from rkuris/as-bytes-tech-debt

Member

@rkuris rkuris commented Feb 12, 2026

Why this should be merged

Refactored Node::as_bytes and added FreeArea::as_bytes to automatically calculate and return the AreaIndex instead of requiring callers to provide it. This eliminates the previous double-encoding pattern where callers had to encode twice to determine the correct area size.

How this works

  • Node::as_bytes and FreeArea::as_bytes now return a Result<AreaIndex, Error> and calculate the area index from the encoded size automatically
  • Added NodeAllocator::io_error helper method for error conversion
  • Removed double-encoding logic from serialize_node_to_bump
  • Pre-reserves exact buffer size in FreeArea::as_bytes
  • Uses AsRef<[u8]> and IndexMut trait bounds for flexibility
  • Single-pass encoding for all nodes
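The single-pass approach in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `AREA_SIZES`, the `AreaIndex` stand-in, the `String` error type, and `encode_with_area_index` are all hypothetical, and the real encoding of a node is replaced by copying a payload slice. The trait bounds mirror the `AsRef<[u8]>` and `IndexMut` bounds mentioned above, with `Extend<u8>` added as an assumption so the sketch can append bytes.

```rust
use std::ops::IndexMut;

/// Hypothetical area sizes; the real table in firewood may differ.
const AREA_SIZES: [u64; 4] = [16, 32, 64, 128];

/// Hypothetical stand-in for the real `AreaIndex` newtype.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AreaIndex(u8);

impl AreaIndex {
    /// Returns the index of the smallest area that can hold `size` bytes.
    pub fn from_size(size: u64) -> Result<Self, String> {
        AREA_SIZES
            .iter()
            .position(|&area| area >= size)
            .map(|i| AreaIndex(i as u8))
            .ok_or_else(|| format!("no area large enough for {size} bytes"))
    }

    /// Returns the raw index byte.
    pub fn get(self) -> u8 {
        self.0
    }
}

/// Appends a placeholder byte plus `payload` to `buf`, then patches the
/// placeholder with the area index computed from the bytes just written.
pub fn encode_with_area_index<T>(payload: &[u8], buf: &mut T) -> Result<AreaIndex, String>
where
    T: AsRef<[u8]> + Extend<u8> + IndexMut<usize, Output = u8>,
{
    // Remember where this record's area-index byte will live.
    let area_size_index_position = buf.as_ref().len();

    // Reserve the byte with a placeholder, then encode the payload once.
    buf.extend(std::iter::once(0u8));
    buf.extend(payload.iter().copied());

    // Size only the bytes written by this call, not earlier buffer content.
    let encoded_len = buf.as_ref().len() - area_size_index_position;
    let area_index = AreaIndex::from_size(encoded_len as u64)?;

    // Patch the placeholder in place; no second encoding pass is needed.
    buf[area_size_index_position] = area_index.get();
    Ok(area_index)
}
```

With a `Vec<u8>` as the buffer, a caller gets back the area index and finds the same value in the first byte of the record, so it never has to encode once to learn the size and again to write the index.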

How this was tested

TIP == test in production! Let's see if our perf numbers change after merging. The unit test benchmarks won't change, and in fact may be slightly worse since each call is doing more work, but we do avoid calling it twice in the happy path.

Resolves #1114

Refactored Node::as_bytes and added FreeArea::as_bytes to automatically
calculate and return the AreaIndex instead of requiring callers to provide it.
This eliminates the previous double-encoding pattern where callers had to
encode twice to determine the correct area size.

Changes:
- Node::as_bytes and FreeArea::as_bytes now return a Result<AreaIndex, Error>
  and calculate the area index from the encoded size automatically
- Added NodeAllocator::io_error helper method for error conversion
- Removed double-encoding logic from serialize_node_to_bump

Performance improvements:
- Pre-reserves exact buffer size in FreeArea::as_bytes
- Uses AsRef<[u8]> and IndexMut trait bounds for flexibility
- Single-pass encoding for all nodes

All 465 tests pass with no clippy warnings.
…_bump

Changed error context from 'serialize_node_to_bump' to 'allocate_node' when
wrapping AreaIndex::from_size errors. This ensures the error context matches
what test_slow_giant_node expects and correctly reflects that the error occurs
during node allocation, not during serialization.
let area_index = AreaIndex::from_size(encoded.as_ref().len() as u64)?;

// Update the first byte with the correct area size index
encoded[area_size_index_position] = area_index.get();
Contributor

Given that an AreaIndex is computed from the encoded length, do we need to check that it can fit within a byte?

Member Author

Not sure I follow. AreaIndex is a newtype that only allows valid indexes into the AREA_INDEX array. This array can't have more than 255 entries.

I don't know that we enforce that anywhere but lots of things would break if that isn't followed. I can't imagine someone wanting that many area sizes though. We probably already have too many.

Member Author

Yeah that's pretty much impossible. The newtype itself is a wrapper around a u8, so it can't be larger than 255. The get method on an AreaIndex returns a u8, which can never be larger than a byte.
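To make the argument concrete, here is a minimal sketch of a `u8`-backed newtype of this kind. It is an illustration under assumptions, not firewood's actual definition: the `AREA_SIZES` table, the `new` constructor, and the `size` accessor are hypothetical. The compile-time assertion is one possible way to enforce the "no more than 255 entries" limit that the thread says may not be checked anywhere today.

```rust
/// Hypothetical area-size table; the real one in firewood may differ.
const AREA_SIZES: [u64; 4] = [16, 32, 64, 128];

// Compile-time guard (an assumption, not taken from the PR): the build fails
// if the table ever grows past what the u8-backed index is meant to address.
const _: () = assert!(AREA_SIZES.len() <= u8::MAX as usize);

/// Hypothetical newtype: can only be built from a valid index into the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AreaIndex(u8);

impl AreaIndex {
    /// Returns `None` when `index` does not point into `AREA_SIZES`.
    pub fn new(index: u8) -> Option<Self> {
        if (index as usize) < AREA_SIZES.len() {
            Some(Self(index))
        } else {
            None
        }
    }

    /// The raw index; being a `u8`, it always fits in a single byte.
    pub const fn get(self) -> u8 {
        self.0
    }

    /// The area size this index refers to.
    pub fn size(self) -> u64 {
        AREA_SIZES[self.0 as usize]
    }
}
```

Because the only way to obtain an `AreaIndex` in this sketch is through `new`, writing `area_index.get()` into one byte of the encoded buffer cannot overflow.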

}

// Calculate the area index from the encoded length
let area_index = AreaIndex::from_size(encoded.as_ref().len() as u64)?;
Contributor

Suggested change
- let area_index = AreaIndex::from_size(encoded.as_ref().len() as u64)?;
+ let area_index = AreaIndex::from_size(area_size_index_position as u64)?;

nit/feel-free-to-ignore

Member Author

Actually that's incorrect, but it does call out a possible issue. encoded has changed since it was calculated earlier, so these should be different. Arguably we should be subtracting area_size_index_position from the current len() but we don't do that, which could result in a larger size than we need.

Member Author

Fixed in a685bc3
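A small worked example may help show why the offset matters when the buffer already holds data. The sketch below is not the code from the fix; the `AREA_SIZES` table and all function names are hypothetical. It compares sizing from the whole buffer length with sizing only the bytes written at or after the record's start position.

```rust
/// Hypothetical area sizes, for illustration only.
const AREA_SIZES: [u64; 4] = [16, 32, 64, 128];

/// Smallest area that can hold `size` bytes, if any.
pub fn smallest_area(size: u64) -> Option<u64> {
    AREA_SIZES.iter().copied().find(|&area| area >= size)
}

/// Area chosen when sizing from the whole buffer length.
pub fn area_from_total_len(buf: &[u8]) -> Option<u64> {
    smallest_area(buf.len() as u64)
}

/// Area chosen when sizing only the bytes written at or after `start`.
pub fn area_from_record_len(buf: &[u8], start: usize) -> Option<u64> {
    // `start` plays the role of `area_size_index_position` in the thread.
    let record_len = buf.len().checked_sub(start)?;
    smallest_area(record_len as u64)
}
```

For a 60-byte buffer whose last 20 bytes are the new record (so `start` is 40), the first function picks the 64-byte area while the second picks the 32-byte area, which is the "larger size than we need" effect described above.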

  fn to_bytes(input: &Node) -> Vec<u8> {
      let mut bytes = Vec::new();
-     input.as_bytes(firewood_storage::AreaIndex::MIN, &mut bytes);
+     let _area_index = input.as_bytes(&mut bytes).expect("to serialize node");
Contributor

Suggested change
- let _area_index = input.as_bytes(&mut bytes).expect("to serialize node");
+ let _ = input.as_bytes(&mut bytes).expect("to serialize node");

nit/feel-free-to-ignore

Member Author

I like the idea that it's documenting what we're throwing away, so I'm going to leave this one alone.


  let mut serialized = Vec::new();
- node.as_bytes(AreaIndex::MIN, &mut serialized);
+ let _area_index = node.as_bytes(&mut serialized).unwrap();
Contributor

Suggested change
- let _area_index = node.as_bytes(&mut serialized).unwrap();
+ let _ = node.as_bytes(&mut serialized).unwrap();

nit/feel-free-to-ignore

Nodes were going to be allocated to slices that are too big otherwise.
Buffer boundaries are hard, added some tests
Comment on lines +328 to +329
.checked_sub(area_size_index_position)
.expect("area_size_index_position should always be <= encoded length");
Contributor

this is logically impossible... why not use a wrapping sub?
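For readers weighing the two options, the sketch below contrasts how `checked_sub` and `wrapping_sub` behave on `usize`. The helper names are hypothetical and the numbers are made up. When the invariant holds, both give the same answer. If it were ever violated, `checked_sub` reports it, while `wrapping_sub` silently yields a huge length.

```rust
/// Length of the record starting at `start` in a buffer of `total_len` bytes.
/// Returns `None` if the invariant `start <= total_len` is violated.
pub fn record_len_checked(total_len: usize, start: usize) -> Option<usize> {
    total_len.checked_sub(start)
}

/// Same subtraction, but wrapping around on underflow instead of reporting it.
pub fn record_len_wrapping(total_len: usize, start: usize) -> usize {
    total_len.wrapping_sub(start)
}
```

With `total_len = 10` and `start = 4`, both return a length of 6. With `total_len = 3` and `start = 5`, the checked version returns `None`, and the wrapping version returns `usize::MAX - 1`, which a later size lookup would then have to reject.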

Development

Successfully merging this pull request may close these issues.

Node::as_bytes should fill in the area index byte

4 participants