Skip to content

Decode improvements#76

Merged
frankmcsherry merged 3 commits intomasterfrom
decode_improvements
Mar 13, 2026
Merged

Decode improvements#76
frankmcsherry merged 3 commits intomasterfrom
decode_improvements

Conversation

@frankmcsherry
Copy link
Copy Markdown
Owner

This PR is largely authored by Claude, in which we improve the code gen around decoding &[u64] data into borrow containers, and then access those containers with minimal numbers of instructions. Various problems were removed from the critical path, and an observation was made that it is valuable to do field projection before calling Index::get, because once you do this Rust's deterministic error semantics require that each output coordinate be formed at least to the point of checking all panics. Instead, project first then borrow. We verified 4 instructions with a cold bounds test on read when done this way.

@frankmcsherry
Copy link
Copy Markdown
Owner Author

frankmcsherry commented Mar 13, 2026

Claude did have a recommendation about how to zero-copy hand out slices without iterators. It seems sound, but is a change. Recording for posterity, as it seems easy to adopt at any breaking change moment:

  The zero-allocation approach would be to skip materializing the &[&[u8]] entirely. Instead of  
  collecting decoded slices and passing them to from_byte_slices, give FromBytes direct access to
   the indexed store:                                                                            

  fn from_store(store: &'a [u64], offset: usize) -> Self where Self: Sized {
      // default: fall back to decode_index + from_byte_slices for one slice
      Self::from_bytes(&mut std::iter::once(Indexed::decode_index(store, offset as u64)))
  }

  Then implementations use decode_index directly at each level, passing offsets to children:

  // Primitive (SLICE_COUNT = 1):
  fn from_store(store: &'a [u64], offset: usize) -> Self {
      bytemuck::try_cast_slice(Indexed::decode_index(store, offset as u64)).unwrap()
  }

  // Vec-like (SLICE_COUNT = 1 + CC::SLICE_COUNT):
  fn from_store(store: &'a [u64], offset: usize) -> Self {
      let bounds = <&[u64]>::from_store(store, offset);
      let values = CC::from_store(store, offset + 1);
      ...
  }

  // Tuple (A, B):
  fn from_store(store: &'a [u64], offset: usize) -> Self {
      let a = A::from_store(store, offset);
      let b = B::from_store(store, offset + A::SLICE_COUNT);
      (a, b)
  }

  And Stash::borrow becomes just:

  Stash::Align(a) => <C::Borrowed<'_> as FromBytes>::from_store(a, 0),

  No collection, no allocation — each leaf calls decode_index exactly once. The pattern mirrors
  from_byte_slices exactly but replaces slice indexing (bytes[offset]) with decode_index(store,
  offset).

  The tradeoff is coupling FromBytes to the &[u64] indexed store format. Though given that this
  is all internal to the columnar encoding, that might be fine?

@frankmcsherry frankmcsherry merged commit b3297fb into master Mar 13, 2026
6 checks passed
@frankmcsherry frankmcsherry deleted the decode_improvements branch March 13, 2026 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant