
Commit 8f254ac

Decode u64s rather than u8s (#78)
* Add from_u64s and decode_u64s for panic-free, alignment-check-free decoding

  Adds a new decode path that preserves u64 alignment information through the entire pipeline, eliminating per-field alignment checks that bytemuck::try_cast_slice required when going through &[u8].

  Key changes:
  - decode_u64s: returns (&[u64], u8) pairs instead of &[u8] slices, where the u8 indicates valid trailing bytes in the last word
  - from_u64s on FromBytes: non-panicking field construction that enables LLVM to eliminate unused tuple fields as dead code
  - validate/validate_typed: upfront structural and type-compatibility checks for encoded data, replacing the implicit panic-on-bad-data
  - Remove inspect module (superseded by examples/decode_asm.rs)

  Assembly impact for accessing field 0 of a k-tuple of u64s:
    Old (from_bytes): k=3 → 133 insns, k=8 → 273 insns (linear in k)
    New (from_u64s):  k=3 → 68 insns, k=8 → 68 insns (constant in k)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove EncodeDecode trait and Sequence encoding, rename module to indexed

  Indexed is now the sole encoding format with inherent methods, so callers don't need to import a trait. The Sequence format provided no random access or u64-aligned decoding and is no longer needed.

  Renames serialization_neu to indexed now that there is no other serialization module to distinguish from.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Move validate_typed to FromBytes::validate, remove Indexed struct

  validate_typed was partly a function of the Indexed format and partly of the type being decoded. It now lives as FromBytes::validate, which combines structural and type-compatibility checks using element_sizes.

  The Indexed struct's methods were all one-line delegates to free functions in the indexed module. Removed the struct and inlined length_in_words/length_in_bytes as free functions. Callers use the module directly (columnar::bytes::indexed::encode, etc).

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Hide element_sizes as implementation detail of FromBytes::validate

  element_sizes is only used internally by validate. Mark it #[doc(hidden)] and simplify tests to exercise validate directly.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Make both element_sizes and validate public on FromBytes

  element_sizes is public for implementors to override. validate is public for callers to use at trust boundaries.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Clean up decode_asm example to three representative approaches

  Trimmed from experimental accumulation to a clean comparison of:
  - from_bytes + decode (O(k) baseline)
  - from_u64s + decode_u64s (O(1) in k via dead code elimination)
  - decode_field random access (O(1) in both k and field position)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update benchmarks to use indexed module instead of removed Sequence

  The bench and serde benchmarks referenced the removed EncodeDecode trait and Sequence type. Updated to use the indexed module directly.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix from_u64s for Discriminant and updated enum container layout

  The enum container struct now uses a single `indexes: Discriminant` field instead of separate `variant` and `offset` fields. Update the derive macro's from_u64s to match, and add from_u64s/element_sizes to the Discriminant FromBytes impl.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Rework validate to take decoded slices, add validate_typed entry point

  FromBytes::validate now takes &[(&[u64], u8)] matching the from_u64s input shape, making it composable for nested types. Added indexed::validate_typed::<T> as the single entry point that combines structural and type-level validation. Also added from_u64s and element_sizes for Discriminant.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Rename validate to validate_structure, validate_typed to validate

  The obvious name should do the obvious thing: indexed::validate::<T> does full validation (structural + type compatibility). The structural-only check is now validate_structure, an implementation detail.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove unused FromBytes import in test module

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
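The `(&[u64], u8)` pair shape described in the commit message can be illustrated with a small, self-contained sketch. It assumes the convention visible in `decode_field` in `examples/decode_asm.rs`, where the `u8` is the field's byte length modulo 8, so a tail of 0 means the last word is fully used. The helper names `valid_byte_len` and `valid_bytes` are hypothetical and are not part of the columnar API.

```rust
/// Hypothetical helper (not part of columnar): the number of valid bytes
/// described by a `(words, tail)` pair.
/// Assumption: `tail == 0` means every byte of the last word is valid.
fn valid_byte_len(words: &[u64], tail: u8) -> usize {
    if words.is_empty() {
        0
    } else if tail == 0 {
        words.len() * 8
    } else {
        // All words but the last are full; the last contributes `tail` bytes.
        (words.len() - 1) * 8 + tail as usize
    }
}

/// Hypothetical helper: copy out only the valid bytes, in native byte order.
fn valid_bytes(words: &[u64], tail: u8) -> Vec<u8> {
    let mut bytes: Vec<u8> = words.iter().flat_map(|w| w.to_ne_bytes()).collect();
    bytes.truncate(valid_byte_len(words, tail));
    bytes
}
```

Keeping the data as `&[u64]` and carrying the tail separately means a consumer that wants `u64` elements can index the slice directly, with no cast from `&[u8]` and therefore no alignment check, which is the motivation the commit message gives.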
1 parent 432aa95 commit 8f254ac

15 files changed, +699 -159 lines changed
benches/bench.rs

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
 use bencher::{benchmark_group, benchmark_main, Bencher};
 use columnar::{Clear, Columnar};
-use columnar::bytes::{EncodeDecode, Sequence};
+use columnar::bytes::indexed;
 
 fn empty_copy(bencher: &mut Bencher) { _bench_copy(bencher, vec![(); 1024]); }
 fn option_copy(bencher: &mut Bencher) { _bench_copy(bencher, vec![Option::<String>::None; 1024]); }
@@ -61,7 +61,7 @@ fn _bench_copy<T: Columnar+Eq>(bencher: &mut Bencher, record: T) where T::Contai
         arena.push(&record);
     }
     use columnar::Borrow;
-    bencher.bytes = Sequence::length_in_bytes(&arena.borrow()) as u64;
+    bencher.bytes = indexed::length_in_bytes(&arena.borrow()) as u64;
     arena.clear();
 
     bencher.iter(|| {
@@ -83,7 +83,7 @@ fn _bench_extend<T: Columnar+Eq>(bencher: &mut Bencher, record: T) where T::Cont
         arena.push(&record);
     }
     use columnar::{Borrow, Container};
-    bencher.bytes = Sequence::length_in_bytes(&arena.borrow()) as u64;
+    bencher.bytes = indexed::length_in_bytes(&arena.borrow()) as u64;
 
     let arena2 = arena.clone();
 

benches/serde.rs

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
use bencher::{benchmark_group, benchmark_main, Bencher};
22
use columnar::{Columnar, Container, Clear, FromBytes};
3-
use columnar::bytes::{EncodeDecode, Sequence};
3+
use columnar::bytes::indexed;
44
use serde::{Serialize, Deserialize};
55

66
fn goser_new(b: &mut Bencher) {
@@ -19,7 +19,7 @@ fn goser_push(b: &mut Bencher) {
1919
container.push(&log);
2020
}
2121
let mut words = vec![];
22-
Sequence::encode(&mut words, &container.borrow());
22+
indexed::encode(&mut words, &container.borrow());
2323
b.bytes = 8 * words.len() as u64;
2424
b.iter(|| {
2525
container.clear();
@@ -50,11 +50,11 @@ fn goser_encode(b: &mut Bencher) {
5050
container.push(&log);
5151
}
5252
let mut words = vec![];
53-
Sequence::encode(&mut words, &container.borrow());
53+
indexed::encode(&mut words, &container.borrow());
5454
b.bytes = 8 * words.len() as u64;
5555
b.iter(|| {
5656
words.clear();
57-
Sequence::encode(&mut words, &container.borrow());
57+
indexed::encode(&mut words, &container.borrow());
5858
bencher::black_box(&words);
5959
});
6060
}
@@ -67,10 +67,10 @@ fn goser_decode(b: &mut Bencher) {
6767
for _ in 0..1024 {
6868
container.push(&log);
6969
}
70-
Sequence::encode(&mut words, &container.borrow());
70+
indexed::encode(&mut words, &container.borrow());
7171
b.bytes = 8 * words.len() as u64;
7272
b.iter(|| {
73-
let mut slices = Sequence::decode(&mut words);
73+
let mut slices = indexed::decode(&mut words);
7474
let foo = <<Log as Columnar>::Container as Container>::Borrowed::from_bytes(&mut slices);
7575
bencher::black_box(foo);
7676
});

columnar_derive/src/lib.rs

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -331,6 +331,10 @@ fn derive_struct(name: &syn::Ident, generics: &syn::Generics, data_struct: syn::
331331
)*
332332
Self { #(#names,)* }
333333
}
334+
#[inline(always)]
335+
fn from_u64s(words: &mut impl Iterator<Item=(&'columnar [u64], u8)>) -> Self {
336+
Self { #(#names: ::columnar::FromBytes::from_u64s(words),)* }
337+
}
334338
}
335339
}
336340
};
@@ -519,6 +523,11 @@ fn derive_unit_struct(name: &syn::Ident, _generics: &syn::Generics, vis: syn::Vi
519523
fn from_byte_slices(bytes: &[&'columnar [u8]]) -> Self {
520524
Self { count: &::columnar::bytemuck::try_cast_slice(bytes[0]).unwrap()[0] }
521525
}
526+
#[inline(always)]
527+
fn from_u64s(words: &mut impl Iterator<Item=(&'columnar [u64], u8)>) -> Self {
528+
let (w, _tail) = words.next().expect("Iterator exhausted prematurely");
529+
Self { count: &w[0] }
530+
}
522531
}
523532

524533
impl ::columnar::Columnar for #name {
@@ -910,6 +919,13 @@ fn derive_enum(name: &syn::Ident, generics: &syn:: Generics, data_enum: syn::Dat
910919
let indexes = <::columnar::Discriminant<CVar, COff, CC>>::from_byte_slices(&bytes[_offset ..]);
911920
Self { #(#names,)* indexes }
912921
}
922+
#[inline(always)]
923+
fn from_u64s(words: &mut impl Iterator<Item=(&'columnar [u64], u8)>) -> Self {
924+
Self {
925+
#(#names: ::columnar::FromBytes::from_u64s(words),)*
926+
indexes: ::columnar::FromBytes::from_u64s(words),
927+
}
928+
}
913929
}
914930
}
915931
};
@@ -1203,6 +1219,10 @@ fn derive_tags(name: &syn::Ident, _generics: &syn:: Generics, data_enum: syn::Da
12031219
fn from_byte_slices(bytes: &[&'columnar [u8]]) -> Self {
12041220
Self { variant: CVar::from_byte_slices(bytes) }
12051221
}
1222+
#[inline(always)]
1223+
fn from_u64s(words: &mut impl Iterator<Item=(&'columnar [u64], u8)>) -> Self {
1224+
Self { variant: ::columnar::FromBytes::from_u64s(words) }
1225+
}
12061226
}
12071227

12081228
impl ::columnar::Columnar for #name {

examples/decode_asm.rs

Lines changed: 99 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,99 @@
1+
//! Assembly inspection for decode paths.
2+
//!
3+
//! Compares three approaches to accessing a single field of a k-tuple
4+
//! stored in Indexed-encoded `&[u64]` data:
5+
//!
6+
//! 1. `from_bytes` + `decode`: constructs all k fields, O(k)
7+
//! 2. `from_u64s` + `decode_u64s`: non-panicking, LLVM eliminates unused fields, O(1) in k
8+
//! 3. `decode_field` (random access): decodes one field directly, O(1) in k and j
9+
//!
10+
//! Build with: `cargo rustc --example decode_asm --release -- --emit asm`
11+
12+
use columnar::*;
13+
use columnar::bytes::indexed;
14+
15+
// ================================================================
16+
// from_bytes path (construct all k fields, access field j)
17+
// ================================================================
18+
19+
#[no_mangle] pub fn bytes_3_f0(store: &[u64], i: usize) -> u64 {
20+
type T<'a> = (&'a [u64], &'a [u64], &'a [u64]);
21+
T::from_bytes(&mut indexed::decode(store)).0[i]
22+
}
23+
#[no_mangle] pub fn bytes_3_flast(store: &[u64], i: usize) -> u64 {
24+
type T<'a> = (&'a [u64], &'a [u64], &'a [u64]);
25+
T::from_bytes(&mut indexed::decode(store)).2[i]
26+
}
27+
#[no_mangle] pub fn bytes_8_f0(store: &[u64], i: usize) -> u64 {
28+
type T<'a> = (&'a [u64], &'a [u64], &'a [u64], &'a [u64],
29+
&'a [u64], &'a [u64], &'a [u64], &'a [u64]);
30+
T::from_bytes(&mut indexed::decode(store)).0[i]
31+
}
32+
#[no_mangle] pub fn bytes_8_flast(store: &[u64], i: usize) -> u64 {
33+
type T<'a> = (&'a [u64], &'a [u64], &'a [u64], &'a [u64],
34+
&'a [u64], &'a [u64], &'a [u64], &'a [u64]);
35+
T::from_bytes(&mut indexed::decode(store)).7[i]
36+
}
37+
38+
// ================================================================
39+
// from_u64s path (non-panicking, LLVM eliminates unused fields)
40+
// ================================================================
41+
42+
#[no_mangle] pub fn u64s_3_f0(store: &[u64], i: usize) -> u64 {
43+
type T<'a> = (&'a [u64], &'a [u64], &'a [u64]);
44+
T::from_u64s(&mut indexed::decode_u64s(store)).0[i]
45+
}
46+
#[no_mangle] pub fn u64s_3_flast(store: &[u64], i: usize) -> u64 {
47+
type T<'a> = (&'a [u64], &'a [u64], &'a [u64]);
48+
T::from_u64s(&mut indexed::decode_u64s(store)).2[i]
49+
}
50+
#[no_mangle] pub fn u64s_8_f0(store: &[u64], i: usize) -> u64 {
51+
type T<'a> = (&'a [u64], &'a [u64], &'a [u64], &'a [u64],
52+
&'a [u64], &'a [u64], &'a [u64], &'a [u64]);
53+
T::from_u64s(&mut indexed::decode_u64s(store)).0[i]
54+
}
55+
#[no_mangle] pub fn u64s_8_flast(store: &[u64], i: usize) -> u64 {
56+
type T<'a> = (&'a [u64], &'a [u64], &'a [u64], &'a [u64],
57+
&'a [u64], &'a [u64], &'a [u64], &'a [u64]);
58+
T::from_u64s(&mut indexed::decode_u64s(store)).7[i]
59+
}
60+
61+
// ================================================================
62+
// Random access (decode one field directly, O(1) in both k and j)
63+
// ================================================================
64+
65+
/// Decode field `k` directly from store as `(&[u64], u8)`.
66+
/// Each call is independent — no iterator state.
67+
#[inline(always)]
68+
fn decode_field(store: &[u64], k: usize) -> (&[u64], u8) {
69+
let slices = store[0] as usize / 8 - 1;
70+
let index = &store[..slices + 1];
71+
let last = *index.last().unwrap_or(&0) as usize;
72+
let last_w = (last + 7) / 8;
73+
let words = &store[..last_w];
74+
let upper = (*index.get(k + 1).unwrap_or(&0) as usize).min(last);
75+
let lower = (((*index.get(k).unwrap_or(&0) as usize) + 7) & !7).min(upper);
76+
let upper_w = ((upper + 7) / 8).min(words.len());
77+
let lower_w = (lower / 8).min(upper_w);
78+
let tail = (upper % 8) as u8;
79+
(&words[lower_w..upper_w], tail)
80+
}
81+
82+
#[no_mangle] pub fn field_3_f0(store: &[u64], i: usize) -> u64 {
83+
decode_field(store, 0).0[i]
84+
}
85+
#[no_mangle] pub fn field_3_flast(store: &[u64], i: usize) -> u64 {
86+
decode_field(store, 2).0[i]
87+
}
88+
#[no_mangle] pub fn field_8_f0(store: &[u64], i: usize) -> u64 {
89+
decode_field(store, 0).0[i]
90+
}
91+
#[no_mangle] pub fn field_8_flast(store: &[u64], i: usize) -> u64 {
92+
decode_field(store, 7).0[i]
93+
}
94+
95+
fn main() {
96+
let mut store = vec![0u64; 100];
97+
store[0] = 32; store[1] = 32; store[2] = 32; store[3] = 32;
98+
println!("{}", std::hint::black_box(field_3_f0(std::hint::black_box(&store), 0)));
99+
}
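
To make the arithmetic in `decode_field` easier to follow, here is a minimal standalone sketch that runs it against a hand-built store. The layout is an assumption read off `decode_field` alone, not taken from the crate's documentation: the store begins with an index of byte offsets, `store[0]` is the byte offset at which the index ends, and field `k` spans bytes `index[k]` (rounded up to a multiple of 8) through `index[k + 1]`. The function `sample_store` and its values are invented for illustration.

```rust
/// Copy of `decode_field` from examples/decode_asm.rs, repeated so that
/// this sketch compiles without the columnar crate.
fn decode_field(store: &[u64], k: usize) -> (&[u64], u8) {
    let slices = store[0] as usize / 8 - 1;
    let index = &store[..slices + 1];
    let last = *index.last().unwrap_or(&0) as usize;
    let last_w = (last + 7) / 8;
    let words = &store[..last_w];
    let upper = (*index.get(k + 1).unwrap_or(&0) as usize).min(last);
    let lower = (((*index.get(k).unwrap_or(&0) as usize) + 7) & !7).min(upper);
    let upper_w = ((upper + 7) / 8).min(words.len());
    let lower_w = (lower / 8).min(upper_w);
    let tail = (upper % 8) as u8;
    (&words[lower_w..upper_w], tail)
}

/// Hypothetical two-field store under the assumed layout.
fn sample_store() -> Vec<u64> {
    vec![
        24,          // index[0]: the index occupies 3 words, so it ends at byte 24
        40,          // index[1]: field 0 ends at byte 40 (two full words)
        43,          // index[2]: field 1 ends at byte 43 (3 bytes into its word)
        10, 20,      // field 0 payload, bytes 24..40
        0x00cc_bbaa, // field 1 payload, bytes 40..43 are valid
    ]
}
```

Under these assumptions, `decode_field(&sample_store(), 0)` yields the two payload words with a tail of 0, and field 1 yields one word with a tail of 3 because its end offset 43 is 3 modulo 8. A field number past the end of the index falls through the `unwrap_or(&0)` defaults and produces an empty slice.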
