@sidd-27 sidd-27 commented Jan 5, 2026

Structural Changes

  • Old Layout (AoS):
    • Used a single Vec<QuickDecodeEntry> where each entry contained rcode (8 bytes), id (2 bytes), hamming (1 byte), and rotation (1 byte).
    • Due to alignment rules, 4 bytes of padding were added to each entry to satisfy the usize field's 8-byte alignment, resulting in 16 bytes per slot.
  • New Layout (SoA):
    • codes: Vec<usize>: Stores only the raw keys. This is the only data accessed during the hot linear probing loop.
    • entries: Vec<PackedEntry>: A parallel vector storing id and hamming.
    • PackedEntry: Defined as #[repr(packed)], reducing the payload size to exactly 3 bytes (2 bytes for id + 1 byte for hamming) with zero padding.
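The layout change can be sketched as below. Field names follow the description above; the exact struct definitions in the codebase may differ, and the 16-byte AoS size assumes a 64-bit target where `usize` is 8 bytes with 8-byte alignment.

```rust
use std::mem::size_of;

// Old AoS layout: one vector of full entries. The usize field forces
// 8-byte alignment, so the 12 bytes of fields are padded to 16.
#[allow(dead_code)]
struct QuickDecodeEntry {
    rcode: usize,  // 8 bytes: candidate tag code (hash key)
    id: u16,       // 2 bytes: decoded tag id
    hamming: u8,   // 1 byte: number of corrected bit errors
    rotation: u8,  // 1 byte: rotation of the match
}                  // + 4 bytes padding = 16 bytes per slot

// New SoA layout: keys live in their own `codes: Vec<usize>`; the
// payload goes in a parallel `entries: Vec<PackedEntry>`.
#[allow(dead_code)]
#[repr(packed)]
struct PackedEntry {
    id: u16,       // 2 bytes
    hamming: u8,   // 1 byte
}                  // exactly 3 bytes, zero padding

fn main() {
    assert_eq!(size_of::<QuickDecodeEntry>(), 16); // old: 16 B per slot
    assert_eq!(size_of::<PackedEntry>(), 3);       // new payload: 3 B
    // New cost per slot: 8-byte key + 3-byte packed payload = 11 bytes.
    assert_eq!(size_of::<usize>() + size_of::<PackedEntry>(), 11);
}
```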

Performance & Memory Impact

1. Memory Reduction

We reduce storage per slot from 16 bytes (aligned) to 11 bytes (8-byte key + 3-byte packed entry).

Example: TagStandard52h13
For the 52h13 family (52 bits, 48,714 codes), the table generates precomputed Hamming neighbors for 0, 1, and 2-bit errors.

  • Total Entries: ~67.2 million
  • Table Capacity: ~201.5 million slots (3x load factor)
  • Old Size (AoS): ~3.22 GB
  • New Size (SoA): ~2.22 GB
  • Total Saved: ~1.00 GB RAM
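The figures above can be reproduced with a back-of-envelope calculation: each of the 48,714 codes expands into one exact entry, 52 one-bit-error neighbors, and C(52,2) = 1,326 two-bit-error neighbors.

```rust
fn main() {
    let codes: u64 = 48_714; // valid codes in TagStandard52h13
    let nbits: u64 = 52;
    // Precomputed neighbors: exact code, every 1-bit flip, every 2-bit flip.
    let per_code = 1 + nbits + nbits * (nbits - 1) / 2; // 1 + 52 + 1326 = 1379
    let total_entries = codes * per_code;               // ~67.2 million
    let capacity = total_entries * 3;                   // 3x load factor
    let old_gb = (capacity * 16) as f64 / 1e9;          // AoS: 16 B per slot
    let new_gb = (capacity * 11) as f64 / 1e9;          // SoA: 8 + 3 B per slot
    assert_eq!(total_entries, 67_176_606);
    println!("old {:.2} GB, new {:.2} GB, saved {:.2} GB",
             old_gb, new_gb, old_gb - new_gb);
}
```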

2. Cache Locality

Linear probing now iterates over contiguous usize keys in the codes vector. A 64-byte cache line holds 8 keys instead of 4 full entries, doubling the number of candidates scanned per cache-line fetch during collision resolution.
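A minimal sketch of the hot lookup loop under the SoA layout: probing touches only the `codes` slice, and the parallel `entries` slice is read once on a hit. The `u64::MAX` empty-slot sentinel, the function signature, and the use of a plain `(u16, u8)` tuple in place of the packed struct are illustrative assumptions, not the PR's actual code.

```rust
// Hypothetical lookup: probe contiguous keys; fetch the payload only on a hit.
fn lookup(codes: &[u64], entries: &[(u16, u8)], rcode: u64) -> Option<(u16, u8)> {
    const EMPTY: u64 = u64::MAX; // assumed sentinel for an unused slot
    let cap = codes.len();
    let mut bucket = (rcode as usize) % cap;
    loop {
        match codes[bucket] {
            EMPTY => return None,                      // empty slot: miss
            c if c == rcode => return Some(entries[bucket]), // hit: one payload read
            _ => bucket = (bucket + 1) % cap,          // collision: next key
        }
    }
}

fn main() {
    let mut codes = vec![u64::MAX; 8];
    let mut entries = vec![(0u16, 0u8); 8];
    // Insert code 42 -> (id 7, hamming 0) at its home bucket.
    codes[42 % 8] = 42;
    entries[42 % 8] = (7, 0);
    assert_eq!(lookup(&codes, &entries, 42), Some((7, 0)));
    assert_eq!(lookup(&codes, &entries, 99), None);
}
```

Because the probe loop never dereferences the payload vector, every cache line fetched during collision resolution is filled entirely with keys.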

@sidd-27 sidd-27 closed this Jan 6, 2026
@sidd-27 sidd-27 deleted the refactor-quickdecode branch January 8, 2026 18:46
