refactor: reduce QuickDecode memory usage by 31.5% #418
Structural Changes
Previously, the table was a single `Vec<QuickDecodeEntry>` where each entry contained `rcode` (8 bytes), `id` (2 bytes), `hamming` (1 byte), and `rotation` (1 byte); alignment padding rounds each entry up to the 8-byte alignment of `usize`, resulting in 16 bytes per slot. The table is now split into two parallel vectors:

- `codes: Vec<usize>`: stores only the raw keys. This is the only data accessed during the hot linear probing loop.
- `entries: Vec<PackedEntry>`: a parallel vector storing `id` and `hamming`.
- `PackedEntry`: defined as `#[repr(packed)]`, reducing the payload size to exactly 3 bytes (2 bytes for `id` + 1 byte for `hamming`) with zero padding.
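A minimal sketch of the new layout (the names `codes`, `entries`, `PackedEntry`, `id`, and `hamming` come from this PR; the wrapper struct and the size assertions are illustrative assumptions, not the exact code):

```rust
#![allow(dead_code)]
use std::mem::size_of;

/// Exactly 3 bytes: `#[repr(packed)]` drops the padding byte that the
/// `u16` field's alignment would otherwise force.
#[repr(packed)]
#[derive(Clone, Copy)]
struct PackedEntry {
    id: u16,     // decoded tag ID
    hamming: u8, // number of corrected bit errors (0, 1, or 2)
}

/// Illustrative wrapper: the two vectors share slot indices.
struct QuickDecode {
    codes: Vec<usize>,         // raw keys; the only data touched while probing
    entries: Vec<PackedEntry>, // parallel payloads, one per slot
}

fn main() {
    assert_eq!(size_of::<PackedEntry>(), 3);
    // On 64-bit targets: 8-byte key + 3-byte payload = 11 bytes per slot (was 16).
    assert_eq!(size_of::<usize>() + size_of::<PackedEntry>(), 11);
}
```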
Performance & Memory Impact

1. Memory Reduction
We reduce storage per slot from 16 bytes (aligned) to 11 bytes (8-byte key + 3-byte packed entry).
Example: for the `TagStandard52h13` family (52 bits, 48,714 codes), the table generates precomputed Hamming neighbors for 0-, 1-, and 2-bit errors.
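As a rough back-of-envelope check (my arithmetic, not a figure from this PR, and ignoring any load-factor headroom the table reserves beyond the stored entries): each code has C(52,0) + C(52,1) + C(52,2) = 1 + 52 + 1,326 = 1,379 precomputed variants, so the table holds about 48,714 × 1,379 ≈ 67.2 million entries. At 16 bytes per slot that is roughly 1.07 GB, versus roughly 0.74 GB at 11 bytes, a saving on the order of 340 MB for this family alone.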
2. Cache Locality

Linear probing now iterates over contiguous `usize` keys in the `codes` vector. A 64-byte cache line now holds 8 keys (previously 4 entries), effectively doubling search throughput during collision resolution.
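For illustration, a sketch of what the hot lookup loop can look like under this layout (the hash function, `EMPTY` sentinel, and method name are my assumptions, not necessarily what this PR implements):

```rust
#![allow(dead_code)]

const EMPTY: usize = usize::MAX; // assumed sentinel marking a vacant slot

// Types as in the earlier sketch.
#[repr(packed)]
#[derive(Clone, Copy)]
struct PackedEntry {
    id: u16,
    hamming: u8,
}

struct QuickDecode {
    codes: Vec<usize>,
    entries: Vec<PackedEntry>,
}

impl QuickDecode {
    /// Linear probing touches only `self.codes`, so each 64-byte cache line
    /// feeds 8 key comparisons; `entries` is read once, only on a hit.
    fn lookup(&self, rcode: usize) -> Option<PackedEntry> {
        let n = self.codes.len();
        let mut i = rcode % n; // assumed hash: key modulo table size
        // The table is assumed to always keep at least one EMPTY slot,
        // so the probe chain terminates.
        loop {
            match self.codes[i] {
                EMPTY => return None, // vacant slot ends the chain: miss
                c if c == rcode => return Some(self.entries[i]), // hit: fetch payload
                _ => i = (i + 1) % n, // collision: advance to the next slot
            }
        }
    }
}

fn main() {
    // Toy table with 4 slots; code 10 lands in slot 10 % 4 = 2.
    let mut qd = QuickDecode {
        codes: vec![EMPTY; 4],
        entries: vec![PackedEntry { id: 0, hamming: 0 }; 4],
    };
    qd.codes[2] = 10;
    qd.entries[2] = PackedEntry { id: 7, hamming: 1 };

    // Copy the field out before asserting: taking a reference to a
    // field of a packed struct is rejected by the compiler.
    let id = qd.lookup(10).unwrap().id;
    assert_eq!(id, 7);
}
```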