
Add thread-local string interner for ReString #4540

Draft
AhmedSoliman wants to merge 1 commit into main from pr4540


@AhmedSoliman
Contributor

Introduces InternedReString, a wrapper around ReString that deduplicates
long strings via a per-thread HashSet. Strings longer than size_of::<String>()
are stored as Arc<str> and looked up on construction and during deserialization
(serde and bilrost), so repeated values share a single heap allocation.
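A minimal sketch of a per-thread interner along these lines, assuming a thread_local! RefCell<HashSet<Arc<str>>>. The names INTERNER and intern are hypothetical and not taken from the PR. Because Arc<str> implements Borrow<str>, the set can be queried with a plain &str, so a hit costs one hash lookup plus a reference-count increment.

```rust
use std::cell::RefCell;
use std::collections::HashSet;
use std::sync::Arc;

thread_local! {
    // Hypothetical: one set per thread. Lookups need no locking, but equal
    // strings seen on different threads are not deduplicated against each other.
    static INTERNER: RefCell<HashSet<Arc<str>>> = RefCell::new(HashSet::new());
}

/// Returns a shared Arc<str> for `s`, allocating only the first time a value is seen.
fn intern(s: &str) -> Arc<str> {
    INTERNER.with(|cell| {
        let mut set = cell.borrow_mut();
        // Arc<str>: Borrow<str>, so the lookup takes &str and allocates nothing.
        if let Some(existing) = set.get(s) {
            return Arc::clone(existing);
        }
        let fresh: Arc<str> = Arc::from(s);
        set.insert(Arc::clone(&fresh));
        fresh
    })
}

fn main() {
    let long = "a-string-long-enough-to-skip-inline-storage";
    let a = intern(long);
    let b = intern(&long.to_string());
    // Both handles share one heap allocation.
    assert!(Arc::ptr_eq(&a, &b));
}
```

This sketch never evicts entries, so every distinct long string stays alive for the lifetime of the thread; whether the PR bounds or clears the set is not stated here.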

Short strings are stored inline and bypass the interner entirely.
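The inline/shared split can be illustrated with a small enum, assuming the threshold of size_of::<String>() bytes stated above (24 on 64-bit targets). SketchString and its methods are hypothetical and do not reflect ReString's actual memory layout; the interner is passed in as a closure so the example stands alone and can count how often it is consulted.

```rust
use std::cell::Cell;
use std::mem::size_of;
use std::sync::Arc;

// Threshold from the PR description: strings longer than this are interned.
const INLINE_CAP: usize = size_of::<String>();

/// Hypothetical stand-in for the two storage modes of InternedReString.
enum SketchString {
    // Short strings: copied into a fixed buffer, never touch the interner.
    Inline { buf: [u8; INLINE_CAP], len: u8 },
    // Long strings: one shared allocation obtained from the interner.
    Shared(Arc<str>),
}

impl SketchString {
    fn new(s: &str, intern: impl FnOnce(&str) -> Arc<str>) -> Self {
        if s.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SketchString::Inline { buf, len: s.len() as u8 }
        } else {
            SketchString::Shared(intern(s))
        }
    }

    fn as_str(&self) -> &str {
        match self {
            // The bytes were copied from a valid &str, so this cannot fail.
            SketchString::Inline { buf, len } => {
                std::str::from_utf8(&buf[..*len as usize]).expect("valid UTF-8")
            }
            SketchString::Shared(arc) => &**arc,
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, SketchString::Inline { .. })
    }
}

fn main() {
    // Count interner calls to show that short strings bypass it.
    let calls = Cell::new(0);
    let intern = |s: &str| {
        calls.set(calls.get() + 1);
        Arc::<str>::from(s)
    };

    let short = SketchString::new("user-42", &intern);
    let long = SketchString::new("a-value-well-beyond-the-inline-capacity", &intern);

    assert!(short.is_inline());
    assert!(!long.is_inline());
    assert_eq!(short.as_str(), "user-42");
    assert_eq!(calls.get(), 1);
}
```

Skipping the interner for short values avoids a hash lookup and a thread-local access on the most common, cheapest-to-copy strings, where sharing would save little or nothing.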

The bilrost decode path is optimized with a fast path for contiguous buffers:
when chunk() covers the full string, we read directly as &str and query the
interner without allocating a temporary buffer.
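The contiguous fast path might look roughly like the following. To stay free of external crates, the sketch uses a hypothetical ChunkedBuf trait that imitates the remaining()/chunk()/advance() subset of bytes::Buf, plus a hypothetical Segments buffer; decode_interned is likewise an illustrative name, not the PR's function. When chunk() holds all len bytes, the string is validated and looked up in place; otherwise the bytes are first gathered into a scratch Vec.

```rust
use std::cell::RefCell;
use std::collections::HashSet;
use std::str::Utf8Error;
use std::sync::Arc;

/// Hypothetical trait mirroring the remaining()/chunk()/advance() subset of bytes::Buf.
trait ChunkedBuf {
    fn remaining(&self) -> usize;
    fn chunk(&self) -> &[u8];
    fn advance(&mut self, n: usize);
}

/// Hypothetical buffer made of non-empty, non-contiguous segments.
struct Segments<'a> {
    parts: Vec<&'a [u8]>,
}

impl<'a> Segments<'a> {
    fn new(parts: Vec<&'a [u8]>) -> Self {
        Segments { parts: parts.into_iter().filter(|p| !p.is_empty()).collect() }
    }
}

impl<'a> ChunkedBuf for Segments<'a> {
    fn remaining(&self) -> usize {
        self.parts.iter().map(|p| p.len()).sum()
    }
    fn chunk(&self) -> &[u8] {
        self.parts.first().copied().unwrap_or(&[])
    }
    fn advance(&mut self, mut n: usize) {
        while n > 0 {
            let first: &'a [u8] = self.parts[0];
            if n < first.len() {
                self.parts[0] = &first[n..];
                return;
            }
            n -= first.len();
            self.parts.remove(0); // drop fully consumed segments
        }
    }
}

thread_local! {
    static INTERNER: RefCell<HashSet<Arc<str>>> = RefCell::new(HashSet::new());
}

fn intern(s: &str) -> Arc<str> {
    INTERNER.with(|cell| {
        let mut set = cell.borrow_mut();
        if let Some(existing) = set.get(s) {
            return Arc::clone(existing);
        }
        let fresh: Arc<str> = Arc::from(s);
        set.insert(Arc::clone(&fresh));
        fresh
    })
}

/// Reads `len` UTF-8 bytes from `buf` and returns the interned string.
fn decode_interned<B: ChunkedBuf>(buf: &mut B, len: usize) -> Result<Arc<str>, Utf8Error> {
    assert!(buf.remaining() >= len, "buffer too short");
    if buf.chunk().len() >= len {
        // Fast path: the string is contiguous, so validate it and query the
        // interner in place without copying into a temporary buffer.
        let s = intern(std::str::from_utf8(&buf.chunk()[..len])?);
        buf.advance(len);
        return Ok(s);
    }
    // Slow path: the string spans chunks, so gather it into a scratch Vec.
    let mut tmp = Vec::with_capacity(len);
    while tmp.len() < len {
        let c = buf.chunk();
        let take = c.len().min(len - tmp.len());
        tmp.extend_from_slice(&c[..take]);
        buf.advance(take);
    }
    Ok(intern(std::str::from_utf8(&tmp)?))
}

fn main() {
    let msg: &[u8] = b"a-service-name-long-enough-to-intern";
    let mut whole = Segments::new(vec![msg]);
    let mut split = Segments::new(vec![&msg[..10], &msg[10..]]);
    let a = decode_interned(&mut whole, msg.len()).unwrap(); // fast path
    let b = decode_interned(&mut split, msg.len()).unwrap(); // slow path
    assert!(Arc::ptr_eq(&a, &b));
    assert_eq!(split.remaining(), 0);
}
```

On an interner hit, the fast path in this sketch avoids heap allocation entirely; the slow path always pays for the scratch Vec, which is why it is reserved for strings that straddle chunk boundaries.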

@github-actions

github-actions bot commented Mar 30, 2026

Test Results

  7 files  ±0    7 suites  ±0   2m 39s ⏱️ -1s
 47 tests ±0   47 ✅ ±0  0 💤 ±0  0 ❌ ±0 
200 runs  ±0  200 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 1d5d0f8. ± Comparison against base commit e737496.

♻️ This comment has been updated with latest results.
