BE-364: Rework symbol representation for speed and size #8343

Open
indietyp wants to merge 21 commits into bm/be-360-hashql-rework-compiletest from bm/be-364-hashql-rework-symbol-to-be-faster-and-smaller

indietyp (Member) commented Feb 2, 2026

🌟 What is the purpose of this PR?

Reworks the HashQL symbol infrastructure to be faster and smaller by using a tagged pointer representation that distinguishes between compile-time constant symbols and runtime-interned symbols.
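To make the tagged-pointer idea concrete, here is a minimal sketch, not the PR's actual code. It assumes a one-bit tag in the lowest pointer bit, a three-entry stand-in for the static SYMBOLS array, and a leaked Box in place of the bump-allocated RuntimeRepr with inline length and string data. The method names `constant`, `runtime`, `is_constant`, and `as_str` are hypothetical.

```rust
use std::ptr::NonNull;

/// Stand-in for the generated static table of constant symbols (assumption).
static SYMBOLS: [&str; 3] = ["foo", "bar", "baz"];

/// Runtime payload. Its alignment is at least 2, so the lowest bit of any
/// pointer to it is 0 and is free to act as the tag.
#[repr(align(2))]
struct RuntimeRepr {
    text: Box<str>,
}

const TAG_MASK: usize = 0b1;
const TAG_CONSTANT: usize = 0b1;

/// One word: either `(index << 1) | 1` or an untagged pointer to `RuntimeRepr`.
/// Derived equality and hashing compare only that word, so they are only
/// meaningful when an interner guarantees one allocation per distinct string.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Repr(NonNull<u8>);

impl Repr {
    /// Encodes an index into `SYMBOLS` directly in the pointer bits.
    fn constant(index: usize) -> Self {
        assert!(index < SYMBOLS.len());
        let bits = (index << 1) | TAG_CONSTANT;
        // The tag bit is set, so the value is never zero.
        Repr(NonNull::new(bits as *mut u8).expect("tagged value is non-zero"))
    }

    /// Allocates a runtime symbol. This sketch leaks a `Box` on purpose; the
    /// PR describes a bump allocator instead.
    fn runtime(text: &str) -> Self {
        let leaked: &'static mut RuntimeRepr =
            Box::leak(Box::new(RuntimeRepr { text: text.into() }));
        Repr(NonNull::from(leaked).cast::<u8>())
    }

    fn is_constant(self) -> bool {
        (self.0.as_ptr() as usize) & TAG_MASK == TAG_CONSTANT
    }

    fn as_str(self) -> &'static str {
        let bits = self.0.as_ptr() as usize;
        if self.is_constant() {
            SYMBOLS[bits >> 1]
        } else {
            // SAFETY: untagged values are only created in `runtime`, from a
            // leaked (never freed) `RuntimeRepr`.
            let repr: &'static RuntimeRepr =
                unsafe { &*self.0.as_ptr().cast::<RuntimeRepr>() };
            &repr.text
        }
    }
}
```

Because the wrapper holds a `NonNull`, `Option<Repr>` stays one word wide, and equality or hashing touches a single machine word rather than string bytes.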

🔍 What does this change?

  • Introduces Repr, a tagged pointer representation for symbols:
    • Constant symbols: Index encoded in pointer bits, pointing into static SYMBOLS array ('static lifetime)
    • Runtime symbols: Pointer to bump-allocated RuntimeRepr with inline length + string data
  • Reworks SymbolTable from a generic ID→Symbol mapping to a string-interning hash table
  • Adds ConstantSymbol type for compile-time pattern matching against predefined symbols
  • Rewrites symbols! macro to generate:
    • SYMBOLS: Static slice for interner pre-population
    • Symbol constants with companion modules containing CONST: ConstantSymbol
    • LOOKUP: String→Repr mapping for fast runtime lookup
  • Adds SymbolLookup type for O(1) constant symbol detection during interning
  • Adds comprehensive benchmarks for symbol interning and lookup operations
  • Updates all symbol usages across HIR, MIR, and eval crates
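The interning behaviour described in the list above can be sketched in safe Rust as follows. This is an illustration under stated assumptions, not the PR's implementation: it uses a `HashMap` and a `Vec<String>` instead of a bump arena and tagged pointers, a three-entry stand-in for SYMBOLS, and hypothetical `new`, `intern`, `resolve`, `prime`, and `reset` methods whose exact semantics are guesses.

```rust
use std::collections::HashMap;

/// Stand-in for the generated table of predefined symbols (assumption).
const SYMBOLS: &[&str] = &["let", "fn", "if"];

/// Simplified handle: an index into either the constant or the runtime table.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub enum Symbol {
    Constant(usize),
    Runtime(usize),
}

#[derive(Default)]
pub struct SymbolTable {
    lookup: HashMap<String, Symbol>,
    runtime: Vec<String>,
}

impl SymbolTable {
    /// Creates a table that already knows every constant symbol.
    pub fn new() -> Self {
        let mut table = Self::default();
        table.prime();
        table
    }

    /// Pre-populates the map so constants are found with one hash lookup.
    pub fn prime(&mut self) {
        for (index, text) in SYMBOLS.iter().enumerate() {
            self.lookup.insert((*text).to_owned(), Symbol::Constant(index));
        }
    }

    /// Returns the existing symbol for `text`, or allocates a runtime one.
    pub fn intern(&mut self, text: &str) -> Symbol {
        if let Some(&symbol) = self.lookup.get(text) {
            return symbol;
        }
        let symbol = Symbol::Runtime(self.runtime.len());
        self.runtime.push(text.to_owned());
        self.lookup.insert(text.to_owned(), symbol);
        symbol
    }

    /// Maps a symbol back to its string.
    pub fn resolve(&self, symbol: Symbol) -> &str {
        match symbol {
            Symbol::Constant(index) => SYMBOLS[index],
            Symbol::Runtime(index) => self.runtime[index].as_str(),
        }
    }

    /// Drops all runtime symbols and re-primes the constants.
    pub fn reset(&mut self) {
        self.lookup.clear();
        self.runtime.clear();
        self.prime();
    }
}
```

Interning the same string twice returns the same handle, which is what allows symbol equality to be a handle comparison rather than a string comparison.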

Pre-Merge Checklist 🚀

🚢 Has this modified a publishable library?

This PR:

  • does not modify any publishable blocks or libraries, or modifications do not need publishing

📜 Does this require a change to the docs?

The changes in this PR:

  • are internal and do not require a docs change

🕸️ Does this require a change to the Turbo Graph?

The changes in this PR:

  • do not affect the execution graph

⚠️ Known issues

None.

🛡 What tests cover this?

  • Unit tests in symbol/repr.rs, symbol/table.rs, symbol/sym.rs
  • Existing HIR/MIR tests exercise the updated symbol infrastructure

❓ How to test this?

  1. cargo nextest run --package hashql-core
  2. cargo bench --package hashql-core -- symbol (to verify performance)

📊 Benchmarks

| Operation | Improvement |
| --- | --- |
| Intern unique | -12–30% (-34% with local lock) |
| Intern repeated | +2–4% (-32% with local lock) |
| Mixed workload | within noise |
| Constant access | -40% |
| Equality (constant + runtime) | -20% |
| Hashing (constant + runtime) | -50% |
| Constant as_str | +100% |
| Runtime as_str | +40% |
| Type checker simulation | -64% |

Type system benchmarks: -10–0% speedup


vercel bot commented Feb 2, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| hash | Ready | Preview, Comment | Feb 6, 2026 3:46pm |
| petrinaut | Ready | Preview | Feb 6, 2026 3:46pm |

2 Skipped Deployments

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| hashdotdesign | Ignored | Preview | Feb 6, 2026 3:46pm |
| hashdotdesign-tokens | Ignored | Preview | Feb 6, 2026 3:46pm |


cursor bot commented Feb 2, 2026

PR Summary

High Risk
Core symbol interning/storage is redesigned (new tagged-pointer Repr, new SymbolTable/locking and reset ordering), which is pervasive across the compiler and involves unsafe code and lifetime invariants that could cause subtle correctness/UB issues if wrong.

Overview
Reworks HashQL’s symbol system to use a compact tagged-pointer Repr that encodes constant symbols as indices into a generated static SYMBOLS table and runtime symbols as pointers to bump-allocated RuntimeRepr data, enabling cheaper equality/hash and Option<Symbol> niche optimization.

Replaces the old heap interning mechanism (mutex + HashSet<&'static str> with a fake 'static lifetime) with a dedicated SymbolTable string interner guarded by LocalLock, including explicit prime()/reset() semantics and required reset ordering (symbol table reset before arena reset). Adds ConstantSymbol + Symbol::as_constant() for pattern matching, and introduces SymbolLookup to replace the previous ID→symbol mapping use of SymbolTable in HIR.
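As an illustration of the pattern-matching use case, the following sketch shows how a `ConstantSymbol` obtained from `Symbol::as_constant()` might drive a `match`. Only those two names come from the summary above; the enum-based `Symbol`, the associated constants `ADD`, `SUB`, and `MUL`, and the `apply_operator` helper are hypothetical simplifications (the PR describes companion modules holding a `CONST` item instead).

```rust
/// Identifier of a predefined symbol. Deriving `PartialEq` and `Eq` is what
/// lets constants of this type appear in `match` patterns.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct ConstantSymbol(u32);

impl ConstantSymbol {
    // Hypothetical predefined symbols for this sketch.
    pub const ADD: Self = Self(0);
    pub const SUB: Self = Self(1);
    pub const MUL: Self = Self(2);
}

/// Simplified symbol: either predefined or interned at runtime.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Symbol {
    Constant(ConstantSymbol),
    Runtime(u32),
}

impl Symbol {
    /// Returns the constant identifier, or `None` for runtime symbols.
    pub fn as_constant(self) -> Option<ConstantSymbol> {
        match self {
            Symbol::Constant(constant) => Some(constant),
            Symbol::Runtime(_) => None,
        }
    }
}

/// Dispatches on an operator symbol without comparing any strings.
/// Returns `None` for unknown operators and on arithmetic overflow.
pub fn apply_operator(op: Symbol, lhs: i64, rhs: i64) -> Option<i64> {
    match op.as_constant() {
        Some(ConstantSymbol::ADD) => lhs.checked_add(rhs),
        Some(ConstantSymbol::SUB) => lhs.checked_sub(rhs),
        Some(ConstantSymbol::MUL) => lhs.checked_mul(rhs),
        _ => None,
    }
}
```

Matching on a small integer-like constant lets the compiler emit a direct comparison or jump table, which is one plausible reason to expose constants separately from runtime symbols.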

Updates symbol constants generation via a rewritten symbols! macro (new LOOKUP mapping and renamed/reshaped sym namespaces), and propagates the API/name changes across AST diagnostics, pretty-printers, eval/HIR/MIR code paths, and operator symbol mappings. Adds extensive unit tests, extends Miri coverage, and introduces a new symbol benchmark suite in hashql-core.
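For readers unfamiliar with this style of code generation, here is a rough sketch of how a single declarative macro can emit a static string table, typed constants, and a string lookup from one list. It is not the PR's `symbols!` macro: this version swaps in a fieldless enum and a `match`-based `lookup` function where the PR describes Symbol constants with companion modules and a LOOKUP mapping, and the input syntax `Name => "text"` is an assumption.

```rust
macro_rules! symbols {
    ($($name:ident => $text:literal),* $(,)?) => {
        /// Static table, in declaration order, usable to pre-populate an interner.
        pub static SYMBOLS: &[&str] = &[$($text),*];

        /// One variant per symbol; the discriminant is the index into `SYMBOLS`.
        #[derive(Clone, Copy, PartialEq, Eq, Debug)]
        #[repr(u32)]
        pub enum ConstantSymbol {
            $($name),*
        }

        impl ConstantSymbol {
            /// Resolves the constant back to its text via the static table.
            pub fn as_str(self) -> &'static str {
                SYMBOLS[self as usize]
            }
        }

        /// String to constant mapping, generated from the same list.
        pub fn lookup(text: &str) -> Option<ConstantSymbol> {
            match text {
                $($text => Some(ConstantSymbol::$name),)*
                _ => None,
            }
        }
    };
}

// Hypothetical symbol list for this sketch.
symbols! {
    Add => "+",
    Sub => "-",
    Let => "let",
}
```

Generating all three artifacts from one list keeps the table order, the constants, and the lookup in sync by construction.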

Written by Cursor Bugbot for commit a78848f. This will update automatically on new commits.

github-actions bot (Contributor) commented Feb 2, 2026

Benchmark results

@rust/hash-graph-benches – Integrations

policy_resolution_large

Function Value Mean Flame graphs
resolve_policies_for_actor user: empty, selectivity: high, policies: 2002 $$26.7 \mathrm{ms} \pm 281 \mathrm{μs}\left({\color{gray}1.13 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: low, policies: 1 $$3.21 \mathrm{ms} \pm 21.3 \mathrm{μs}\left({\color{gray}0.338 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: medium, policies: 1001 $$12.3 \mathrm{ms} \pm 106 \mathrm{μs}\left({\color{gray}4.16 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: high, policies: 3314 $$42.5 \mathrm{ms} \pm 364 \mathrm{μs}\left({\color{gray}2.43 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: low, policies: 1 $$14.7 \mathrm{ms} \pm 102 \mathrm{μs}\left({\color{red}8.48 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: medium, policies: 1526 $$24.1 \mathrm{ms} \pm 127 \mathrm{μs}\left({\color{red}7.39 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: high, policies: 2078 $$43.1 \mathrm{ms} \pm 269 \mathrm{μs}\left({\color{gray}1.50 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: low, policies: 1 $$19.9 \mathrm{ms} \pm 174 \mathrm{μs}\left({\color{gray}2.04 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: medium, policies: 1033 $$28.8 \mathrm{ms} \pm 236 \mathrm{μs}\left({\color{gray}3.52 \mathrm{\%}}\right) $$ Flame Graph

policy_resolution_medium

Function Value Mean Flame graphs
resolve_policies_for_actor user: empty, selectivity: high, policies: 102 $$3.82 \mathrm{ms} \pm 23.3 \mathrm{μs}\left({\color{red}5.64 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: low, policies: 1 $$2.88 \mathrm{ms} \pm 14.2 \mathrm{μs}\left({\color{gray}3.03 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: medium, policies: 51 $$3.36 \mathrm{ms} \pm 20.8 \mathrm{μs}\left({\color{red}5.92 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: high, policies: 269 $$5.44 \mathrm{ms} \pm 33.1 \mathrm{μs}\left({\color{red}9.60 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: low, policies: 1 $$3.63 \mathrm{ms} \pm 27.1 \mathrm{μs}\left({\color{red}8.58 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: medium, policies: 107 $$4.14 \mathrm{ms} \pm 26.0 \mathrm{μs}\left({\color{red}6.36 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: high, policies: 133 $$4.58 \mathrm{ms} \pm 37.3 \mathrm{μs}\left({\color{red}7.22 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: low, policies: 1 $$3.41 \mathrm{ms} \pm 20.3 \mathrm{μs}\left({\color{red}6.26 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: medium, policies: 63 $$4.36 \mathrm{ms} \pm 30.4 \mathrm{μs}\left({\color{red}14.5 \mathrm{\%}}\right) $$ Flame Graph

policy_resolution_none

Function Value Mean Flame graphs
resolve_policies_for_actor user: empty, selectivity: high, policies: 2 $$2.37 \mathrm{ms} \pm 10.4 \mathrm{μs}\left({\color{gray}1.26 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: low, policies: 1 $$2.31 \mathrm{ms} \pm 10.1 \mathrm{μs}\left({\color{gray}-0.153 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: medium, policies: 1 $$2.39 \mathrm{ms} \pm 9.48 \mathrm{μs}\left({\color{gray}-0.337 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: high, policies: 8 $$2.62 \mathrm{ms} \pm 12.1 \mathrm{μs}\left({\color{gray}1.41 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: low, policies: 1 $$2.51 \mathrm{ms} \pm 12.8 \mathrm{μs}\left({\color{gray}0.891 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: medium, policies: 3 $$2.74 \mathrm{ms} \pm 18.4 \mathrm{μs}\left({\color{gray}3.14 \mathrm{\%}}\right) $$ Flame Graph

policy_resolution_small

Function Value Mean Flame graphs
resolve_policies_for_actor user: empty, selectivity: high, policies: 52 $$2.76 \mathrm{ms} \pm 15.8 \mathrm{μs}\left({\color{gray}-0.212 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: low, policies: 1 $$2.40 \mathrm{ms} \pm 11.1 \mathrm{μs}\left({\color{gray}0.330 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: empty, selectivity: medium, policies: 25 $$2.58 \mathrm{ms} \pm 15.6 \mathrm{μs}\left({\color{gray}1.58 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: high, policies: 94 $$3.09 \mathrm{ms} \pm 18.6 \mathrm{μs}\left({\color{gray}0.923 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: low, policies: 1 $$2.72 \mathrm{ms} \pm 16.7 \mathrm{μs}\left({\color{gray}4.13 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: seeded, selectivity: medium, policies: 26 $$2.93 \mathrm{ms} \pm 20.2 \mathrm{μs}\left({\color{gray}3.41 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: high, policies: 66 $$2.99 \mathrm{ms} \pm 15.8 \mathrm{μs}\left({\color{gray}-0.084 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: low, policies: 1 $$2.63 \mathrm{ms} \pm 12.0 \mathrm{μs}\left({\color{gray}0.627 \mathrm{\%}}\right) $$ Flame Graph
resolve_policies_for_actor user: system, selectivity: medium, policies: 29 $$2.86 \mathrm{ms} \pm 11.8 \mathrm{μs}\left({\color{gray}0.279 \mathrm{\%}}\right) $$ Flame Graph

read_scaling_complete

Function Value Mean Flame graphs
entity_by_id;one_depth 1 entities $$39.0 \mathrm{ms} \pm 178 \mathrm{μs}\left({\color{gray}2.02 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;one_depth 10 entities $$76.7 \mathrm{ms} \pm 446 \mathrm{μs}\left({\color{gray}2.91 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;one_depth 25 entities $$43.4 \mathrm{ms} \pm 207 \mathrm{μs}\left({\color{gray}4.12 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;one_depth 5 entities $$45.6 \mathrm{ms} \pm 285 \mathrm{μs}\left({\color{gray}2.05 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;one_depth 50 entities $$54.2 \mathrm{ms} \pm 349 \mathrm{μs}\left({\color{gray}3.01 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;two_depth 1 entities $$40.5 \mathrm{ms} \pm 171 \mathrm{μs}\left({\color{gray}1.62 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;two_depth 10 entities $$420 \mathrm{ms} \pm 1.08 \mathrm{ms}\left({\color{gray}0.253 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;two_depth 25 entities $$97.9 \mathrm{ms} \pm 389 \mathrm{μs}\left({\color{red}8.42 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;two_depth 5 entities $$85.2 \mathrm{ms} \pm 270 \mathrm{μs}\left({\color{gray}-0.015 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;two_depth 50 entities $$289 \mathrm{ms} \pm 811 \mathrm{μs}\left({\color{gray}3.09 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;zero_depth 1 entities $$15.1 \mathrm{ms} \pm 62.5 \mathrm{μs}\left({\color{gray}3.25 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;zero_depth 10 entities $$14.9 \mathrm{ms} \pm 75.1 \mathrm{μs}\left({\color{gray}0.969 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;zero_depth 25 entities $$15.4 \mathrm{ms} \pm 80.7 \mathrm{μs}\left({\color{gray}-3.390 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;zero_depth 5 entities $$14.9 \mathrm{ms} \pm 69.7 \mathrm{μs}\left({\color{gray}1.33 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id;zero_depth 50 entities $$18.1 \mathrm{ms} \pm 135 \mathrm{μs}\left({\color{gray}0.998 \mathrm{\%}}\right) $$ Flame Graph

read_scaling_linkless

Function Value Mean Flame graphs
entity_by_id 1 entities $$14.6 \mathrm{ms} \pm 71.4 \mathrm{μs}\left({\color{gray}-1.361 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 10 entities $$14.9 \mathrm{ms} \pm 67.3 \mathrm{μs}\left({\color{gray}-1.293 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 100 entities $$15.1 \mathrm{ms} \pm 112 \mathrm{μs}\left({\color{gray}-0.408 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 1000 entities $$15.1 \mathrm{ms} \pm 60.9 \mathrm{μs}\left({\color{gray}-4.534 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id 10000 entities $$22.6 \mathrm{ms} \pm 133 \mathrm{μs}\left({\color{gray}-0.865 \mathrm{\%}}\right) $$ Flame Graph

representative_read_entity

Function Value Mean Flame graphs
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/block/v/1 $$29.9 \mathrm{ms} \pm 309 \mathrm{μs}\left({\color{gray}-2.825 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/book/v/1 $$31.2 \mathrm{ms} \pm 335 \mathrm{μs}\left({\color{gray}1.73 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/building/v/1 $$30.8 \mathrm{ms} \pm 309 \mathrm{μs}\left({\color{gray}-3.973 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/organization/v/1 $$29.1 \mathrm{ms} \pm 272 \mathrm{μs}\left({\color{gray}-4.906 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/page/v/2 $$30.9 \mathrm{ms} \pm 336 \mathrm{μs}\left({\color{gray}-0.030 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/person/v/1 $$30.1 \mathrm{ms} \pm 238 \mathrm{μs}\left({\color{gray}-2.461 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/playlist/v/1 $$30.4 \mathrm{ms} \pm 300 \mathrm{μs}\left({\color{gray}-1.518 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/song/v/1 $$30.2 \mathrm{ms} \pm 311 \mathrm{μs}\left({\color{gray}-2.460 \mathrm{\%}}\right) $$ Flame Graph
entity_by_id entity type ID: https://blockprotocol.org/@alice/types/entity-type/uk-address/v/1 $$31.3 \mathrm{ms} \pm 272 \mathrm{μs}\left({\color{gray}0.985 \mathrm{\%}}\right) $$ Flame Graph

representative_read_entity_type

Function Value Mean Flame graphs
get_entity_type_by_id Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba $$8.19 \mathrm{ms} \pm 61.5 \mathrm{μs}\left({\color{gray}-2.894 \mathrm{\%}}\right) $$ Flame Graph

representative_read_multiple_entities

Function Value Mean Flame graphs
entity_by_property traversal_paths=0 0 $$47.2 \mathrm{ms} \pm 223 \mathrm{μs}\left({\color{gray}-3.047 \mathrm{\%}}\right) $$
entity_by_property traversal_paths=255 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true $$92.0 \mathrm{ms} \pm 402 \mathrm{μs}\left({\color{gray}-3.068 \mathrm{\%}}\right) $$
entity_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false $$53.4 \mathrm{ms} \pm 280 \mathrm{μs}\left({\color{gray}-3.435 \mathrm{\%}}\right) $$
entity_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true $$58.9 \mathrm{ms} \pm 261 \mathrm{μs}\left({\color{lightgreen}-6.827 \mathrm{\%}}\right) $$
entity_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true $$67.8 \mathrm{ms} \pm 328 \mathrm{μs}\left({\color{lightgreen}-6.341 \mathrm{\%}}\right) $$
entity_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true $$74.2 \mathrm{ms} \pm 330 \mathrm{μs}\left({\color{gray}-3.677 \mathrm{\%}}\right) $$
link_by_source_by_property traversal_paths=0 0 $$47.5 \mathrm{ms} \pm 256 \mathrm{μs}\left({\color{lightgreen}-8.034 \mathrm{\%}}\right) $$
link_by_source_by_property traversal_paths=255 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true $$76.0 \mathrm{ms} \pm 366 \mathrm{μs}\left({\color{gray}-0.351 \mathrm{\%}}\right) $$
link_by_source_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false $$54.5 \mathrm{ms} \pm 433 \mathrm{μs}\left({\color{lightgreen}-8.479 \mathrm{\%}}\right) $$
link_by_source_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true $$61.8 \mathrm{ms} \pm 396 \mathrm{μs}\left({\color{lightgreen}-6.061 \mathrm{\%}}\right) $$
link_by_source_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true $$64.9 \mathrm{ms} \pm 488 \mathrm{μs}\left({\color{lightgreen}-5.323 \mathrm{\%}}\right) $$
link_by_source_by_property traversal_paths=2 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true $$66.0 \mathrm{ms} \pm 355 \mathrm{μs}\left({\color{gray}-2.168 \mathrm{\%}}\right) $$

scenarios

Function Value Mean Flame graphs
full_test query-limited $$142 \mathrm{ms} \pm 729 \mathrm{μs}\left({\color{gray}3.21 \mathrm{\%}}\right) $$ Flame Graph
full_test query-unlimited $$140 \mathrm{ms} \pm 643 \mathrm{μs}\left({\color{gray}2.15 \mathrm{\%}}\right) $$ Flame Graph
linked_queries query-limited $$104 \mathrm{ms} \pm 422 \mathrm{μs}\left({\color{gray}-0.876 \mathrm{\%}}\right) $$ Flame Graph
linked_queries query-unlimited $$600 \mathrm{ms} \pm 3.55 \mathrm{ms}\left({\color{gray}0.149 \mathrm{\%}}\right) $$ Flame Graph

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Labels

area/libs: Relates to first-party libraries/crates/packages (area)
type/eng > backend: Owned by the @backend team

2 participants