
Conversation

@taj-p commented Nov 18, 2025

This PR investigates the differences in performance and binary size between ICU4X's CodePointTrie and PackTab-generated code.

Results

Binary Size (🏆 PackTab - 56 kB cheaper)


Details:

The raw PackTab data is ~40 kB; the .postcard ICU4X trie is ~70 kB. I think the 16 kB difference in the binaries themselves is due to ICU4X pulling in more of the std library (a cost that should be absorbed by any sufficiently complex consumer).

Lookup (🏆 PackTab - ~37% faster)


Lookup w/ unsafe PackTab (🏆 PackTab - ~64% faster)

We also tested against an unsafe variant (harfbuzz/packtab#6), which produced even better results.


NOTE: Results may vary with different lookup ranges. This benchmark was fairly simple.

Steps to reproduce

  1. Check out this branch.
  2. Set this line to lookup::checksum_trie(samples, composite):

     black_box(lookup::checksum_packtab(samples));

  3. Export the bench for use by Tango: cd parley_bench && cargo export target/benchmarks -- bench --bench=main
  4. Revert step 2.
  5. Compare PackTab with the trie: cargo bench -q --bench=main -- compare target/benchmarks/main

NOTE: The current commit uses the unsafe PackTab variant. I think we would use this variant because of its significantly improved performance over the bounds-checked version.
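
For context, the safe and unsafe variants differ only in whether the generated nibble accessor bounds-checks its slice index. Below is a minimal sketch of that difference; the table contents and function names are hypothetical and this is not the actual patch from harfbuzz/packtab#6.

// Illustrative only: a packTab-style 4-bit accessor with and without the
// slice bounds check. Data and names are made up for this sketch.

static EXAMPLE_U8: [u8; 4] = [0x21, 0x43, 0x65, 0x87];

// Safe variant: indexing is bounds-checked (a branch) on every lookup.
fn b4_safe(a: &[u8], i: usize) -> u8 {
    (a[i >> 1] >> ((i & 1) << 2)) & 15
}

// Unsafe variant: the caller guarantees i >> 1 < a.len(), so the bounds
// check and its branch are elided.
unsafe fn b4_unchecked(a: &[u8], i: usize) -> u8 {
    // SAFETY: the caller upholds i >> 1 < a.len().
    unsafe { (*a.get_unchecked(i >> 1) >> ((i & 1) << 2)) & 15 }
}

fn main() {
    assert_eq!(b4_safe(&EXAMPLE_U8, 3), 4);
    // SAFETY: 3 >> 1 == 1, which is within EXAMPLE_U8's bounds.
    assert_eq!(unsafe { b4_unchecked(&EXAMPLE_U8, 3) }, 4);
}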

Inline review comment (@taj-p): These binaries were used to compare sizes.


vec![benchmark_fn("Composite lookup", move |b| {
    b.iter(|| {
        black_box(lookup::checksum_packtab(samples));

Inline review comment (@taj-p): Change this to checksum_trie to compare performance with Tango.

use icu_provider::{DataMarker, DataRequest, DynamicDataProvider};

#[test]
fn packtab_matches_trie() {

Inline review comment (@taj-p): Test to ensure that both packtab and trie return the same values.

Inline review comment (@taj-p): This is the raw data fed to PackTab.

@behdad commented Nov 19, 2025

I studied the ICU CodePointTrie (aka UCPTrie) a bit at:

https://unicode-org.github.io/icu/design/struct/utrie#ucptrie--codepointtrie

My observations about how the two designs compare:

  • CodePointTrie has a direct-access array for ASCII, then a "faster" path for the BMP, then a fallback path covering all codepoints. If similarly fast lookups for the lower codepoints are desired, packTab can be run three times on truncated parts of the codepoint space, optimized more aggressively in each, and the results overlaid.
  • However, I think one of the slow points of the CodePointTrie in your testing is that most codepoints fall on the slow path of the CPTrie, behind two conditionals, whereas the entire packTab code is branch-free (a schematic contrast follows this list).
  • Finally, the packTab tables are optimized for size. Smaller tables also interact better with the CPU caches, making the branchless code quite fast.
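
To make the contrast concrete, here is a rough Rust sketch of the two lookup shapes described above. It is schematic only: the table layouts, index arithmetic, and function names are made up for illustration and do not match the actual ICU4X CodePointTrie internals or real packTab output.

// Schematic contrast between a multi-path trie lookup and a packTab-style
// lookup. Tables and constants are hypothetical.

// CodePointTrie-style lookup: ASCII fast path, BMP path, then a fallback
// for supplementary codepoints -- two conditionals sit in front of the
// path taken by most higher codepoints.
fn trie_style_get(cp: u32, index: &[u16], data: &[u8]) -> u8 {
    if cp < 0x80 {
        data[cp as usize] // direct ASCII access
    } else if cp < 0x1_0000 {
        let block = index[(cp >> 6) as usize]; // "faster" BMP path
        data[(block as u32 + (cp & 0x3F)) as usize]
    } else {
        // slower multi-stage path for everything above the BMP
        let i1 = index[(cp >> 12) as usize];
        let i2 = index[(i1 as u32 + ((cp >> 6) & 0x3F)) as usize];
        data[(i2 as u32 + (cp & 0x3F)) as usize]
    }
}

// packTab-style lookup: the shifts and masks are compile-time constants,
// and one straight-line expression serves every codepoint (real packTab
// output is fully branch-free; the safe indexing here still carries
// bounds checks).
fn packtab_style_get(cp: u32, stage1: &[u8], stage2: &[u8]) -> u8 {
    let b = stage1[((cp >> 4) & 0xFF) as usize] as u32;
    stage2[((b << 4) | (cp & 0xF)) as usize]
}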

I think, ideally, the packTab algorithm should be contributed to ICU and possibly replace the UCPTrie implementation. The builder could then run the packTab algorithm and store the optimal table. One reason this has not been pursued by the ICU team might be that, for this to be fast, you need to compile the final expression with a compiler; i.e., you want the shift and mask values to be known at compile time. So we can't really store and load the shape of the partition, and use lookup tables for that shape, from provided data. That is why the UCPTrie partition shapes are fixed by the design: supporting arbitrary partitions is significantly slower than code for a fixed partition.
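
A small sketch of that compile-time vs. data-driven distinction, with hypothetical names: the first lookup has its partition shape baked in as constants the compiler can fold into immediates, while the second loads the same shape from deserialized data on every call, which is the situation a pure code-vs-data design is stuck with.

// Hypothetical two-stage lookup, shown with the partition shape known at
// compile time versus loaded from data at run time.

// Shape fixed at compile time: SHIFT and MASK become immediates and the
// compiler can specialize the whole expression for this one table.
const SHIFT: u32 = 4;
const MASK: u32 = (1u32 << SHIFT) - 1;

fn get_static_shape(cp: u32, stage1: &[u8], stage2: &[u8]) -> u8 {
    let b = stage1[(cp >> SHIFT) as usize] as u32;
    stage2[((b << SHIFT) | (cp & MASK)) as usize]
}

// Shape carried in the data: every lookup must first load the shift and
// mask, and the compiler cannot fold them, so arbitrary partitions end up
// slower than code generated for one fixed partition.
struct Shape {
    shift: u32,
    mask: u32,
}

fn get_dynamic_shape(cp: u32, shape: &Shape, stage1: &[u8], stage2: &[u8]) -> u8 {
    let b = stage1[(cp >> shape.shift) as usize] as u32;
    stage2[((b << shape.shift) | (cp & shape.mask)) as usize]
}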

That is, we have to run the data through the compiler to get performant code, and that's not cleanly possible in a code-vs-data model. packTab solves this by being a code generator. Since UCPTrie can also be used for code generation (as in the Parley case), there should be an ICU builder that generates code that is compiled together with the data tables it accesses. In other words: for each data table, if you are compiling it into code anyway, we might as well generate the optimal function code to access this data.

Another way to look at it is that UCPTrie doesn't make use of the fact that the table-access code can be optimized by the compiler for this specific data table, whereas packTab does. It would be interesting to see a Java implementation that uses the JRE's compiler to compile table-access function code emitted by the UCPTrie builder. :D

My point is, I think I understand why UCPTrie performs so badly here, and yes, for the code-generation case, packTab code can be faster because (1) it is based on the optimal partition, which minimizes the data size and is therefore more cache-friendly, and (2) it uses branch-free arithmetic operations between a variable and a constant, each translating to a single instruction.

Excuse my thinking aloud.

@taj-p commented Nov 19, 2025


@sffc - I'm curious about your thoughts here and whether a change like this would be accepted by ICU4X (and how that might work). (Also happy to schedule a call to chat amongst ourselves sometime in the next several weeks.)

You can read more about PackTab at these links:

http://github.com/harfbuzz/packtab
https://docs.google.com/document/d/1Xq3owVt61HVkJqbLFHl73il6pcTy6PdPJJ7bSouQiQw/preview

TL;DR: PackTab searches for a packing that minimises binary size and generates fully bit-packed lookup code:

# Example: Unicode character categories with repeated patterns
# Values are all multiples of 5 in range [100, 135]
data = [
    100, 105, 110, 115, 120, 125, 130, 135,  # 0-7
    100, 105, 110, 115, 120, 125, 130, 135,  # 8-15 (repeat)
    105, 105, 105, 105, 120, 120, 120, 120,  # 16-23 (patterns)
    100, 100, 135, 135, 115, 115, 125, 125,  # 24-31 (patterns)
    110, 110, 110, 110, 110, 110, 110, 110,  # 32-39 (all same)
]

// PackTab generated code:

static category_u8: [u8; 20]=
[
   16, 50, 84,118, 16, 50, 84,118, 17, 17, 68, 68,  0,119, 51, 85,
   34, 34, 34, 34,
];

fn category_b4 (a: &[u8], i: usize) -> u8
{
  (a[i>>1]>>((i&1)<<2))&15
}
pub(crate) fn category_get (u: usize) -> u8
{
  if u<40 { 100+5*category_b4(&category_u8,(u) as usize) } else { 100 }
}

// What packTab Discovered:
// All values are multiples of 5 (100, 105, 110, 115, ...)
// All values ≥ 100 (minimum is 100)
 
// Formula: original_value = 100 + 5 * stored_value
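
As a quick sanity check (assuming the generated table and functions above are compiled into a crate), the accessor reproduces the original data array and falls back to the default for out-of-range indices:

fn main() {
    assert_eq!(category_get(0), 100);  // data[0]
    assert_eq!(category_get(1), 105);  // data[1]
    assert_eq!(category_get(16), 105); // start of the 105/120 pattern block
    assert_eq!(category_get(39), 110); // last entry of the all-110 block
    assert_eq!(category_get(40), 100); // out of range -> default value
}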
