@masklinn

As it turns out, a *significant* number of regexes have distinguishing atoms of length 2 rather than 3, so prefiltering with the default settings underperforms significantly. For example, when parsing sample 9997 (from the `sort -u` of the sample file), the default settings prefilter from 633 down to 61 regexes, of which the matching regex is number 50, leading to a lot of `Regex::is_match` calls.
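The two-stage scheme described above (cheap atom prefilter, then full matching of the survivors in order) can be sketched with a deliberately simplified model. This is not the uap-rust or `regex-filtered` API: the `Entry` type and `first_match` function are hypothetical, each regex is reduced to a list of literal atoms that must all occur in the input, and a plain function pointer stands in for `Regex::is_match`. Since the first full match wins, if the matching regex is the 50th survivor, the full matcher runs 50 times.

```rust
/// Hypothetical, simplified model of a prefiltered regex: `required_atoms`
/// are literals that must all occur in the input, and `is_match` stands in
/// for the full (expensive) `Regex::is_match`.
struct Entry {
    required_atoms: Vec<&'static str>,
    is_match: fn(&str) -> bool,
}

/// Returns the index of the first matching entry, plus how many full
/// matches were run to find it.
fn first_match(entries: &[Entry], input: &str) -> (Option<usize>, usize) {
    let mut full_matches = 0;
    for (i, entry) in entries.iter().enumerate() {
        // Stage 1: cheap prefilter. An entry with no required atoms is
        // "unfiltered" and always falls through to stage 2.
        if !entry.required_atoms.iter().all(|atom| input.contains(*atom)) {
            continue;
        }
        // Stage 2: the expensive full match, only for surviving candidates.
        full_matches += 1;
        if (entry.is_match)(input) {
            return (Some(i), full_matches);
        }
    }
    (None, full_matches)
}
```

In this model the cost of a lookup is dominated by how many candidates survive ahead of the matching one, which is exactly what a more discriminating prefilter reduces.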

Looking at the "extra" regexes: while they do have fairly long atoms, those tend to be optional, and the only required atoms are very short. By reducing the atom length to 2, the prefiltered set goes down to 20 regexes, of which the one we're looking for is the 14th. This cuts the post-prefilter matching from 6µs to 2µs (on top of roughly 2µs of prefiltering, which barely changes: it goes from 2.2µs to 2.3µs).
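Why short required atoms hurt can be illustrated with the same kind of simplified model, again hypothetical and ignoring the AND/OR structure that real FilteredRE2-style prefilters have: atoms shorter than the minimum length are discarded, and a regex left with no usable atoms can no longer be ruled out, so it survives prefiltering for every input. The function names below are made up for the example.

```rust
/// Drops atoms shorter than `min_atom_len` bytes (hypothetical simplified
/// model: a regex's prefilter is just a list of atoms that must all be present).
fn effective_atoms<'a>(required: &[&'a str], min_atom_len: usize) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|atom| atom.len() >= min_atom_len)
        .collect()
}

/// Counts the regexes that survive prefiltering for `input`. A regex whose
/// atoms were all dropped is unfiltered, since `all` over nothing is `true`.
fn candidate_count(regexes: &[Vec<&str>], input: &str, min_atom_len: usize) -> usize {
    regexes
        .iter()
        .filter(|required| {
            effective_atoms(required.as_slice(), min_atom_len)
                .iter()
                .all(|atom| input.contains(*atom))
        })
        .count()
}
```

With regexes requiring `["Pixel"]`, `["SM"]`, `["HT"]` and `["XT"]` and the input `"Linux; Android 14; Pixel 7"`, a minimum length of 3 leaves the three 2-byte-atom regexes unfiltered (4 candidates), while a minimum of 2 lets their atoms rule them out (1 candidate).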

This leads to a 15% performance increase on the benchmark (total time 8.14s -> 6.80s) at no visible memory cost: the maximum RSS and peak footprint differences are lost in noise. Before:

```
Lines: 751580
Total time: 8.139572291s
10µs / line
        8.25 real         8.21 user         0.03 sys
            57655296  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                3732  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                  85  involuntary context switches
         74982477832  instructions retired
         26557964231  cycles elapsed
            54461952  peak memory footprint
```

after:

```
Lines: 751580
Total time: 6.797529459s
9µs / line
        6.91 real         6.86 user         0.04 sys
            57802752  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                3741  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                 154  involuntary context switches
         65652792138  instructions retired
         22207284899  cycles elapsed
            54478080  peak memory footprint
```
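The headline figure can be cross-checked against the before/after numbers with a bit of arithmetic; the two helpers below are only illustrative names.

```rust
/// Relative reduction from `before` to `after`, in percent
/// (a negative result means the value went up).
fn reduction_pct(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

/// Average wall-clock cost per parsed line, in microseconds.
fn micros_per_line(total_secs: f64, lines: u64) -> f64 {
    total_secs * 1e6 / lines as f64
}
```

Applied to the figures above, total time drops by about 16.5%, cycles by about 16.4% and instructions retired by about 12.4%, while maximum RSS grows by about 0.26% and peak footprint by about 0.03%. The per-line cost goes from roughly 10.8µs to 9.0µs, which the benchmark output truncates to 10µs and 9µs.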

@masklinn masklinn enabled auto-merge (rebase) May 11, 2025 09:56
@masklinn masklinn merged commit 18fab27 into ua-parser:main May 11, 2025
16 checks passed
@masklinn masklinn deleted the reduce-atom-length branch May 11, 2025 09:57
masklinn added a commit to masklinn/uap-rust that referenced this pull request May 11, 2025
Trying it out confirms ua-parser#31, and the better introspectability
of FilteredRE2 explains why: it turns out the data set has a fairly
small number of atoms of length 2 with high discriminatory power.

Lowering the length to 2 increases the number of atoms from 1630 to
just 1865 (+235, +14.4%), which explains why memory use is unaffected
or even goes down (some regexes which match none of the samples are
likely not even tried anymore), while performance increases
*dramatically* (48s -> 27s for re2, 38s -> 24s for regex). This makes
sense, as devices are also where ua-parser#31 got extreme bang for its
buck.

It's a bit sad seeing re2 catch up so much despite our hard work, but
it makes sense if we assume `regex` has more optimised regex matching
at the cost of memory: with better discrimination we drastically
decrease the amount of regex matching, which benefits the package with
the slower regex matching.

To be fair, though, the re2 bench could also be slower due to its use
of an `re2::Set` instead of an Aho-Corasick automaton. In fact, that's
pretty likely.
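The difference pointed at here is in how the atoms present in the input get found before any prefilter is evaluated: a single pass of an Aho-Corasick automaton reports every atom occurring in the input. The following is a minimal, unoptimised, std-only sketch of that technique; `AtomMatcher` is a hypothetical name, and this is not how either benchmark is actually implemented (real code would rather use a dedicated crate such as `aho-corasick`).

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal byte-level Aho-Corasick automaton over a set of (non-empty) atoms.
struct AtomMatcher {
    goto: Vec<HashMap<u8, usize>>, // trie transitions, state 0 is the root
    fail: Vec<usize>,              // failure links
    out: Vec<Vec<usize>>,          // indices of the atoms recognised in each state
}

impl AtomMatcher {
    fn new(atoms: &[&str]) -> Self {
        let mut goto = vec![HashMap::new()];
        let mut out: Vec<Vec<usize>> = vec![Vec::new()];
        // Build the trie of all atoms.
        for (i, atom) in atoms.iter().enumerate() {
            let mut s = 0;
            for &b in atom.as_bytes() {
                s = match goto[s].get(&b) {
                    Some(&next) => next,
                    None => {
                        goto.push(HashMap::new());
                        out.push(Vec::new());
                        let next = goto.len() - 1;
                        goto[s].insert(b, next);
                        next
                    }
                };
            }
            out[s].push(i);
        }
        // Compute failure links breadth-first; depth-1 states fail to the root.
        let mut fail = vec![0; goto.len()];
        let mut queue: VecDeque<usize> = goto[0].values().copied().collect();
        while let Some(s) = queue.pop_front() {
            for (&b, &t) in &goto[s] {
                let mut f = fail[s];
                while f != 0 && !goto[f].contains_key(&b) {
                    f = fail[f];
                }
                fail[t] = goto[f].get(&b).copied().unwrap_or(0);
                // A state also recognises every atom its failure state does.
                let inherited = out[fail[t]].clone();
                out[t].extend(inherited);
                queue.push_back(t);
            }
        }
        AtomMatcher { goto, fail, out }
    }

    /// Returns the sorted, deduplicated indices of the atoms found in `haystack`.
    fn matched_atoms(&self, haystack: &str) -> Vec<usize> {
        let mut s = 0;
        let mut found = Vec::new();
        for &b in haystack.as_bytes() {
            while s != 0 && !self.goto[s].contains_key(&b) {
                s = self.fail[s];
            }
            s = self.goto[s].get(&b).copied().unwrap_or(0);
            found.extend_from_slice(&self.out[s]);
        }
        found.sort_unstable();
        found.dedup();
        found
    }
}
```

The resulting list of atom indices is what a prefilter then evaluates against each regex's required atoms; the whole input is scanned once regardless of how many atoms there are.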

However, the effect seems non-existent to slightly negative for UA and OS:

- With a minimum atom length of 3, UAs have 849 atoms for 362 regexes,
  and both re2 and regex run in about 10s (9.70~9.90 real).
  Interestingly, the RSS and memory footprint of regex are a lot lower
  there (25MB to 32~33 footprint).
- With a minimum atom length of 2, UAs have 874 atoms for 362 regexes,
  and both re2 and regex run a bit slower (around 10.50 for re2 and
  10.40 for regex); memory use is the same.
- OS is basically in between: going from length 3 to length 2, the
  number of atoms increases by a hair from 353 to 359 (for 201
  regexes); re2 performance remains stable (8.15~8.40) while regex
  seems to degrade slightly (from 7.10~7.20 to 7.60~7.70).

Note that these figures are all over 100 runs parsing 75158 user
agents. Still, this hints that different configurations for the UA
and device parsers might make sense...
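As an illustration of that last idea only: per-parser settings could be as simple as picking a different minimum atom length for each parser, based on the measurements above. The `ParserKind` and `PrefilterConfig` types, the `prefilter_config` function and the chosen values are hypothetical, not an existing uap-rust API.

```rust
/// Which of the three parsers a configuration applies to (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ParserKind {
    UserAgent,
    Os,
    Device,
}

/// Hypothetical prefilter settings; not an existing uap-rust type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PrefilterConfig {
    /// Atoms shorter than this many bytes are not used for prefiltering.
    min_atom_len: usize,
}

fn prefilter_config(kind: ParserKind) -> PrefilterConfig {
    let min_atom_len = match kind {
        // Going from 3 to 2 was neutral to slightly negative for UA and OS.
        ParserKind::UserAgent | ParserKind::Os => 3,
        // Devices gained a lot from length-2 atoms (48s -> 27s for re2,
        // 38s -> 24s for regex in the runs above).
        ParserKind::Device => 2,
    };
    PrefilterConfig { min_atom_len }
}
```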

Fixes ua-parser#30
masklinn added a commit that referenced this pull request May 11, 2025
masklinn added a commit to masklinn/uap-python that referenced this pull request Jun 9, 2025
Given ua-parser/uap-rust#29 and ua-parser/uap-rust#31, the wording of
the comparison needs to be updated to account for:

- The `regex` memory use being much improved.
- The `regex` runtime on devices being slightly improved, while the
  Python interface to `re2` does not support custom atom lengths.

Closes ua-parser#264
masklinn added a commit to ua-parser/uap-python that referenced this pull request Jun 15, 2025