Skip to content

Conversation

@masklinn
Copy link
Collaborator

Character classes were never unstacked so a bounded repetition following a class would never be rewritten.

This yields no time savings on the bench run, maximum RSS and peak memory do seem to go down by a few MiB (we're talking 129 to 126 or so) but it's not 100% reliable.

Character classes were never unstacked so a bounded repetition
following a class would never be rewritten.

This yields no time savings on the bench run, maximum RSS and peak
memory do seem to go down by a few MiB (we're talking 129 to 126 or
so) but it's not 100% reliable.
@masklinn masklinn enabled auto-merge (rebase) May 11, 2025 08:05
@masklinn masklinn merged commit 2d289cb into ua-parser:main May 11, 2025
16 checks passed
@masklinn masklinn deleted the rewrite-regex-2 branch May 11, 2025 08:09
@masklinn
Copy link
Collaborator Author

masklinn commented May 11, 2025

Update: this actually cuts down memory use by the bench significantly but I forgot to recompile the bench example between branch switches 🤦

Before this PR (on a 16" MBP, M1 Pro):

> /usr/bin/time -l ../target/release/examples/bench -r 10 ~/sources/thirdparty/uap-core/regexes.yaml ~/sources/thirdparty/uap-python/samples/useragents.txt
Lines: 751580
Total time: 8.322629583s
11µs / line
        8.75 real         8.49 user         0.03 sys
           136527872  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                8546  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                  63  involuntary context switches
         77382503510  instructions retired
         27456330696  cycles elapsed
           132777216  peak memory footprint

after:

> /usr/bin/time -l ../target/release/examples/bench -r 10 ~/sources/thirdparty/uap-core/regexes.yaml ~/sources/thirdparty/uap-python/samples/useragents.txt
Lines: 751580
Total time: 8.119800417s
10µs / line
        8.48 real         8.20 user         0.03 sys
            57950208  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                3750  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                  68  involuntary context switches
         74965288660  instructions retired
         26498238780  cycles elapsed
            54674752  peak memory footprint

So yeah, #2 should have cut the memory use by 75% or so, which probably puts us within a stone's throw of re2 really.

masklinn added a commit to masklinn/uap-python that referenced this pull request Jun 9, 2025
Given ua-parser/uap-rust#29 and ua-parser/uap-rust#31, the wording of
the comparison needs to be updated to account for:

- The `regex` memory use being much improved.
- The `regex` runtime on devices being slightly improved, with the
  Python interface to `re2` not supporting custom atom lengths.

Closes ua-parser#264
masklinn added a commit to ua-parser/uap-python that referenced this pull request Jun 15, 2025
Given ua-parser/uap-rust#29 and ua-parser/uap-rust#31, the wording of
the comparison needs to be updated to account for:

- The `regex` memory use being much improved.
- The `regex` runtime on devices being slightly improved, with the
  Python interface to `re2` not supporting custom atom lengths.

Closes #264
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant