Commit b62080b
Change the bench programs to atoms of size 2
Trying it out confirms ua-parser#31, and the better introspectability of
FilteredRE2 explains why: it turns out the data set has a fairly
small number of atoms of length 2 with high discriminatory power.
Lowering the minimum atom length to 2 increases the number of atoms from
1630 to just 1865 (+235, +14.4%), which explains why memory use is
unaffected or even goes down (some regexes which match none of the
samples are likely not even tried anymore), while performance
improves *dramatically* (48s -> 27s for re2, 38s -> 24s for regex).
This makes sense as devices are also where ua-parser#31 got extreme bang for
its buck.
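
To make the mechanism concrete, here is a toy sketch of atom-based prefiltering. It is not FilteredRE2's actual algorithm (which extracts atoms from the regex syntax and combines them with AND/OR logic); the regex names, their required literals, and the helper functions are all hypothetical. It only shows why a lower minimum atom length can add atoms while cutting the number of regexes that still have to be tried.

```python
# Toy model: each regex is described by the literals it requires.
# These entries are made up for illustration, not taken from the data set.
REQUIRED_LITERALS = {
    "lg": ["LG"],
    "hp": ["HP"],
    "nokia": ["Nokia"],
}

def build_filter(required_literals, min_atom_len):
    """Keep only literals long enough to be used as atoms."""
    return {
        rid: [lit for lit in lits if len(lit) >= min_atom_len]
        for rid, lits in required_literals.items()
    }

def atom_count(filt):
    """Number of distinct atoms the prefilter has to look for."""
    return len({atom for atoms in filt.values() for atom in atoms})

def candidates(filt, text):
    """Regexes that must still be run against text.

    A regex left without atoms cannot be filtered out, so it is always a
    candidate (all() over an empty list is True).
    """
    return {rid for rid, atoms in filt.items() if all(a in text for a in atoms)}

UA = "Mozilla/5.0 (Linux; Android 10; Nokia 7.2)"

for min_len in (3, 2):
    filt = build_filter(REQUIRED_LITERALS, min_len)
    print(min_len, atom_count(filt), sorted(candidates(filt, UA)))
```

In this toy, going from length 3 to length 2 raises the atom count from 1 to 3, but the two regexes that previously had to be tried on every input are now skipped.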
It's a bit sad seeing re2 catch up so much with our hard work, but it
makes sense if we assume `regex` has more optimised regex matching
at the cost of memory: with better discrimination we drastically
decrease the amount of regex matching, which benefits the package with
the slower regex matching.
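
The argument can be illustrated with a very simple cost model: total time is a fixed prefilter cost plus a per-regex cost times the number of regexes actually tried. All numbers below are invented for illustration and are not measurements; the function and its parameters are hypothetical.

```python
def total_time(prefilter_cost, per_regex_cost, regexes_tried):
    """Toy cost model: prefilter pass plus one cost unit per regex tried."""
    return prefilter_cost + per_regex_cost * regexes_tried

# Made-up costs: the "slow" engine pays twice as much per regex tried.
FAST, SLOW = 1, 2
PREFILTER = 10

# Better discrimination: 30 regexes tried before, 10 after (made-up counts).
for name, cost in (("fast", FAST), ("slow", SLOW)):
    before = total_time(PREFILTER, cost, 30)
    after = total_time(PREFILTER, cost, 10)
    print(f"{name}: {before} -> {after} (saves {before - after})")
```

Under these assumptions the slower matcher saves twice as much absolute time from the same reduction in candidates, so the gap between the two narrows. A more expensive prefilter would show up as a larger `prefilter_cost` for one engine, independent of the per-regex cost.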
Although to be fair the re2 bench could also be slower due to the use
of an `re2::Set` instead of an aho-corasick automaton. In fact that's
pretty likely.
However, the effect seems non-existent to slightly negative for UA and OS:
- At 3-atoms, UAs have 849 atoms for 362 regexes, and both re2 and regex
run in about 10s (9.70~9.90 real); interestingly, the RSS and memory
footprint of regex are a lot lower there (25MB to 32~33 footprint).
- At 2-atoms, UAs have 874 atoms for 362 regexes, and both re2 and regex
run a bit slower, around 10.50 for re2 and 10.40 for regex; memory
use is the same.
- OS is basically in between: going from 3-atoms to 2-atoms, the number
of atoms increases by a hair from 353 to 359 (for 201 regexes),
re2 performance remains stable (8.15~8.40) while regex seems to
slow down a hair (from 7.10~7.20 to 7.60~7.70).
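
As a quick check on the figures above, the relative atom growth per parser can be computed directly from the reported counts. The "device" label for the 1630 -> 1865 figure is inferred from context rather than stated outright, and the helper name is arbitrary.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100

# Atom counts at minimum length 3 and 2, as reported in this message.
# "device" is an inferred label for the 1630 -> 1865 figure.
ATOMS = {
    "device": (1630, 1865),
    "ua": (849, 874),
    "os": (353, 359),
}

for name, (before, after) in ATOMS.items():
    print(f"{name}: {after - before:+d} atoms ({pct_change(before, after):+.1f}%)")
```

The atom sets for UA and OS barely change (+25 and +6 atoms, under 3% each), which is consistent with length-2 atoms adding little discrimination there while the device set grows by +14.4%.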
Note that this is all over 100 runs parsing 75158 user agents. But
that hints that maybe different configurations for the ua and device
parsers would make sense...
Fixes ua-parser#301

Parent: 5ff0c5e
2 files changed, +4 -2 lines changed