
Commit f57c517

Change the bench programs to atoms of size 2
Trying it out confirms #31, and the better introspectability of FilteredRE2 explains why: it turns out the data set has a fairly small number of length-2 atoms with high discriminatory power. Lowering the atom length to 2 increases the number of atoms from 1630 to just 1865 (+235, +14.4%), which explains why memory use is unaffected or even goes down (some regexes which match none of the samples are likely not even tried anymore), while performance improves *dramatically* (48s -> 27s for re2, 38s -> 24s for regex). This makes sense, as devices are also where #31 got extreme bang for its buck.

It's a bit sad seeing re2 catch up this much given our hard work, but it makes sense if we assume `regex` has more optimised regex matching at the cost of memory: with better discrimination we drastically reduce the amount of regex matching, which benefits the package with the slower regex matching. Although, to be fair, the re2 bench could also be slower due to its use of an `re2::Set` instead of an Aho-Corasick automaton. In fact, that's pretty likely.

However, the effect seems non-existent to slightly negative for UA and OS:

- At 3-atoms, UAs have 849 atoms for 362 regexes, and both re2 and regex run in about 10s (9.70~9.90s real). Interestingly, the RSS and memory footprint of regex are a lot lower there (25MB to 32~33MB footprint).
- At 2-atoms, UAs have 874 atoms for 362 regexes, and both re2 and regex run a bit slower, around 10.50s for re2 and 10.40s for regex; memory use is the same.
- OS is basically in between: going from 3-atoms to 2-atoms, the number of atoms increases by a hair, from 353 to 359 (for 201 regexes). re2 performance remains stable (8.15~8.40s) while regex seems to degrade a hair (from 7.10~7.20s to 7.60~7.70s).

Note that all of this is over 100 runs parsing 75158 user agents. But it hints that different configurations for the UA and device parsers might make sense...

Fixes #30
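To put the reported figures side by side, here is a small arithmetic check in Rust. It only recomputes numbers quoted in the commit message; `pct_change` is a hypothetical helper, not part of either bench.

```rust
/// Relative change from `before` to `after`, in percent.
fn pct_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // Atom counts from the commit message: (parser, atoms at length 3, atoms at length 2).
    let atoms = [("device", 1630.0, 1865.0), ("ua", 849.0, 874.0), ("os", 353.0, 359.0)];
    for (name, len3, len2) in atoms {
        println!("{name:>6}: {:+.0} atoms ({:+.1}%)", len2 - len3, pct_change(len3, len2));
    }

    // Device bench wall-clock times in seconds: (engine, atom length 3, atom length 2).
    let times = [("re2", 48.0, 27.0), ("regex", 38.0, 24.0)];
    for (name, len3, len2) in times {
        println!("{name:>6}: {:+.2}% time, {:.2}x speedup", pct_change(len3, len2), len3 / len2);
    }
}
```

Running it prints +235 atoms (+14.4%) for devices, +25 (+2.9%) for UAs and +6 (+1.7%) for OS, and on the device bench a 43.75% time reduction (1.78x) for re2 and a 36.84% reduction (1.58x) for regex.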
1 parent 34432af commit f57c517


2 files changed (+4, -2 lines)


regex-filtered/examples/bench_regex.rs

Lines changed: 3 additions & 1 deletion
@@ -27,7 +27,9 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
         .lines()
         .collect::<Result<Vec<String>, _>>()?;

-    let f = regex_filtered::Builder::new().push_all(&regexes)?.build()?;
+    let f = regex_filtered::Builder::new_atom_len(2)
+        .push_all(&regexes)?
+        .build()?;
     eprintln!(
         "{} regexes in {}s",
         regexes.len(),
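The `Builder::new_atom_len(2)` call applies a single atom length to whichever regex set the bench loads. Since the commit message ends by suggesting that the UA and device parsers might want different configurations, one way to express that is a per-parser lookup. This is a hypothetical sketch: `ParserKind` and `min_atom_len` are illustrative names, not part of `regex-filtered` or the bench, and the values simply follow the measurements reported in the commit message (2 for devices, 3 for UA and OS).

```rust
/// Hypothetical parser kinds, named after the three regex sets discussed in the commit message.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ParserKind {
    UserAgent,
    Os,
    Device,
}

/// Hypothetical per-parser minimum atom length, following the commit message's numbers:
/// devices got much faster at 2, UAs got slightly slower, and OS stayed flat for re2
/// while regex got slightly slower, so only devices move to 2.
fn min_atom_len(kind: ParserKind) -> usize {
    match kind {
        ParserKind::Device => 2,
        ParserKind::UserAgent | ParserKind::Os => 3,
    }
}

fn main() {
    for kind in [ParserKind::UserAgent, ParserKind::Os, ParserKind::Device] {
        // In the bench, this is the value that would be passed to
        // `regex_filtered::Builder::new_atom_len`.
        println!("{kind:?}: min atom length {}", min_atom_len(kind));
    }
}
```

Keeping the choice in one function means a future measurement only changes one match arm rather than every call site that builds a filter.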

regex-filtered/re2/bench.cpp

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ int main(const int argc, const char* argv[]) {
     std::ifstream regexes_f(argv[1]);

     re2::RE2::Options opt;
-    re2::FilteredRE2 f(3);
+    re2::FilteredRE2 f(2);
     int id;

     std::string line;
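Both benches now set a minimum atom length of 2 (`Builder::new_atom_len(2)` and `re2::FilteredRE2 f(2)`). Below is a minimal model of why that can cut the amount of regex matching, under heavy simplifying assumptions: each regex is reduced to a flat list of literals that any match must contain (the real prefilters build AND/OR trees of atoms), a plain `str::contains` stands in for the Aho-Corasick or `re2::Set` scan, and the names `RegexSummary` and `candidates` as well as the sample regexes are hypothetical.

```rust
/// Hypothetical summary of a regex: the literal substrings any match must contain.
struct RegexSummary {
    name: &'static str,
    required_literals: &'static [&'static str],
}

/// Returns the indices of the regexes that still have to be run on `haystack`.
/// Literals shorter than `min_atom_len` cannot be used as atoms, and a regex left
/// without any atom cannot be filtered out, so it is always returned.
fn candidates(regexes: &[RegexSummary], haystack: &str, min_atom_len: usize) -> Vec<usize> {
    regexes
        .iter()
        .enumerate()
        .filter(|(_, r)| {
            r.required_literals
                .iter()
                .filter(|lit| lit.len() >= min_atom_len)
                // `all` is true for an empty set of atoms: unfiltered regexes always pass.
                .all(|atom| haystack.contains(*atom))
        })
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Made-up device-style regexes, reduced to their required literals.
    let regexes = [
        RegexSummary { name: "htc", required_literals: &["HTC"] },
        RegexSummary { name: "lg", required_literals: &["LG"] },
        RegexSummary { name: "samsung", required_literals: &["SM-"] },
        RegexSummary { name: "xiaomi", required_literals: &["Mi"] },
    ];
    let ua = "Mozilla/5.0 (Linux; Android 10; SM-G973F)";
    for min_atom_len in [3, 2] {
        let tried: Vec<&str> = candidates(&regexes, ua, min_atom_len)
            .into_iter()
            .map(|i| regexes[i].name)
            .collect();
        println!("min_atom_len={min_atom_len}: {tried:?}");
    }
}
```

At length 3 the `lg` and `xiaomi` regexes have no usable atom, so they are run on every input (`["lg", "samsung", "xiaomi"]`); at length 2 they gain atoms and are skipped when those literals are absent (`["samsung"]`). That mirrors the commit message's observation that regexes matching none of the samples are likely no longer tried, at the cost of a few more atoms.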
