automata: make PikeVM cache initialization lazy #1302
Merged
+23
−36
Prior to the advent of regex-automata, the PikeVM would decide how much
space it needed at the beginning of every search. In regex-automata, we
did away with that check at search time and moved it to the time at
which the cache is constructed. (The inputs to the sizing are currently
invariant in regex-automata, as they were in the old regex crate.)
The downside of this is that we create the caches for each regex engine
eagerly. So even if we never call the PikeVM (which is actually quite
common, since the lazy DFA handles almost everything), we end up paying
for the memory of its cache. In many cases, this memory is likely
negligible, but it can be substantial if there are a lot of capture
groups, even if they aren't used, as in #1116.
We fix this by just re-arranging the meta regex engine wrappers to avoid
eagerly creating caches. Instead, they are only initialized when they
are actually needed.
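
As a rough illustration of the idea (not the actual code in this PR), a
lazily initialized cache wrapper might look like the sketch below. The
names `Engine`, `Cache`, and `LazyCache`, and the sizing formula, are
hypothetical stand-ins rather than regex-automata's types; the point is
only that constructing the wrapper allocates nothing, and the scratch
space is built the first time a search needs it.

```rust
/// Hypothetical scratch space for one engine. Its size is fixed by the
/// engine's configuration, so it can be sized once at creation.
struct Cache {
    slots: Vec<Option<usize>>,
}

/// Hypothetical engine whose cache grows with the number of states and
/// the number of capture slots tracked per state.
struct Engine {
    states: usize,
    slots_per_state: usize,
}

impl Engine {
    /// Eagerly allocates the full scratch space for this engine.
    fn create_cache(&self) -> Cache {
        Cache { slots: vec![None; self.states * self.slots_per_state] }
    }
}

/// Wrapper that defers the allocation until the engine is first used.
struct LazyCache(Option<Cache>);

impl LazyCache {
    /// Creates an empty wrapper without allocating any scratch space.
    fn none() -> LazyCache {
        LazyCache(None)
    }

    /// Returns the cache, building it on first use.
    fn get_or_init(&mut self, engine: &Engine) -> &mut Cache {
        self.0.get_or_insert_with(|| engine.create_cache())
    }

    /// Reports whether the scratch space has been allocated yet.
    fn is_initialized(&self) -> bool {
        self.0.is_some()
    }
}

fn main() {
    let engine = Engine { states: 1_000, slots_per_state: 20 };
    let mut cache = LazyCache::none();
    // Nothing is allocated if the engine is never run.
    assert!(!cache.is_initialized());
    // The first search that needs the engine pays for the allocation.
    let scratch = cache.get_or_init(&engine);
    assert_eq!(scratch.slots.len(), 20_000);
    assert!(cache.is_initialized());
}
```

With this shape, a search that is fully handled by another engine never
touches `get_or_init`, so the memory for the unused engine's cache is
never allocated.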
This ends up making memory usage a bit less than regex 1.7.3.

Fixes #1116