automata: fix ID rollover bug in lazy DFA #1300
Merged
+30
−3
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The lazy DFA has a cache of transitions that it may clear from time to
time if it gets too full. One cleared, transitions are re-generated.
There are two ways the cache gets full. First is if it uses too much
memory. Second is if there are so many states that it exceeds
LazyStateID::MAX
. You might expect this to be2^32
, but it's smallerthan that because of some bits reserved for tagging purposes.
When the cache is clearer, we have to be rather careful with our state.
For example, we are careful to "save" the current state so that we know
where to go next after the cache is cleared. And we need to re-map state
identifiers when this happens.
The abstraction for handling cache clearing is basically non-existent.
The current code basically tried to look before it leaps, and if the
cache might be cleared, then it will save the current state. (Saving
the current state is costly, so we don't always want to do it.) But if
the cache gets cleared and we think it definitely won't, then we don't
save the current state and things get FUBAR.
That's what happens in #1083 (I believe) and definitively what happens
in BurntSushi/ripgrep#3135. Specifically, the
"look before we leap" logic wasn't accounting for the number of states
exceeding the maximum. It was only accounting for memory usage.
Ideally we could have a better abstraction that makes this harder to get
wrong via a single point of truth on whether a cache gets cleared or
not, but this is tricky for perf reasons.
Fixes #1083
Fixes BurntSushi/ripgrep#3135