Commit d6b3546

automata: fix ID rollover bug in lazy DFA
The lazy DFA has a cache of transitions that it may clear from time to time if it gets too full. Once cleared, transitions are regenerated. There are two ways the cache gets full. The first is if it uses too much memory. The second is if there are so many states that it exceeds `LazyStateID::MAX`. You might expect this to be `2^32`, but it's smaller than that because some bits are reserved for tagging purposes.

When the cache is cleared, we have to be rather careful with our state. For example, we are careful to "save" the current state so that we know where to go next after the cache is cleared. And we need to re-map state identifiers when this happens.

The abstraction for handling cache clearing is basically non-existent. The current code tries to look before it leaps: if the cache *might* be cleared, then it saves the current state. (Saving the current state is costly, so we don't always want to do it.) But if the cache gets cleared when we thought it definitely wouldn't be, then we don't save the current state and things get FUBAR. That's what happens in #1083 (I believe) and definitively what happens in BurntSushi/ripgrep#3135.

Specifically, the "look before we leap" logic wasn't accounting for the number of states exceeding the maximum. It was only accounting for memory usage.

Ideally we could have a better abstraction that makes this harder to get wrong via a single point of truth on whether a cache gets cleared or not, but this is tricky for perf reasons.

Fixes #1083
Fixes BurntSushi/ripgrep#3135
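
The save-and-remap dance described above can be illustrated with a deliberately tiny model. This is a minimal sketch, not the `regex-automata` cache: `ToyCache` and its methods are hypothetical names chosen for illustration, states are plain strings, and a state's ID is simply its index in a `Vec`. Clearing the cache reassigns every ID, so an ID held across a clear is only trustworthy if its state was saved first and its new ID looked up afterwards.

```rust
use std::collections::HashMap;

/// Hypothetical toy model of a lazy DFA cache: a state's ID is its index
/// in `states`, and clearing the cache reassigns every ID.
struct ToyCache {
    states: Vec<String>,
    ids: HashMap<String, usize>,
    max_states: usize,
    /// Contents of a state saved before an operation that might clear.
    saved: Option<String>,
}

impl ToyCache {
    fn new(max_states: usize) -> ToyCache {
        ToyCache { states: Vec::new(), ids: HashMap::new(), max_states, saved: None }
    }

    fn push(&mut self, key: &str) -> usize {
        let id = self.states.len();
        self.states.push(key.to_string());
        self.ids.insert(key.to_string(), id);
        id
    }

    /// Returns the ID for `key`, adding a new state if needed. Reusing an
    /// existing state never clears; adding one to a full cache does.
    fn add(&mut self, key: &str) -> usize {
        if let Some(&id) = self.ids.get(key) {
            return id;
        }
        if self.states.len() >= self.max_states {
            self.clear();
        }
        self.push(key)
    }

    /// Drops every state, then re-adds the saved state (if any) so that
    /// its new ID can be recovered afterwards.
    fn clear(&mut self) {
        self.states.clear();
        self.ids.clear();
        if let Some(key) = self.saved.clone() {
            self.push(&key);
        }
    }

    fn save_state(&mut self, id: usize) {
        self.saved = Some(self.states[id].clone());
    }

    /// Returns the (possibly re-mapped) ID of the saved state.
    fn saved_state_id(&mut self) -> usize {
        let key = self.saved.take().expect("no state was saved");
        self.ids[&key]
    }
}

fn main() {
    // Correct: `a` is saved before an `add` that might clear the cache.
    let mut cache = ToyCache::new(2);
    let a = cache.add("a");
    cache.add("b"); // the cache is now full
    cache.save_state(a);
    cache.add("c"); // clears, re-adds "a", then adds "c"
    let a = cache.saved_state_id();
    assert_eq!(cache.states[a], "a");

    // Buggy: nothing is saved, so the old IDs go stale after the clear.
    let mut cache = ToyCache::new(2);
    let a = cache.add("a");
    let b = cache.add("b");
    cache.add("c");
    assert_eq!(cache.states[a], "c"); // ID 0 now names a different state
    assert!(cache.states.get(b).is_none()); // ID 1 no longer exists
    println!("ok");
}
```

The second half of `main` is the failure mode in miniature: the caller still holds IDs from before the clear, and they silently point at the wrong state or at nothing.
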
1 parent ef1c2c3 commit d6b3546

3 files changed, 30 additions, 3 deletions

CHANGELOG.md

Lines changed: 10 additions & 0 deletions
@@ -1,3 +1,13 @@
+1.11.4 (TBD)
+============
+TODO
+
+Bug fixes:
+
+* [BUG #1165](https://github.com/rust-lang/regex/issues/1083):
+Fixes a panic in the lazy DFA (can only occur for especially large regexes).
+
+
 1.11.3 (2025-09-25)
 ===================
 This is a small patch release with an improvement in memory usage in some

regex-automata/src/hybrid/dfa.rs

Lines changed: 19 additions & 2 deletions
@@ -2132,7 +2132,19 @@ impl<'i, 'c> Lazy<'i, 'c> {
 unit,
 empty_builder,
 );
-let save_state = !self.as_ref().state_builder_fits_in_cache(&builder);
+// This is subtle, but if we *might* clear the cache, then we should
+// try to save the current state so that we can re-map its ID after
+// cache clearing. We *might* clear the cache when either the new
+// state can't fit in the cache or when the number of transitions has
+// reached the maximum. Even if either of these conditions is true,
+// the cache might not be cleared if we can reuse an existing state.
+// But we don't know that at this point. Moreover, we don't save the
+// current state every time because it is costly.
+//
+// TODO: We should try to find a way to make this less subtle and error
+// prone. ---AG
+let save_state = !self.as_ref().state_builder_fits_in_cache(&builder)
+|| self.cache.trans.len() >= LazyStateID::MAX;
 if save_state {
 self.save_state(current);
 }
@@ -2761,7 +2773,7 @@ impl<'i, 'c> LazyRef<'i, 'c> {
 let needed = self.cache.memory_usage()
 + self.memory_usage_for_one_more_state(state.memory_usage());
 trace!(
-"lazy DFA cache capacity check: {:?} ?<=? {:?}",
+"lazy DFA cache capacity state check: {:?} ?<=? {:?}",
 needed,
 self.dfa.cache_capacity
 );
@@ -2773,6 +2785,11 @@ impl<'i, 'c> LazyRef<'i, 'c> {
 fn state_builder_fits_in_cache(&self, state: &StateBuilderNFA) -> bool {
 let needed = self.cache.memory_usage()
 + self.memory_usage_for_one_more_state(state.as_bytes().len());
+trace!(
+"lazy DFA cache capacity state builder check: {:?} ?<=? {:?}",
+needed,
+self.dfa.cache_capacity
+);
 needed <= self.dfa.cache_capacity
 }

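The effect of the one-line change to `save_state` can be seen by pulling the check out into standalone functions. This is a hedged sketch, not the crate's code: `CacheSnapshot`, its fields, the `should_save_*` functions and `MAX_LAZY_ID` are hypothetical stand-ins. The memory estimate ignores whatever per-state overhead `memory_usage_for_one_more_state` adds, and `MAX_LAZY_ID` assumes the value `LazyStateID::MAX` takes when `MAX_BIT` is 31. The snapshot in `main` has memory to spare but a transition table at the ID limit, which is the situation the old check missed.

```rust
/// Assumed value of `LazyStateID::MAX` when `MAX_BIT` is 31 (2^27 - 1).
const MAX_LAZY_ID: usize = (1 << 27) - 1;

/// Hypothetical snapshot of the quantities the check looks at.
struct CacheSnapshot {
    memory_usage: usize,
    new_state_bytes: usize,
    cache_capacity: usize,
    trans_len: usize,
}

/// Simplified memory check; the real one also accounts for per-state overhead.
fn state_fits_in_cache(s: &CacheSnapshot) -> bool {
    s.memory_usage + s.new_state_bytes <= s.cache_capacity
}

/// Before the fix: only memory pressure triggers saving the current state.
fn should_save_before_fix(s: &CacheSnapshot) -> bool {
    !state_fits_in_cache(s)
}

/// After the fix: reaching the ID limit also means the cache might clear.
fn should_save_after_fix(s: &CacheSnapshot) -> bool {
    !state_fits_in_cache(s) || s.trans_len >= MAX_LAZY_ID
}

fn main() {
    // Memory to spare, but the transition table is at the ID limit.
    let s = CacheSnapshot {
        memory_usage: 1 << 29,
        new_state_bytes: 64,
        cache_capacity: 1 << 31,
        trans_len: MAX_LAZY_ID,
    };
    // Old logic: "won't clear", so the current state is not saved.
    assert!(!should_save_before_fix(&s));
    // New logic: "might clear", so the current state is saved.
    assert!(should_save_after_fix(&s));
    println!("before: {}, after: {}", should_save_before_fix(&s), should_save_after_fix(&s));
}
```

With the old predicate, this snapshot leaves the current state unsaved even though adding one more state can exhaust the ID space and clear the cache, which is exactly the gap the commit message describes.
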
regex-automata/src/hybrid/id.rs

Lines changed: 1 addition & 1 deletion
@@ -180,7 +180,7 @@ impl LazyStateID {
 const MASK_QUIT: usize = 1 << (LazyStateID::MAX_BIT - 2);
 const MASK_START: usize = 1 << (LazyStateID::MAX_BIT - 3);
 const MASK_MATCH: usize = 1 << (LazyStateID::MAX_BIT - 4);
-const MAX: usize = LazyStateID::MASK_MATCH - 1;
+pub(crate) const MAX: usize = LazyStateID::MASK_MATCH - 1;
 
 /// Create a new lazy state ID.
 ///

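The commit message notes that `LazyStateID::MAX` is well below `2^32` because high bits are reserved for tags. The arithmetic can be checked against the constants visible in the id.rs hunk. This sketch assumes `MAX_BIT` is 31 and that the two tag bits above `MASK_QUIT` (at `MAX_BIT` and `MAX_BIT - 1`) belong to the unknown and dead tags; neither detail appears in this diff. Under those assumptions the largest untagged ID is `2^27 - 1`, and the very next value coincides with the match tag.

```rust
// Assumption: MAX_BIT is 31, as on typical 32- and 64-bit targets.
const MAX_BIT: usize = 31;
// Assumption: the two highest bits are the "unknown" and "dead" tags.
const MASK_UNKNOWN: usize = 1 << MAX_BIT;
const MASK_DEAD: usize = 1 << (MAX_BIT - 1);
// These mirror the constants shown in the id.rs hunk.
const MASK_QUIT: usize = 1 << (MAX_BIT - 2);
const MASK_START: usize = 1 << (MAX_BIT - 3);
const MASK_MATCH: usize = 1 << (MAX_BIT - 4);
const MAX: usize = MASK_MATCH - 1;

fn main() {
    println!("MAX = {} ({:#x})", MAX, MAX); // MAX = 134217727 (0x7ffffff)
    // No tag bit overlaps the range of valid untagged IDs.
    for mask in [MASK_UNKNOWN, MASK_DEAD, MASK_QUIT, MASK_START, MASK_MATCH] {
        assert_eq!(mask & MAX, 0);
    }
    // In this model, one past MAX is exactly the match tag, so an ID that
    // large would be indistinguishable from a tagged one.
    assert_eq!(MAX + 1, MASK_MATCH);
}
```
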