BurntSushi

Specifically, when used with `meta::Regex`. Before this PR, if callers
built a `meta::Regex` with `WhichCaptures::None`, then it was possible
for `find` to _sometimes_ return `Some` and _sometimes_ return `None`,
just based on the sequence of previous search calls.

In particular, when `WhichCaptures::None` is used, some regex engines
(like the `PikeVM`) cannot report match offsets while others (like the
lazy DFA) can. This meant that if the meta regex engine _happened_ to
select the lazy DFA for a `Regex::find` call, then it would return
`Some`. But if it _happened_ to select the `PikeVM`, then it would return
`None`. Since engine selection can be influenced by the haystack itself,
this leads to the behavior of `find` being tied to the contents of the
haystack.

Instead, what we should do is make it so anything that returns match
offsets on a `meta::Regex` will always return `None` when
`WhichCaptures::None` is used, _even_ if `Regex::is_match` returns
`true`.
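To make the before/after behavior concrete, here is a minimal toy model in plain Rust. It is a sketch only and does not use or reproduce `regex-automata` internals: `ToyRegex`, `Engine`, `select_engine`, `find_old`, `find_new`, and the length-based selection heuristic are all hypothetical names and assumptions invented for illustration, and a literal substring search stands in for a real regex search.

```rust
// Toy model only: every name below is hypothetical and none of this is
// regex-automata's real implementation.

/// The engine the toy meta engine picks for a given search.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Engine {
    /// Stands in for the lazy DFA: can report offsets without capture slots.
    LazyDfa,
    /// Stands in for the PikeVM: needs capture slots to report offsets.
    PikeVm,
}

/// Assumed heuristic: the haystack influences which engine is chosen.
fn select_engine(haystack: &str) -> Engine {
    if haystack.len() <= 8 {
        Engine::LazyDfa
    } else {
        Engine::PikeVm
    }
}

/// A literal substring search standing in for a real regex search.
fn raw_find(needle: &str, haystack: &str) -> Option<(usize, usize)> {
    haystack.find(needle).map(|start| (start, start + needle.len()))
}

struct ToyRegex {
    needle: &'static str,
    /// `false` models building with `WhichCaptures::None`.
    captures_enabled: bool,
}

impl ToyRegex {
    /// Whether a match exists never depends on capture slots.
    fn is_match(&self, haystack: &str) -> bool {
        raw_find(self.needle, haystack).is_some()
    }

    /// Old behavior: the result depends on which engine happened to run.
    fn find_old(&self, haystack: &str) -> Option<(usize, usize)> {
        match select_engine(haystack) {
            Engine::LazyDfa => raw_find(self.needle, haystack),
            Engine::PikeVm if self.captures_enabled => raw_find(self.needle, haystack),
            Engine::PikeVm => None,
        }
    }

    /// New behavior: decide up front, before any engine is selected.
    fn find_new(&self, haystack: &str) -> Option<(usize, usize)> {
        if !self.captures_enabled {
            return None;
        }
        raw_find(self.needle, haystack)
    }
}

fn main() {
    let re = ToyRegex { needle: "abc", captures_enabled: false };
    for haystack in ["xabc", "xxxxxxxxxxabc"] {
        println!(
            "{:?}: is_match={} old={:?} new={:?}",
            haystack,
            re.is_match(haystack),
            re.find_old(haystack),
            re.find_new(haystack),
        );
    }
}
```

In this model, `find_old` reports offsets for the short haystack but not for the long one, even though `is_match` is `true` for both, while `find_new` returns `None` for both. The point of the design is that the answer is fixed by the configuration rather than by whichever engine the haystack happens to trigger.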

(Yes, this is a weird option and it's crazy that `Regex::is_match` can
return `true` while `Regex::find` can return `None`. This was already
true before this PR and is a result of a very low level option that
optimizes for memory usage in specific circumstances. This sort of
wacky behavior can't be observed in the `regex` crate API, only in
`regex-automata`.)

… used

@BurntSushi BurntSushi merged commit 8f5d947 into master Oct 9, 2025
18 checks passed
@BurntSushi BurntSushi deleted the ag/consistent-none-capture-behavior branch October 9, 2025 01:11