@zazedd zazedd commented Aug 1, 2025

Related to #12

This PR adds a new extension, %pcre_i, for case-insensitive match.

Note

This updates the opam package to use ppxlib <= "0.35.0" only, as ppxlib.0.36.0 has breaking changes in the Parsetree AST.

@zazedd zazedd marked this pull request as ready for review August 1, 2025 17:11
@Drup
Collaborator

Drup commented Aug 5, 2025

Is there an internal syntax in pcre that enables case-less matching (i.e., something that maps to Re.no_case)? If so, it would be nice to support it as well (and it would be easy to make it tyre-compatible).

@zazedd
Author

zazedd commented Aug 7, 2025

I believe pcre does have a syntax for case-less matching: (?i:regex)
This PR just sets the global flag for case-insensitive matching; doing it with Re.no_case would be interesting.
My idea would be to split the RE into groups by case sensitivity, add Re.no_case to the ones that need it, and then concatenate them back together. I don't know if this is the way tyre does it?

@Drup
Collaborator

Drup commented Aug 7, 2025

Yes, I agree the option for setting global flags is useful. I would prefer something that can accumulate more flags (and is standard). Maybe the (standard, iirc) /i syntax?

For the local flag, yes, just nesting Re.no_case would be fine. For Tyre, I would have to look again. I might need to add it, but there is no problem in principle.
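
To make the local-versus-global distinction concrete, here is a minimal sketch using the re library directly. It is not the code generated by this PR; the pattern "id-abc" and the names local and global are made up for illustration, and it assumes the re opam package is available.

```ocaml
(* Sketch only; assumes the `re` opam library is linked. *)

(* Local flag: only the "abc" part ignores case, roughly what
   id-(?i:abc) would mean. The "id-" prefix stays case-sensitive. *)
let local =
  Re.compile
    (Re.whole_string (Re.seq [ Re.str "id-"; Re.no_case (Re.str "abc") ]))

(* Global flag: the whole pattern ignores case. *)
let global =
  Re.compile (Re.whole_string (Re.Perl.re ~opts:[ `Caseless ] "id-abc"))
```

With these definitions, Re.execp local "id-ABC" should hold while Re.execp local "ID-abc" should not, and Re.execp global "ID-Abc" should hold.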

@paurkedal
Owner

I also think it would be good to have a syntax which allows accumulating more flags, like anchored and multiline, but where could /i be placed syntactically?

@zazedd
Author

zazedd commented Aug 13, 2025

where could /i be placed syntactically

The flags are generally placed at the end of the regex (think Perl regexes), but Perl and others also require a delimiter at the beginning. PCRE (the C lib) doesn't use this syntax, but PCRE wrappers like PHP's preg_match() do.
If we find that the RE starts with a delimiter /, then it should end with one as well, optionally followed by i or, in the future, other flags for anchoring and multi-line matching.
We can also allow another character as the delimiter, like #pattern#i, to avoid having to escape / in URL REs, for example, which would be annoying.
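
A rough sketch of how such a delimited pattern could be split, using only the OCaml standard library. The function name split_delimited and the choice to accept only / and # as delimiters are assumptions for illustration; this is not the parser from this PR or from the ahrefs fork.

```ocaml
(* Hypothetical helper, not part of ppx_regexp: split "/pattern/flags" or
   "#pattern#flags" into the bare pattern and its trailing flag characters.
   Returns None when the string does not use the delimited form. *)
let split_delimited (s : string) : (string * char list) option =
  let n = String.length s in
  if n < 2 then None
  else
    let d = s.[0] in
    if d <> '/' && d <> '#' then None
    else
      (* The closing delimiter is the last occurrence of the opening one. *)
      match String.rindex_opt s d with
      | Some j when j > 0 ->
        let pattern = String.sub s 1 (j - 1) in
        let flags = String.sub s (j + 1) (n - j - 1) in
        Some (pattern, List.of_seq (String.to_seq flags))
      | _ -> None
```

For example, split_delimited "#https?://x#i" yields the pattern https?://x with the single flag i, without any escaping of the slashes. Validating the flag characters is left out of this sketch.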

@paurkedal
Owner

paurkedal commented Sep 1, 2025

Ok, I was thinking of a global option, contemplating match%pcre.i ... with ... lacking a better idea; putting it inside the pattern avoids fitting it into the OCaml syntax.

Maybe we can use a syntax which does not clash with any valid pattern. According to the pcrepattern man page, it looks like the syntaxes (*<option>) and (*<option>=<value>) are used at the start of patterns to tweak the interpretation of REs, so how about something like (*CASELESS) or (*NOCASE)? The re library could also support caseless substrings if we can find a reasonable syntax for it.

@paurkedal
Owner

I found precedent for a caseless in-expression syntax in the Mozilla regular expression modifier documentation:

(?flags1:pattern)
(?flags1-flags2:pattern)

where flags1 and flags2 are include and exclude lists of i, m, and s, respectively. So, how about sending (?i:pattern) to the nocase constructor?
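
As an illustration of the shape of that syntax, here is a small standard-library-only sketch that recognises a string consisting of exactly one such group. The record type flag_group and the function parse_flag_group are hypothetical names, and the sketch deliberately ignores nesting and escaping, so it is not a substitute for a real pattern parser.

```ocaml
(* Hypothetical types and helper for illustration only. *)
type flag_group = { enable : char list; disable : char list; body : string }

(* Recognise a string that is entirely one group of the form
   (?flags1:pattern) or (?flags1-flags2:pattern), with flags from i, m, s.
   Limitation: no check that the parentheses inside the body are balanced. *)
let parse_flag_group (s : string) : flag_group option =
  let n = String.length s in
  let is_flag c = c = 'i' || c = 'm' || c = 's' in
  let chars str = List.of_seq (String.to_seq str) in
  if n < 4 || s.[0] <> '(' || s.[1] <> '?' || s.[n - 1] <> ')' then None
  else
    match String.index_from_opt s 2 ':' with
    | None -> None
    | Some colon ->
      let spec = String.sub s 2 (colon - 2) in
      let body = String.sub s (colon + 1) (n - colon - 2) in
      let enable, disable =
        match String.index_opt spec '-' with
        | None -> (chars spec, [])
        | Some dash ->
          ( chars (String.sub spec 0 dash),
            chars (String.sub spec (dash + 1) (String.length spec - dash - 1)) )
      in
      (* An empty flag list would be the plain non-capturing group (?:...). *)
      if enable = [] && disable = [] then None
      else if List.for_all is_flag enable && List.for_all is_flag disable then
        Some { enable; disable; body }
      else None
```

A result whose enable list contains i is the case where the body would be wrapped in the nocase constructor.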

@zazedd
Author

zazedd commented Sep 9, 2025

Sorry for the late response.
That is actually the syntax I chose too (see the caseless parsing in the ahrefs fork), but we have no support for more flags after i.
In our fork we also have the option to use global flags via the /.../flags syntax, but I'll keep this out of this PR for now.
I will update this branch this week.

@paurkedal
Owner

Nice that you already used this syntax. I think we can manage to parse multiple options if we need to.

I was contemplating whether we could use ?i: at the start of a pattern as a full-pattern flag, but that would be novel syntax as far as I know. This would avoid introducing another special character, since ? at the start of a pattern would otherwise be invalid.
