Skip to content

Conversation

@demerphq
Copy link
Contributor

This patch, which works on 1.18.x but does not work on 1.19 is an attempt to implement String.Chars protocol and also a Regex.to_string() and Regex.modifiers() and Regex.to_string!() and Regex.modifiers!() functions.

The idea is to make it possible to safely embed precompiled regexes into other regexes in a similar way as that supported by perl. The general idea is that ~r/foo/x turns into "(?x-ims:foo\n)", and etc. Thus it should match the same as it would have in its original form when it is embedded into a pattern which has a different set of modifiers.

For review by Jose.

@josevalim
Copy link
Member

Thank you for the PR. Given Regex.to_string cannot fully convert all regular expressions created by ~r/.../, only a subset of modifiers are supported, my suggestion is to simplify this considerably:

  1. There is no need to introduce Regex.modifiers, given the implementation in Inspect and String.Chars will be different anyway

  2. Generally speaking, our conversion functions (to_*) do not return {:ok, _} tuples, they simply return the result

  3. Do not implement String.Chars, if it can only convert a subset of the created regexes

So my suggestion is to add a single function, called Regex.to_embed/1, that receives a regex and either returns a string or raises an ArgumentError, and then test this function directly. This function will have its own implementation of Inspect.Range.translate_options tailored to its needs. Then in a separate pull request, I can make it so ~r/#{...}/ works by doing AST traversal in the sigil macro.

@demerphq demerphq force-pushed the yves/embeddable_regex branch from 7ab69cd to 0bef2c1 Compare April 2, 2025 21:55
@demerphq demerphq changed the title Initial attempt at embeddable regex objects Add support for Regex.to_embed!() and Regex.to_embed() Apr 2, 2025
@demerphq
Copy link
Contributor Author

demerphq commented Apr 2, 2025

Hi, I have updated the change as you requested. I made one tweak, adding a "strict" option for to_embed!(regex,strict) and make to_embed(regex) be the same as to_embed!(regex,false). When the pattern has been compiled with an option that cant be represented in an embeddable form an ArgumentError will be raised if strict is true, if strict is false such options will be ignored. The non strict mode is particularly helpful if the pattern was compiled with /u and is to be embedded in a pattern that is also compiled with /u.

FWIW, More modern versions of PCRE may one day support more options that just 'imsx'. Perl itself supports 'u' in embeddable form (although it has slightly different meaning, in Elixir/Erlang/PCRE the /u flag means "string encoded as unicode" and also "use unicode semantics". In the Perl /u means "use unciode semantics regardless of the encoding". This is why the exceptions mentions the current version of PCRE.

Hope this is what you had in mind with your feedback!

@demerphq demerphq force-pushed the yves/embeddable_regex branch from 0bef2c1 to 092407e Compare April 3, 2025 14:31
to_embed(regex,strict) returns an embeddable representation of regex.
For instance ~r/foo/i can be represented as ~r/(?i-msx:foo)/.

If the option :strict is true (the default) then it will throw an
ArgumentError if the regex was compiled with an option/modifier which
cannot be represented as an embeddable pattern.  If :strict is false
then any unembeddable options will be silently ignored.  This may be
perfectly reasonable, for intance the wrapped pattern may be compiled
with the same modifiers as the pattern, or reusing the pattern without
the unembeddable modifiers may not change its semantics.
@demerphq demerphq force-pushed the yves/embeddable_regex branch from 092407e to 9bce632 Compare April 4, 2025 20:27
@josevalim
Copy link
Member

Looks great, I have dropped only some minor suggestions now and we can ship it!

@josevalim josevalim changed the title Add support for Regex.to_embed!() and Regex.to_embed() Add Regex.to_embed() Apr 5, 2025
demerphq and others added 3 commits April 5, 2025 11:09
Minor fixups and simplifications.

Co-authored-by: José Valim <[email protected]>
* Sentences should not start with 'And'.
* Rework sentence about unlisted regex compile options.
* Consistent formatting for the the 'strict' option.
also add comment about why we sort the modifiers
@demerphq
Copy link
Contributor Author

demerphq commented Apr 5, 2025

Suggestions turned to commits, with one caveat about a compromise wording as noted, and I followed up on your point about to_string(). I didnt squash so its easier for you to review, you said previously you didnt mind doing that yourself.

@josevalim josevalim changed the title Add Regex.to_embed() Add Regex.to_embed/2 Apr 5, 2025
@josevalim josevalim merged commit ff68407 into elixir-lang:main Apr 5, 2025
10 checks passed
@josevalim
Copy link
Member

💚 💙 💜 💛 ❤️

@demerphq demerphq deleted the yves/embeddable_regex branch June 6, 2025 10:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants