Commit 092407e

Yves Orton committed
regex.ex - add support for to_embed!() and to_embed()
to_embed!(regex, strict) returns an embeddable representation of regex. For instance, ~r/foo/i can be represented as ~r/(?i-msx:foo)/. If strict is true (the default), it raises an ArgumentError when the regex was compiled with an option/modifier which cannot be represented as an embeddable pattern. If strict is false, it ignores any unembeddable options. This can be helpful if the pattern was compiled with /u and will be embedded in a pattern also compiled with /u. to_embed(regex) is the same as to_embed!(regex, false).
1 parent 2e3b812 commit 092407e
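
As a rough illustration of why an embeddable form is useful, the sketch below splices the string quoted in the commit message, (?i-msx:foo), into a larger case-sensitive pattern. It uses only Regex.compile!/1 and Regex.match?/2 from stock Elixir. The outer pattern and the sample inputs are made up for this example, and to_embed!/2 itself is not called because it exists only in this commit.

```elixir
# The embeddable form of ~r/foo/i quoted in the commit message.
inner = "(?i-msx:foo)"

# Hypothetical outer pattern, compiled without any modifiers.
outer = Regex.compile!("^bar" <> inner <> "$")

# Only the embedded part is matched case-insensitively.
foo_any_case = Regex.match?(outer, "barFOO")
bar_any_case = Regex.match?(outer, "BARfoo")

IO.inspect({foo_any_case, bar_any_case})
```

The scoped group carries its own modifiers, so the inner pattern behaves the same regardless of how the outer pattern was compiled.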

1 file changed, 108 additions and 0 deletions

lib/elixir/lib/regex.ex

Lines changed: 108 additions & 0 deletions
@@ -412,6 +412,91 @@ defmodule Regex do
     opts
   end
 
+  @doc """
+  Returns the pattern as an embeddable string.
+
+  If the pattern was compiled with an option which cannot be represented
+  as an embeddable modifier in the current version of PCRE and strict is true
+  (the default) then an ArgumentError exception will be raised.
+
+  When strict is false the pattern will be returned as though any offending
+  options had not been used and the function will not raise any exceptions.
+
+  Embeddable modifiers/options are currently:
+    * 'i' - :caseless
+    * 'm' - :multiline
+    * 's' - :dotall, {:newline, :anycrlf}
+    * 'x' - :extended
+
+  And unembeddable modifiers are:
+    * 'f' - :firstline
+    * 'U' - :ungreedy
+    * 'u' - :unicode, :ucp
+
+  Any other regex compilation option not listed here is considered unembeddable.
+
+  ## Examples
+
+      iex> Regex.to_embed!(~r/foo/m)
+      "(?m-isx:foo)"
+
+      iex> Regex.to_embed!(~r/foo # comment/ix)
+      "(?ix-ms:foo # comment\\n)"
+
+      iex> Regex.to_embed!(~r/foo/iu)
+      ** (ArgumentError) regex compiled with options [:ucp, :unicode] which cannot be represented as an embedded pattern in this version of PCRE
+
+      iex> Regex.to_embed!(~r/foo/imsxu, false)
+      "(?imsx:foo\\n)"
+
+  """
+  @spec to_embed!(t, boolean()) :: String.t()
+  def to_embed!(%Regex{source: source, opts: opts}, strict \\ true) do
+    modifiers =
+      case embeddable_modifiers(opts) do
+        {:ok, modifiers} ->
+          modifiers
+
+        {:error, modifiers, untranslatable} ->
+          if strict do
+            raise ArgumentError,
+                  "regex compiled with options #{inspect(untranslatable)} which cannot be " <>
+                    "represented as an embedded pattern in this version of PCRE"
+          else
+            modifiers
+          end
+      end
+
+    disabled =
+      Enum.reject([?i, ?m, ?s, ?x], &(&1 in modifiers))
+      |> List.to_string()
+
+    disabled = if disabled != "", do: "-#{disabled}", else: ""
+
+    modifiers =
+      Enum.sort(modifiers)
+      |> List.to_string()
+
+    nl = if Enum.member?(opts, :extended), do: "\n", else: ""
+
+    "(?#{modifiers}#{disabled}:#{source}#{nl})"
+  end
+
+  @doc """
+  Returns the pattern as an embeddable string. Ignores any options which cannot
+  be represented as an embeddable pattern in the current version of PCRE. Same
+  as calling `to_embed!()` with strict set to false.
+
+  ## Examples
+
+      iex> Regex.to_embed(~r/foo/iu)
+      "(?i-msx:foo)"
+  """
+  @spec to_embed(t) :: String.t()
+  def to_embed(%Regex{} = regex) do
+    to_embed!(regex, false)
+  end
+
   @doc """
   Returns a list of names in the regex.
 
@@ -845,6 +930,29 @@ defmodule Regex do
 
   # Helpers
 
+  # translate options to modifiers as required for embedding
+  defp embeddable_modifiers(list), do: embeddable_modifiers(list, [], [])
+
+  defp embeddable_modifiers([:dotall, {:newline, :anycrlf} | t], acc, err),
+    do: embeddable_modifiers(t, [?s | acc], err)
+
+  defp embeddable_modifiers([:caseless | t], acc, err),
+    do: embeddable_modifiers(t, [?i | acc], err)
+
+  defp embeddable_modifiers([:extended | t], acc, err),
+    do: embeddable_modifiers(t, [?x | acc], err)
+
+  defp embeddable_modifiers([:multiline | t], acc, err),
+    do: embeddable_modifiers(t, [?m | acc], err)
+
+  defp embeddable_modifiers([option | t], acc, err),
+    do: embeddable_modifiers(t, acc, [option | err])
+
+  defp embeddable_modifiers([], acc, []), do: {:ok, acc}
+  defp embeddable_modifiers([], acc, err), do: {:error, acc, err}
+
+  # translate modifiers to options
+
   defp translate_options(<<?s, t::binary>>, acc),
     do: translate_options(t, [:dotall, {:newline, :anycrlf} | acc])
