Commit 9bce632

Author: Yves Orton
regex.ex - add support for to_embed()
to_embed(regex, opts) returns an embeddable representation of regex. For instance, ~r/foo/i can be represented as ~r/(?i-msx:foo)/. If the :strict option is true (the default), it raises an ArgumentError when the regex was compiled with an option/modifier that cannot be represented in an embeddable pattern. If :strict is false, any unembeddable options are silently ignored. Ignoring them can be perfectly reasonable: for instance, the wrapping pattern may be compiled with the same modifiers as the embedded pattern, or reusing the pattern without the unembeddable modifiers may not change its semantics.
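To make the idea of an embeddable pattern concrete, here is a minimal sketch that writes the embedded form out by hand and splices it into a larger pattern. It uses only Regex.compile!/1, Regex.compile!/2 and Regex.match?/2; it does not call to_embed itself, since that function is what this commit adds and may not be available in your Elixir version. The variable names and test strings are illustrative assumptions.

```elixir
# The embedded form of ~r/foo/i from the commit message, written by hand.
embedded_ci = "(?i-msx:foo)"

# Splice it into a larger pattern compiled without any modifiers.
outer = Regex.compile!(embedded_ci <> "bar")

# Only the embedded "foo" part is case-insensitive.
IO.inspect(Regex.match?(outer, "FOObar"))
IO.inspect(Regex.match?(outer, "fooBAR"))

# The reverse case: ~r/foo/ embeds as "(?-imsx:foo)", which switches the
# caseless option off locally even when the outer pattern is compiled with "i".
embedded_cs = "(?-imsx:foo)"
outer_ci = Regex.compile!(embedded_cs <> "bar", "i")

IO.inspect(Regex.match?(outer_ci, "fooBAR"))
IO.inspect(Regex.match?(outer_ci, "FOObar"))
```

The explicit "-msx" (or "-imsx") part is what lets the embedded pattern keep its own semantics regardless of the modifiers of the pattern around it.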
1 parent 2e3b812 commit 9bce632

1 file changed, 97 additions, 0 deletions

lib/elixir/lib/regex.ex

@@ -412,6 +412,80 @@ defmodule Regex do
     opts
   end
 
+  @doc """
+  Returns the pattern as an embeddable string.
+
+  If the pattern was compiled with an option which cannot be represented
+  as an embeddable modifier in the current version of PCRE and strict is true
+  (the default) then an ArgumentError exception will be raised.
+
+  When strict is false the pattern will be returned as though any offending
+  options had not been used and the function will not raise any exceptions.
+
+  Embeddable modifiers/options are currently:
+    * 'i' - :caseless
+    * 'm' - :multiline
+    * 's' - :dotall, {:newline, :anycrlf}
+    * 'x' - :extended
+
+  And unembeddable modifiers are:
+    * 'f' - :firstline
+    * 'U' - :ungreedy
+    * 'u' - :unicode, :ucp
+
+  Any other regex compilation option not listed here is considered unembeddable.
+
+  ## Examples
+      iex> Regex.to_embed(~r/foo/)
+      "(?-imsx:foo)"
+
+      iex> Regex.to_embed(~r/^foo/m)
+      "(?m-isx:^foo)"
+
+      iex> Regex.to_embed(~r/foo # comment/ix)
+      "(?ix-ms:foo # comment\\n)"
+
+      iex> Regex.to_embed(~r/foo/iu)
+      ** (ArgumentError) regex compiled with options [:ucp, :unicode] which cannot be represented as an embedded pattern in this version of PCRE
+
+      iex> Regex.to_embed(~r/foo/imsxu, strict: false)
+      "(?imsx:foo\\n)"
+
+  """
+  @spec to_embed(t, [term]) :: String.t()
+  def to_embed(%Regex{source: source, opts: regex_opts}, embed_opts \\ []) do
+    strict = Keyword.get(embed_opts, :strict, true)
+
+    modifiers =
+      case embeddable_modifiers(regex_opts) do
+        {:ok, modifiers} ->
+          modifiers
+
+        {:error, modifiers, untranslatable} ->
+          if strict do
+            raise ArgumentError,
+                  "regex compiled with options #{inspect(untranslatable)} which cannot be " <>
+                    "represented as an embedded pattern in this version of PCRE"
+          else
+            modifiers
+          end
+      end
+
+    disabled =
+      Enum.reject([?i, ?m, ?s, ?x], &(&1 in modifiers))
+      |> List.to_string()
+
+    disabled = if disabled != "", do: "-#{disabled}", else: ""
+
+    modifiers =
+      Enum.sort(modifiers)
+      |> List.to_string()
+
+    nl = if Enum.member?(regex_opts, :extended), do: "\n", else: ""
+
+    "(?#{modifiers}#{disabled}:#{source}#{nl})"
+  end
+
   @doc """
   Returns a list of names in the regex.

@@ -845,6 +919,29 @@ defmodule Regex do

   # Helpers

+  # translate options to modifiers as required for embedding
+  defp embeddable_modifiers(list), do: embeddable_modifiers(list, [], [])
+
+  defp embeddable_modifiers([:dotall, {:newline, :anycrlf} | t], acc, err),
+    do: embeddable_modifiers(t, [?s | acc], err)
+
+  defp embeddable_modifiers([:caseless | t], acc, err),
+    do: embeddable_modifiers(t, [?i | acc], err)
+
+  defp embeddable_modifiers([:extended | t], acc, err),
+    do: embeddable_modifiers(t, [?x | acc], err)
+
+  defp embeddable_modifiers([:multiline | t], acc, err),
+    do: embeddable_modifiers(t, [?m | acc], err)
+
+  defp embeddable_modifiers([option | t], acc, err),
+    do: embeddable_modifiers(t, acc, [option | err])
+
+  defp embeddable_modifiers([], acc, []), do: {:ok, acc}
+  defp embeddable_modifiers([], acc, err), do: {:error, acc, err}
+
+  # translate modifiers to options
+
   defp translate_options(<<?s, t::binary>>, acc),
     do: translate_options(t, [:dotall, {:newline, :anycrlf} | acc])
