README.md
Lines changed: 96 additions & 18 deletions
@@ -4,48 +4,102 @@
4
4
5
5
This repo provides two PPXes for regular expression-based routing:
6
6
7
-
- `ppx_regexp` maps to [re][] with the conventional last-match extraction
8
-
into `string` and `string option`.
7
+
- `ppx_regexp_extended` maps to [re][] with the conventional last-match extraction
8
+
into `string` and `string option`. Two syntaxes for regular expressions are available:
9
+
  - `pcre`: The syntax of regular PCRE expressions
10
+
  - `mikmatch`: Mimics the syntax of the [mikmatch](https://mjambon.github.io/mjambon2016/mikmatch-manual.html) tool
9
11
- `ppx_tyre` maps to [Tyre][tyre] providing typed extraction into options,
10
12
lists, tuples, objects, and polymorphic variants.
11
13
12
14
Another difference is that `ppx_regexp` works directly on strings
13
15
essentially hiding the library calls, while `ppx_tyre` provides `Tyre.t` and
14
16
`Tyre.route` which can be composed and applied using the Tyre library.
15
17
16
-
## `ppx_regexp` - Regular Expression Matching with OCaml Patterns
18
+
## `ppx_regexp_extended` - Regular Expression Matching with OCaml Patterns
17
19
18
-
This syntax extension turns
20
+
This syntax extension turns:
19
21
```ocaml
20
22
function%pcre
21
23
| {|re1|} -> e1
22
24
...
23
25
| {|reN|} -> eN
24
26
| _ -> e0
25
27
```
26
-
into suitable invocations of the [Re library][re], and similar for
27
-
`match%pcre`. The patterns are plain strings of the form accepted by
28
-
`Re_pcre`, with the following additions:
28
+
(or `function%mik`) into suitable invocations of the [Re library][re], and similarly for `match%pcre`/`match%mik`.
29
+
30
+
It also accepts:
31
+
```ocaml
32
+
let%pcre var = {| some regex |}
33
+
(* and *)
34
+
let%mik var = {| some regex |}
35
+
```
36
+
37
+
### `%pcre`
38
+
39
+
The patterns are plain strings of the form accepted by `Re.Pcre`, with the following additions:
29
40
30
41
- `(?<var>...)` defines a group and binds whatever it matches as `var`.
31
42
The type of `var` will be `string` if the match is guaranteed given that
32
43
the whole pattern matches, and `string option` if the variable is bound
33
44
to or nested below an optionally matched group.
34
45
46
+
- `(N?<var>)` gets substituted by the value of the globally defined string variable named `var`,
47
+
and binds whatever it matches as `var`.
48
+
The type of `var` will be the same as for `(?<var>...)`.
49
+
50
+
- `(N?<var as name>)` gets substituted by the value of the globally defined string variable named `var`,
51
+
and binds whatever it matches as `name`.
52
+
The type of `name` will be the same as for `(?<var>...)`.
53
+
54
+
- `(U?<var>)` gets substituted by the value of the globally defined string variable named `var`,
55
+
and does not bind its match to any name.
56
+
35
57
- `?<var>` at the start of a pattern binds group 0 as `var : string`.
36
58
This may not be the full string if the pattern is unanchored.
37
59
38
60
A variable is allowed for the universal case and is bound to the matched
39
-
string. A regular alias is currently not allowed for patterns, since it is
40
-
not obvious whether it should bind the full string or group 0.
61
+
string.
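
As a hedged illustration of the typing rule for `(?<var>...)`, the sketch below assumes the file is preprocessed with `ppx_regexp_extended` and linked against `re`. The function name `describe` and the `key=value` input format are made up for this example. `key` sits in a mandatory group, so it should be a `string`; `value` is nested below an optionally matched group, so it should be a `string option`.

```ocaml
(* Hypothetical example: `describe` and the key=value format are not part of
   the library. Requires preprocessing with ppx_regexp_extended. *)
let describe = function%pcre
  | {|^(?<key>[a-z]+)(=(?<value>.*))?$|} ->
    (* key : string, value : string option *)
    (match value with
     | Some v -> Printf.sprintf "%s is set to %s" key v
     | None -> Printf.sprintf "%s is unset" key)
  | _ -> "unrecognized"
```

The pattern is anchored with `^` and `$` so that a match covers the whole input rather than a substring.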
62
+
63
+
### `%mik`
64
+
65
+
The syntax that this extension accepts is as follows:
66
+
67
+
- `char-literal`: Match the given character (priority 0).
68
+
- `_` (underscore): Match any character (priority 0).
69
+
- `string-literal`: Match the given sequence of characters (priority 0).
70
+
- `[set-of-characters]`: Character class, match one of the characters given by `set-of-characters` (priority 0). The grammar for `set-of-characters` is the following:
71
+
  - `char-literal`-`char-literal`: defines a range of characters according to the ISO-8859-1 encoding (includes ASCII).
72
+
  - `char-literal`: defines a singleton (a set containing just this character).
73
+
  - `string-literal`: defines a set that contains all the characters present in the given string.
74
+
  - `lowercase-identifier`: is replaced by the corresponding predefined regular expression; this regular expression must have length exactly 1 and therefore represents a set of characters.
75
+
  - `set-of-characters set-of-characters`: defines the union of two sets of characters.
76
+
- `[^set-of-characters]`: Negative character class.
77
+
- `regexp *`: Match the pattern given by `regexp` zero or more times (priority 0).
78
+
- `regexp +`: Match the pattern given by `regexp` one or more times (priority 0).
79
+
- `regexp ?`: Match the pattern given by `regexp` at most once (priority 0).
80
+
- `regexp{m-n}`: Match `regexp` at least `m` times and up to `n` times. `m` and `n` must be integer literals (priority 0).
81
+
- `regexp{n}`: Same as `regexp{n-n}` (priority 0).
82
+
- `( regexp )`: Match `regexp` (priority 0).
83
+
- `regexp regexp`: Match the first regular expression and then the second one (priority 1).
84
+
- `regexp | regexp`: Match one of these two regular expressions (priority 2).
85
+
- `regexp as lowercase-identifier`: Give a name to the substring that will be matched by the given pattern. This string becomes available under this name (priority 3).
86
+
  In-place conversions of the matched substring can be performed using one of these three mechanisms:
87
+
  - `regexp as lowercase-identifier : int`: `int` behaves as `int_of_string`
88
+
  - `regexp as lowercase-identifier : float`: `float` behaves as `float_of_string`
89
+
  - `regexp as lowercase-identifier := converter`: where `converter` is any function which converts a string into something else.
90
+
91
+
In addition, the following predefined character classes are available:
- **Control sequences:** `eos` (same as `$`), `eol` (end of string or newline), `bnd` (word boundary `\b`), `bos` (same as `^`), `any` (any character except newline)
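
A minimal sketch of how the `%mik` grammar listed above might be combined. It assumes the file is preprocessed with `ppx_regexp_extended`, that whitespace between tokens is insignificant (as in mikmatch), and that an `as` binding with an `: int` conversion can be parenthesised inside a larger pattern. The function `parse_setting` and the input format are hypothetical.

```ocaml
(* Hypothetical example built only from the grammar described above.
   Because `as` has the lowest priority (3), each binding is parenthesised
   so that it names only its own sub-pattern. *)
let parse_setting = function%mik
  | {| (['a'-'z']+ as key) '=' (['0'-'9']+ as n : int) |} ->
    (* key : string; n is converted as by int_of_string *)
    Some (key, n)
  | _ -> None
```

The pattern is left unanchored here, so it may match a substring of the input.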
41
94
42
95
### Example
43
96
44
-
The following prints out times and hosts for SMTP connections to the Postfix
45
-
daemon:
97
+
The following prints out times and hosts for SMTP connections to the Postfix daemon: