You may or not know `agrep`. It is basically a "forgiving" `grep` and is, for instance, used for searching through (offline) dictionaries. It is tolerant against errors (up to degree you specify). It may be beneficial is you want to match against domains where you don't really know the pattern. It is just an idea, we will have to see if it is actually useful.
3
+
You may or may not know `agrep`. It is basically a "forgiving" `grep` and is, for instance, used for searching through (offline) dictionaries. It tolerates errors up to a degree you specify. This may be beneficial if you want to match against domains where you don't really know the pattern. It is just an idea; we will have to see if it is actually useful.
4
4
5
-
This is a somewhat complicated topic, we'll approach it by examples as it is very complicated to get the head around it by just listening to the specifications.
5
+
This is a somewhat complicated topic, so we'll approach it through examples, as it is hard to get your head around it from the specifications alone.
6
6
7
7
The approximate matching settings for a subpattern can be changed by appending *approx-settings* to the subpattern. Limits for the number of errors can be set and an expression for specifying and limiting the costs can be given:
8
8
9
9
## Accepted **insertions** (`+`)
10
10
11
-
Use `(something){+x}` to specify that the regex should still be matching when `x` characters would need it be *inserted* into the sub-expression `something`.
11
+
Use `(something){+x}` to specify that the regex should still match when `x` characters would need to be *inserted* into the sub-expression `something`.
12
12
13
13
Example:
14
14
@@ -24,7 +24,7 @@ The missing characters in the domain are substituted. The maximum number of inse
24
24
25
25
## Accepted **deletions** (`-`)
26
26
27
-
Use `(something){-x}` to specify that the regex should still be matching when `x` characters would need it be *deleted* from the sub-expression `something`:
27
+
Use `(something){-x}` to specify that the regex should still match when `x` characters would need to be *deleted* from the sub-expression `something`:
28
28
29
29
Example:
30
30
@@ -35,60 +35,63 @@ The surplus `e` in `neet` is deleted.
35
35
Similarly:
36
36
37
37
- `doubleclick.net` is matched by `^(doubleclicky\.netty){-3}$`
38
-
-`doubleclick.net` is NOT matched by `^(doubleclicky\.nettfy){-3}$`
38
+
- `doubleclick.net` is **not** matched by `^(doubleclicky\.nettfy){-3}$`
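The two deletion examples above can be checked by hand with a small sketch. It is only an illustration, not how the regex engine works internally: `matches_with_deletions` is a hypothetical helper that compares plain strings (so `\.` is written as `.`) and only models the `{-x}` rule on a fully anchored literal.

```python
def matches_with_deletions(literal: str, text: str, max_deletions: int) -> bool:
    """Return True if `text` can be obtained from `literal` by deleting
    at most `max_deletions` characters and changing nothing else."""
    dropped = len(literal) - len(text)
    if dropped < 0 or dropped > max_deletions:
        return False
    # Subsequence test: every character of `text` must appear in `literal`
    # in the same order. `ch in remaining` consumes the iterator as it scans.
    remaining = iter(literal)
    return all(ch in remaining for ch in text)


# Three surplus characters (y, t, y) have to be deleted: allowed by {-3}.
print(matches_with_deletions("doubleclicky.netty", "doubleclick.net", 3))   # True
# Four surplus characters (y, t, f, y) would be needed: more than {-3} allows.
print(matches_with_deletions("doubleclicky.nettfy", "doubleclick.net", 3))  # False
```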
39
39
40
40
## Accepted **substitutions** (`#`)
41
41
42
-
Use `(something){#x}` to specify that the regex should still be matching when `x` characters would need to be *substituted*from the sub-expression `something`:
42
+
Use `(something){#x}` to specify that the regex should still match when `x` characters would need to be *substituted* in the sub-expression `something`:
43
43
44
44
Example 1:
45
45
46
-
-`oobargoobaploowap` is matched by `(foobar){#2~2}`
47
-
Hint: `goobap` is `foobar` with two substitutions `f->g` and `r->p`
46
+
- `oobargoobaploowap` is matched by `(foobar){#2}`
47
+
- Hint: `goobap` is `foobar` with `f` replaced by `g` and `r` replaced by `p`
48
48
49
49
Example 2:
50
50
51
51
- `doubleclick.net` is matched by `^doubleclick\.n(tt){#1}$`
52
52
53
-
The incorrect `t` in `ntt` is substituted. Note that substitutions are necessary when a character needs to be replaced as the corresponding realization with one insertion and one deletion is **not identical**:
53
+
The incorrect `t` in `ntt` is substituted. Note that a substitution is needed when a character has to be replaced, because the alternative of 1 insertion plus 1 deletion, used in the following example, is **not identical**:
54
54
55
-
`doubleclick.net` is matched by `^doubleclick\.n(tt){+1-1}$`
55
+
- `doubleclick.net` is matched by `^doubleclick\.n(tt){+1-1}$`
56
56
57
57
(`t` is removed, `e` is added), however
58
58
59
-
-`doubleclick.nt` is ALSO matched by `^doubleclick\.n(tt){+1-1}$`
59
+
- `doubleclick.nt` is **also** matched by `^doubleclick\.n(tt){+1-1}$`
60
60
61
-
(the `t` is just removed, nothing had to be added) but
61
+
(the `t` is removed and nothing has to be added), but
62
62
63
-
-`doubleclick.nt` is NOT matched by `^doubleclick\.n(tt){#1}$`
63
+
- `doubleclick.nt` is **not** matched by `^doubleclick\.n(tt){#1}$`
64
64
65
-
doesn't match as substitutions always require characters to be swapped by others.
65
+
This does not match because a substitution always requires a character to be replaced by another one.
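The difference between `{#1}` and `{+1-1}` can be reproduced with a short sketch. It assumes a simplified model: `fuzzy_equal` is a hypothetical helper that compares a plain literal with a whole string and tracks insertions, deletions, and substitutions separately. It is not the regex engine's implementation, only a way to replay the examples above for the sub-expression `tt`.

```python
from functools import lru_cache


def fuzzy_equal(literal, text, max_ins=0, max_del=0, max_sub=0, max_total=None):
    """Return True if `literal` can be turned into `text` within the limits.

    max_ins / max_del / max_sub mirror `+x`, `-x` and `#x`; max_total
    mirrors `~x` and defaults to the sum of the three individual limits.
    """
    if max_total is None:
        max_total = max_ins + max_del + max_sub

    @lru_cache(maxsize=None)
    def walk(i, j, ins, dele, sub):
        if ins > max_ins or dele > max_del or sub > max_sub:
            return False
        if ins + dele + sub > max_total:
            return False
        if i == len(literal) and j == len(text):
            return True
        if i < len(literal) and j < len(text):
            if literal[i] == text[j]:
                if walk(i + 1, j + 1, ins, dele, sub):       # exact character
                    return True
            elif walk(i + 1, j + 1, ins, dele, sub + 1):     # substitution
                return True
        if j < len(text) and walk(i, j + 1, ins + 1, dele, sub):     # insertion
            return True
        if i < len(literal) and walk(i + 1, j, ins, dele + 1, sub):  # deletion
            return True
        return False

    return walk(0, 0, 0, 0, 0)


print(fuzzy_equal("tt", "et", max_sub=1))             # True:  (tt){#1} vs "et"
print(fuzzy_equal("tt", "et", max_ins=1, max_del=1))  # True:  (tt){+1-1} vs "et"
print(fuzzy_equal("tt", "t", max_ins=1, max_del=1))   # True:  (tt){+1-1} vs "t"
print(fuzzy_equal("tt", "t", max_sub=1))              # False: (tt){#1} vs "t"
```

The last call fails because turning `tt` into `t` needs a deletion, and a substitution limit alone never permits one.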
66
66
67
67
## Combinations and total error limit (`~`)
68
68
69
-
All rules from above can be combined like as`{+2-5#6}`allowing (up to!) two insertions, five deletions, and six substitutions. You can enforce an upper limit on the number of tried realizations using the tilde. Even when `{+2-5#6}` can lead to up to 13 operations being tried, this can be limited to (at most) seven tries using `{+2-5#6~7}`.
69
+
All rules from above can be combined. For example, `{+2-5#6}` allows up to 2 insertions, 5 deletions, and 6 substitutions. You can enforce an upper limit on the total number of attempted operations using `~x`. For example, even though `{+2-5#6}` can lead to up to 13 operations being tried, this can be limited to at most 7 operations using `{+2-5#6~7}`.
70
70
71
71
Example:
72
72
73
73
- `oobargoobploowap` is matched by `(foobar){+2#2~3}`
74
+
- Hint: `goobaap` is `foobar` with
75
+
- 2 substitutions (`f` to `g` and `r` to `p`)
76
+
- 1 insertion (`a` in `bar` to make `baap`)
74
77
75
-
Hint: `goobaap` is `foobar` with
76
-
- two substitutions `f->g` and `r->p`, and
77
-
- one addition `a` between `bar` (to have `baap`)
78
-
79
-
Specifying `~2` instead of `~3` will lead to no match as three errors need to be corrected in total for a match in this example.
78
+
With `~2` instead of `~3`, the pattern will not match because 3 errors need to be corrected in this example.
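To see why 3 is the smallest total that works for the hint above, one can count the minimum number of single-character edits between `foobar` and `goobaap`. The sketch below uses the classic Levenshtein distance as a stand-in. The `edit_distance` function is an illustrative helper, and it ignores the per-type limits (`+2`, `#2`), which happen not to matter here because the cheapest alignment uses 1 insertion and 2 substitutions.

```python
def edit_distance(literal: str, text: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn `literal` into `text` (Levenshtein distance)."""
    previous = list(range(len(text) + 1))
    for i, lit_ch in enumerate(literal, start=1):
        current = [i]
        for j, txt_ch in enumerate(text, start=1):
            current.append(min(
                previous[j] + 1,                       # delete lit_ch
                current[j - 1] + 1,                    # insert txt_ch
                previous[j - 1] + (lit_ch != txt_ch),  # keep or substitute
            ))
        previous = current
    return previous[-1]


print(edit_distance("foobar", "goobaap"))  # 3, so ~3 suffices and ~2 does not
print(edit_distance("foobar", "goobap"))   # 2, the substitution-only example
```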
80
79
81
80
## Advanced topic: Cost-equation
82
81
83
-
You can even weight the "costs" of insertions, deletions or substitutions. This is really an advanced topic and should only be touched when really needed.
82
+
You can even weight the "costs" of insertions, deletions or substitutions. This is an advanced topic and should only be touched when really needed.
83
+
84
+
A *cost-equation* can be thought of as a mathematical equation where `i`, `d`, and `s` stand for the number of insertions, deletions, and substitutions respectively. The equation can have a multiplier for each of `i`, `d`, and `s`.
85
+
The multiplier is the **cost of the error**, and the number after `<` is the maximum allowed total cost of a match. Spaces and pluses can be inserted to make the equation more readable. When specifying only a cost equation, adding a space after the opening `{` is **required**.
84
86
85
-
A *cost-equation* can be thought of as a mathematical equation, where `i`, `d`, and `s` stand for the number of insertions, deletions, and substitutions, respectively. The equation can have a multiplier for each of `i`, `d`, and `s`.
86
-
The multiplier is the **cost of the error**, and the number after `<` is the maximum allowed total cost of a match. Spaces and pluses can be inserted to make the equation more readable. When specifying only a cost equation, adding a space after the opening `{` is **required** .
87
+
Example 1:
87
88
88
-
Example 1:`{ 2i + 1d + 2s < 5 }`
89
+
- `{ 2i + 1d + 2s < 5 }`
89
90
90
-
This sets the cost of an insertion to two, a deletion to one, a substitution to two, and the maximum cost to five.
91
+
This sets the cost of an insertion to 2, a deletion to 1, a substitution to 2, and the maximum cost to 5.
92
+
93
+
Example 2:
91
94
92
-
Example 2: `{+2-5#6, 2i + 1d + 2s < 5 }`
95
+
- `{ +2-5#6, 2i + 1d + 2s < 5 }`
93
96
94
-
This sets the cost of an insertion to two, a deletion to one, a substitution to two, and the maximum cost to five. Furthermore, it allows only up to 2 insertions (coming at a total cost of 4), five deletions and up to 6 substitutions. As six substitutions would come at a cost of `6*2 = 12`, exceeding the total allowed costs of 5, they cannot all be realized.
97
+
This sets the cost of an insertion to 2, a deletion to 1, a substitution to 2, and the maximum cost to 5. Furthermore, it allows only up to 2 insertions (for a total cost of 4), up to 5 deletions, and up to 6 substitutions. As 6 substitutions would come at a cost of `6*2 = 12`, exceeding the maximum allowed cost of 5, they cannot all be performed.
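The arithmetic behind the cost equation can be spelled out in a few lines. This is only a sketch of the formula `2i + 1d + 2s`: `total_cost` is a hypothetical helper with the weights from the examples as defaults. It deliberately does not decide whether a cost of exactly 5 is accepted, since that depends on how the engine interprets `<`.

```python
def total_cost(insertions, deletions, substitutions,
               cost_i=2, cost_d=1, cost_s=2):
    """Weighted cost of a match: cost_i*i + cost_d*d + cost_s*s."""
    return cost_i * insertions + cost_d * deletions + cost_s * substitutions


MAX_COST = 5  # the number after `<` in the examples

print(total_cost(2, 0, 0))  # 4: two insertions stay below the limit
print(total_cost(1, 1, 0))  # 3: one insertion plus one deletion
print(total_cost(0, 0, 6))  # 12: six substitutions far exceed the limit
print(total_cost(0, 0, 6) > MAX_COST)  # True
```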