docpages/basic-language-reference/keywords/MATCH.md
+65 -50 (65 additions, 50 deletions)
@@ -2,42 +2,46 @@
 
 ```basic
 MATCH result, pattern$, haystack$
+MATCH result, pattern$, haystack$, var1$, var2$, ...
 ```
 
-Evaluates a **POSIX ERE** (extended regular expression) against a string and stores **1** for a match or **0** for no match into `result`.
+Evaluates a **POSIX ERE** (extended regular expression) against a string.
 
-* `result` must be an **integer** variable.
-* `pattern$` and `haystack$` are **strings**.
-* Matching is **ASCII-only** (no locale/Unicode).
-* No capture groups or sub-matches are returned; this is a **yes/no** test.
-
-`MATCH` runs **cooperatively**: very large or pathological patterns are executed in slices.
+* In the first form, stores **1** for a match or **0** for no match into `result`.
+* In the second form, also assigns text captured by **parenthesised sub-expressions** to additional string variables (`var1$`, `var2$`, …).
 
-\remark If the pattern is invalid, an error is raised with a descriptive message from the regex engine. Without an error handler, the program terminates. With an `ON ERROR` handler, control passes there.
+\remark Matching is **ASCII-only** (no locale or Unicode).
+\remark All regular expressions follow **POSIX ERE** syntax.
 
 ---
 
-### Supported syntax (POSIX ERE subset)
+### Forms
 
-* Literals: `ABC`
-* Any char: `.`
-* Quantifiers: `* + ?` (greedy)
-* Character classes: `[abc]`, ranges `[a-z]`, negation `[^0-9]`
-* Alternation: `A|B`
-* Anchors: `^` (start of string), `$` (end of string)
+#### Boolean match
 
-### Not supported
+```basic
+MATCH result, pattern$, haystack$
+```
 
-* Backreferences `\1`, `\2`, …
-* Inline flags like `(?i)` (use explicit classes instead, or upper/lower where appropriate)
-* PCRE extensions (`\d`, `\w`, lookaround, etc.)
-* Multiline mode: `^` and `$` match **string** boundaries only.
+* `result` must be an **integer** variable.
+* `pattern$` and `haystack$` are **strings**.
+* Returns 1 for a match, 0 for no match.
+
+#### Match with captures
+
+```basic
+MATCH result, pattern$, haystack$, cap1$, cap2$, ...
+```
+
+* Each parenthesised group in `pattern$` (e.g. `(abc)`) is captured and copied into successive string variables.
+* Missing or non-participating groups yield `""`.
+* If the pattern contains fewer capture groups than variables, the extras receive empty strings.
 
 ---
 
 ### Examples
 
-**Simple literal**
+**Simple match**
 
 ```basic
 MATCH R, "HELLO", "HELLO WORLD"
@@ -52,48 +56,33 @@ PRINT R ' 1
 
 MATCH R, "END$", "THE END"
 PRINT R ' 1
-
-MATCH R, "^A", "BA"
-PRINT R ' 0
 ```
 
-**Alternation**
+**Alternation and character classes**
 
 ```basic
 MATCH R, "CAT|DOG", "HOTDOG"
 PRINT R ' 1
 
-MATCH R, "RED|GREEN", "BLUE"
-PRINT R ' 0
-```
-
-**Character classes and ranges**
-
-```basic
-MATCH R, "[0-9]+", "foo123bar"
-PRINT R ' 1
-
 MATCH R, "[A-Z][a-z]+", "Title"
 PRINT R ' 1
-
-MATCH R, "[^x]*z$", "crab ballz"
-PRINT R ' 1
 ```
 
-**Wildcard and quantifiers**
+**Capturing sub-expressions**
 
 ```basic
-MATCH R, "A.*C", "AXYZC"
-PRINT R ' 1
+MATCH R, "([A-Za-z]+),([A-Za-z]+)", "Hello,World", FIRST$, SECOND$
+| Character classes | `[abc]`, `[A-Z]`, `[^0-9]` | set, range, negation | |
+| Alternation | `A | B` | match A or B |
+| Anchors | `^`, `$` | start / end of string | |
+| Capturing groups | `(ABC)` | capture substring | |
+
+---
+
+### Not supported
+
+* Backreferences `\1`, `\2`, …
+* Inline flags `(?i)` etc.
+* PCRE-style escapes (`\d`, `\w`, lookaround, …)
+* Multiline mode (`^` and `$` match string boundaries only)
+
+---
+
 ### Notes
 
-* Matching is **case-sensitive** by default. To approximate case-insensitive tests, normalise your data (e.g., convert both strings to upper case before matching) or use character classes (e.g., `[Hh][Ee][Ll][Ll][Oo]`).
-* Because `MATCH` is cooperative, very large inputs or patterns may take multiple idle ticks to complete. You do not need to poll—control returns to your program automatically once finished.
-* `^` and `$` are **string** anchors, not line anchors; there is no multiline mode.
-* The engine is compiled with `REG_NOSUB`; capture offsets are not available to BASIC code.
+* Matching is **case-sensitive**. To simulate case-insensitive matching, normalise both strings or use explicit character classes.
+* With captures, **co-operative execution is disabled** — the operation completes immediately.
+* Without captures, matching runs **co-operatively** across idle ticks for long inputs.
+* If the pattern is invalid, the engine reports a descriptive message.
+Without an error handler, the program terminates;
+with `ON ERROR PROCname`, control transfers to the handler.
+* Capture results are always independent copies; modifying the original string has no effect on captured values.
-</div><!-- fragment --><p>Evaluates a <b>POSIX ERE</b> (extended regular expression) against a string and stores <b>1</b> for a match or <b>0</b> for no match into <span class="tt">result</span>.</p>
+</div><!-- fragment --><p>Evaluates a <b>POSIX ERE</b> (extended regular expression) against a string.</p>
 <ul>
+<li>In the first form, stores <b>1</b> for a match or <b>0</b> for no match into <span class="tt">result</span>.</li>
+<li>In the second form, also assigns text captured by <b>parenthesised sub-expressions</b> to additional string variables (<span class="tt">var1$</span>, <span class="tt">var2$</span>, …).</li>
+</ul>
+<dl class="section remark"><dt>Remarks</dt><dd>Matching is <b>ASCII-only</b> (no locale or Unicode). </dd>
+<dd>
+All regular expressions follow <b>POSIX ERE</b> syntax.</dd></dl>
 <li><span class="tt">result</span> must be an <b>integer</b> variable.</li>
 <li><span class="tt">pattern$</span> and <span class="tt">haystack$</span> are <b>strings</b>.</li>
-<li>Matching is <b>ASCII-only</b> (no locale/Unicode).</li>
-<li>No capture groups or sub-matches are returned; this is a <b>yes/no</b> test.</li>
+<li>Returns 1 for a match, 0 for no match.</li>
 </ul>
-<p><span class="tt">MATCH</span> runs <b>cooperatively</b>: very large or pathological patterns are executed in slices.</p>
-<dl class="section remark"><dt>Remarks</dt><dd>If the pattern is invalid, an error is raised with a descriptive message from the regex engine. Without an error handler, the program terminates. With an <span class="tt">ON ERROR</span> handler, control passes there.</dd></dl>
+<li>Each parenthesised group in <span class="tt">pattern$</span> (e.g. <span class="tt">(abc)</span>) is captured and copied into successive string variables.</li>
+<li>Missing or non-participating groups yield <span class="tt">""</span>.</li>
+<li>If the pattern contains fewer capture groups than variables, the extras receive empty strings.</li>
+<td class="markdownTableBodyNone">Any char </td><td class="markdownTableBodyNone"><span class="tt">.</span></td><td class="markdownTableBodyNone">matches any single character </td><td class="markdownTableBodyNone"></td></tr>
+<td class="markdownTableBodyNone">Alternation </td><td class="markdownTableBodyNone"><span class="tt">A \ilinebr </td> <td class="markdownTableBodyNone"> B</span></td><td class="markdownTableBodyNone">match A or B </td></tr>
+<tr class="markdownTableRowEven">
+<td class="markdownTableBodyNone">Anchors </td><td class="markdownTableBodyNone"><span class="tt">^</span>, <span class="tt">$</span></td><td class="markdownTableBodyNone">start / end of string </td><td class="markdownTableBodyNone"></td></tr>
+<tr class="markdownTableRowOdd">
+<td class="markdownTableBodyNone">Capturing groups </td><td class="markdownTableBodyNone"><span class="tt">(ABC)</span></td><td class="markdownTableBodyNone">capture substring </td><td class="markdownTableBodyNone"></td></tr>
-<li>Matching is <b>case-sensitive</b> by default. To approximate case-insensitive tests, normalise your data (e.g., convert both strings to upper case before matching) or use character classes (e.g., <span class="tt">[Hh][Ee][Ll][Ll][Oo]</span>).</li>
-<li>Because <span class="tt">MATCH</span> is cooperative, very large inputs or patterns may take multiple idle ticks to complete. You do not need to poll—control returns to your program automatically once finished.</li>
-<li><span class="tt">^</span> and <span class="tt">$</span> are <b>string</b> anchors, not line anchors; there is no multiline mode.</li>
-<li>The engine is compiled with <span class="tt">REG_NOSUB</span>; capture offsets are not available to BASIC code. </li>
+<li>Matching is <b>case-sensitive</b>. To simulate case-insensitive matching, normalise both strings or use explicit character classes.</li>
+<li>With captures, <b>co-operative execution is disabled</b> — the operation completes immediately.</li>
+<li>Without captures, matching runs <b>co-operatively</b> across idle ticks for long inputs.</li>
+<li>If the pattern is invalid, the engine reports a descriptive message. Without an error handler, the program terminates; with <span class="tt">ON ERROR PROCname</span>, control transfers to the handler.</li>
+<li>Capture results are always independent copies; modifying the original string has no effect on captured values. </li>