docs/labs/regex1.html
67 additions & 0 deletions
@@ -265,6 +265,73 @@ <h2>Background</h2>
265
265
<p>
266
266
<h2>Task Information</h2>
267
267
<p>
268
+
Different regex languages have slightly different notations,
269
+
but they have much in common. Here are some basic rules for regex
270
+
notations:
271
+
272
+
<ol>
273
+
<li>The most trivial rule is that a letter or digit matches itself. That is, the regex “<tt>d</tt>” matches the letter “<tt>d</tt>”. Most implementations use case-sensitive matches by default, and that is usually what you want.
274
+
<li>Another rule is that square brackets surround a rule that specifies any of a number of characters. If the square brackets surround just alphanumerics, then the pattern matches any of them. So <tt>[brt]</tt> matches a single “<tt>b</tt>”, “<tt>r</tt>”, or “<tt>t</tt>”.
275
+
Inside the brackets you can include
276
+
ranges of symbols separated by a dash ("-"), so
277
+
<tt>[A-D]</tt> will match one character, which can be one A, one B, one C,
278
+
or one D.
279
+
You can do this more than once.
280
+
For example,
281
+
the term <tt>[A-Za-z]</tt> will match one character, which can be
282
+
an uppercase Latin letter or a lowercase Latin letter.
283
+
(This text assumes you're not using a long-obsolete character system
284
+
like EBCDIC.)
285
+
<li>If you follow a pattern with “<tt>*</tt>”, that means
286
+
“<i>0 or more times</i>”.
287
+
In almost all regex implementations (except POSIX BRE),
288
+
following a pattern with "<tt>+</tt>" means "<i>1 or more times</i>".
289
+
So <tt>[A-D]*</tt> will match 0 or more letters as long as every letter
290
+
is an A, B, C, or D.
291
+
<li>You can use "<tt>|</tt>" to identify options, any of which are acceptable.
292
+
When validating input, you should surround the options with parentheses,
293
+
because "<tt>|</tt>" has a low precedence.
294
+
So for example, "<tt>(yes|no)</tt>" is a way to match either "yes" or "no".
295
+
</ol>
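<p>
The rules above can be tried out with a minimal sketch like the following. It assumes Python's standard <tt>re</tt> module; the helper name <tt>first_match</tt> is hypothetical and exists only for this illustration. It uses <tt>re.search</tt>, which finds a match anywhere in the text, so it demonstrates the notation rather than input validation.

```python
import re

def first_match(pattern, text):
    """Hypothetical helper: return the first substring of text that
    matches pattern, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# Rule 1: a letter matches itself, and matching is case-sensitive by default.
print(first_match(r"d", "made"))          # d
print(first_match(r"d", "D"))             # None

# Rule 2: square brackets match any one of the listed characters or ranges.
print(first_match(r"[brt]", "art"))       # r
print(first_match(r"[A-D]", "xyzC"))      # C
print(first_match(r"[A-Za-z]", "42x"))    # x

# Rule 3: "*" means 0 or more times, "+" means 1 or more times.
print(first_match(r"[A-D]*", "ABBAxyz"))  # ABBA
print(first_match(r"[A-D]*", "xyz"))      # empty string (matched 0 times)
print(first_match(r"[A-D]+", "xyz"))      # None

# Rule 4: "|" separates options; the parentheses group them.
print(first_match(r"(yes|no)", "say no")) # no
```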
296
+
297
+
<p>
298
+
We want to use regexes to <i>validate</i> input.
299
+
That is, the input should <i>completely</i> match the regex pattern.
300
+
In regexes you can do this by prepending some symbol and appending a different
301
+
symbol.
302
+
Unfortunately, different languages use different symbols.
303
+
The following table shows what you should prepend and append.
304
+
305
+
<p>
306
+
<table>
307
+
<tr>
308
+
<th>Platform
309
+
<th>Prepend
310
+
<th>Append
311
+
<tr>
312
+
<td>POSIX BRE, POSIX ERE, and ECMAScript (JavaScript)
313
+
<td>“^”
314
+
<td>“$”
315
+
<tr>
316
+
<td>Java, .NET, PHP, Perl, and PCRE
317
+
<td>“^” or “\A”
318
+
<td>“\z”
319
+
<tr>
320
+
<td>Go, Rust (regex crate), and RE2
321
+
<td>“^” or “\A”
322
+
<td>“$” or “\z”
323
+
<tr>
324
+
<td>Python
325
+
<td>“^” or “\A”
326
+
<td>“\Z” (not “\z”)
327
+
<tr>
328
+
<td>Ruby
329
+
<td>“\A”
330
+
<td>“\z”
331
+
</table>
332
+
333
+
<p>
334
+
For example, to validate in JavaScript that the input is only “<tt>ab</tt>” or “<tt>de</tt>”, use the regex “<tt>^(ab|de)$</tt>”. To validate the same thing in Python, use “<tt>^(ab|de)\Z</tt>” or “<tt>\A(ab|de)\Z</tt>”.
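<p>
As a rough sketch of the Python row of the table, the following uses the standard <tt>re</tt> module to validate that the input is only “ab” or “de”. The names <tt>AB_OR_DE</tt> and <tt>is_valid</tt> are hypothetical and chosen only for this illustration. The last two checks show why the table says “\Z” rather than “$” for Python, and why the options should be surrounded by parentheses.

```python
import re

# Python spelling from the table: "\A" (or "^") in front, "\Z" at the end.
AB_OR_DE = re.compile(r"\A(ab|de)\Z")

def is_valid(text):
    """Hypothetical validator: True only if the whole input is "ab" or "de"."""
    return AB_OR_DE.search(text) is not None

print(is_valid("ab"))    # True
print(is_valid("de"))    # True
print(is_valid("abde"))  # False
print(is_valid("ab\n"))  # False: "\Z" matches only at the very end

# In Python, "$" also matches just before a trailing newline,
# so this pattern accepts input that is not exactly "ab" or "de".
print(re.search(r"^(ab|de)$", "ab\n") is not None)   # True

# Without parentheses, the low precedence of "|" splits the pattern into
# "\Aab" or "de\Z", so anything that merely starts with "ab" is accepted.
print(re.search(r"\Aab|de\Z", "abXYZ") is not None)  # True
```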