Commit 91ec38d

Modify the regex lab to explain how to use regexes
Signed-off-by: David A. Wheeler <[email protected]>
1 parent f66b60f commit 91ec38d

2 files changed: 72 additions, 0 deletions

docs/labs/checker.css

Lines changed: 5 additions & 0 deletions
@@ -16,3 +16,8 @@ pre input, pre textarea {
 .displayNone {
   display: none;
 }
+
+table, th, td {
+  border: 1px solid black;
+}
+

docs/labs/regex1.html

Lines changed: 67 additions & 0 deletions
@@ -265,6 +265,73 @@ <h2>Background</h2>
 <p>
 <h2>Task Information</h2>
 <p>
+Different regex languages have slightly different notations,
+but they have much in common. Here are some basic rules for regex
+notations:
+
+<ol>
+<li>The most trivial rule is that a letter or digit matches itself. That is, the regex “<tt>d</tt>” matches the letter “<tt>d</tt>”. Most implementations use case-sensitive matches by default, and that is usually what you want.
+<li>Another rule is that square brackets surround a rule that specifies any of a number of characters. If the square brackets surround just alphanumerics, then the pattern matches any of them. So <tt>[brt]</tt> matches a single “<tt>b</tt>”, “<tt>r</tt>”, or “<tt>t</tt>”.
+Inside the brackets you can include
+ranges of symbols separated by a dash ("-"), so
+<tt>[A-D]</tt> will match one character, which can be one A, one B, one C,
+or one D.
+You can do this more than once.
+For example,
+the term <tt>[A-Za-z]</tt> will match one character, which can be
+an uppercase Latin letter or a lowercase Latin letter.
+(This text assumes you're not using a long-obsolete character system
+like EBCDIC.)
+<li>If you follow a pattern with “<tt>&#42;</tt>”, that means
+“<i>0 or more times</i>”.
+In almost all regex implementations (except POSIX BRE),
+following a pattern with "<tt>+</tt>" means "<i>1 or more times</i>".
+So <tt>[A-D]*</tt> will match 0 or more letters as long as every letter
+is an A, B, C, or D.
+<li>You can use "<tt>|</tt>" to identify options, any of which are acceptable.
+When validating input, you should surround the options with parentheses,
+because "<tt>|</tt>" has a low precedence.
+So for example, "<tt>(yes|no)</tt>" is a way to match either "yes" or "no".
+</ol>
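<p>
The four rules above can be tried out with a minimal sketch such as the following. It assumes Python's <tt>re</tt> module purely for illustration; the helper name <tt>matches</tt> is hypothetical and not part of the lab. It uses <tt>re.fullmatch</tt> so that each pattern is tested against the whole string.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


# Rule 1: a letter or digit matches itself, case-sensitively by default.
assert matches("d", "d")
assert not matches("d", "D")

# Rule 2: square brackets match any one of the listed characters or ranges.
assert matches("[brt]", "r")
assert not matches("[brt]", "x")
assert matches("[A-Za-z]", "q")
assert not matches("[A-Za-z]", "7")

# Rule 3: "*" means 0 or more times; "+" means 1 or more times.
assert matches("[A-D]*", "")
assert matches("[A-D]*", "ABBA")
assert not matches("[A-D]*", "ABE")
assert not matches("[A-D]+", "")

# Rule 4: "|" separates options; parentheses group them.
assert matches("(yes|no)", "no")
assert not matches("(yes|no)", "maybe")
```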
+
+<p>
+We want to use regexes to <i>validate</i> input.
+That is, the input should <i>completely</i> match the regex pattern.
+In regexes you can do this by prepending some symbol and appending a different
+symbol.
+Unfortunately, different languages use different symbols.
+The following table shows what you should prepend and append.
+
+<p>
+<table>
+<tr>
+<th>Platform
+<th>Prepend
+<th>Append
+<tr>
+<td>POSIX BRE, POSIX ERE, and ECMAScript (JavaScript)
+<td>“^”
+<td>“$”
+<tr>
+<td>Java, .NET, PHP, Perl, and PCRE
+<td>“^” or “\A”
+<td>“\z”
+<tr>
+<td>Golang, Rust crate regex, and RE2
+<td>“^” or “\A”
+<td>“$” or “\z”
+<tr>
+<td>Python
+<td>“^” or “\A”
+<td>“\Z” (not “\z”)
+<tr>
+<td>Ruby
+<td>“\A”
+<td>“\z”
+</table>
+
+<p>
+For example, to validate in JavaScript that the input is only “ab” or “de”, use the regex “^(ab|de)$”. To validate the same thing in Python, use “^(ab|de)\Z” or “\A(ab|de)\Z”.
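<p>
As an illustration of the Python row of the table, here is a minimal sketch of whole-input validation. The names <tt>VALID</tt>, <tt>LOOSE_END</tt>, <tt>NO_PARENS</tt>, and <tt>is_valid</tt> are hypothetical and chosen only for this example. It also shows two pitfalls: in Python, “$” also matches just before a final newline, and leaving out the parentheses changes what the anchors apply to.

```python
import re

# Anchored at both ends, with the options grouped in parentheses.
VALID = re.compile(r"\A(ab|de)\Z")

# In Python, "$" also matches just before a trailing newline.
LOOSE_END = re.compile(r"^(ab|de)$")

# Without parentheses this is parsed as (\Aab) OR (de\Z).
NO_PARENS = re.compile(r"\Aab|de\Z")


def is_valid(text: str) -> bool:
    """Hypothetical validator: True only if `text` is exactly "ab" or "de"."""
    return VALID.search(text) is not None


assert is_valid("ab") and is_valid("de")
assert not is_valid("abc")
assert not is_valid("ab\n")

# Pitfall 1: "$" lets an input with a trailing newline through.
assert LOOSE_END.search("ab\n") is not None

# Pitfall 2: the ungrouped pattern accepts text that merely starts
# with "ab" or merely ends with "de".
assert NO_PARENS.search("abXYZ") is not None
assert NO_PARENS.search("XYZde") is not None
```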
 
 <p>
 <h2>Interactive Lab (<span id="grade"></span>)</h2>
