Commit d1758a7

committed
Add example code for regular expressions
1 parent d1e0152 commit d1758a7

2 files changed

+262
-0
lines changed

source-code/regexes/README.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
# Regexes

Regular expressions are a formalism to describe sets of strings. They
can be used to check whether a string matches a given pattern, to
extract parts of strings, or to substitute parts of strings in a
way that is much more powerful and flexible than the `str` methods
for those purposes.

## What is it?
1. `regexes.ipynb`: Jupyter notebook illustrating various aspects of
   using regular expressions in string-related tasks. This conveys
   the flavor, rather than being a comprehensive introduction.

More information on regular expressions can be found in the Python
introduction slides.
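
The three uses named above (matching, extracting, substituting) can be sketched with the standard library's `re` module. This is a minimal illustration and not part of the committed files; the file name `exp_01.dat` and the patterns are assumptions chosen for the example.

```python
import re

file_name = 'exp_01.dat'  # hypothetical file name, for illustration only

# Matching: does the whole name look like <prefix>_<digits>.<extension>?
is_match = re.fullmatch(r'\w+_\d+\.\w+', file_name) is not None

# Extracting: capture the digits between the underscore and the dot.
match = re.search(r'_(\d+)\.', file_name)
number = match.group(1) if match else None

# Substituting: replace whatever extension the name has by .txt.
new_name = re.sub(r'\.\w+$', '.txt', file_name)

print(is_match, number, new_name)  # prints: True 01 exp_01.txt
```

Doing the same with `str` methods alone would take a combination of `split`, `isdigit`, and slicing, which is where the extra flexibility of regular expressions shows.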

source-code/regexes/regexes.ipynb

Lines changed: 247 additions & 0 deletions
@@ -0,0 +1,247 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Regular expressions in Python"
8+
]
9+
},
10+
{
11+
"cell_type": "markdown",
12+
"metadata": {},
13+
"source": [
"Regular expressions are very useful in many situations, and are not exclusive to Python. In fact, once you grasp the concepts, you'll find them indispensable and use them (or miss them) in many programming and data management tasks. This notebook intends to give you a flavor of the possibilities; it is not a comprehensive overview."
15+
]
16+
},
17+
{
18+
"cell_type": "markdown",
19+
"metadata": {},
20+
"source": [
21+
"In Python, regular expressions are implemented in the standard library's `re` module."
22+
]
23+
},
24+
{
25+
"cell_type": "code",
26+
"execution_count": null,
27+
"metadata": {
28+
"collapsed": true
29+
},
30+
"outputs": [],
31+
"source": [
32+
"import re"
33+
]
34+
},
35+
{
36+
"cell_type": "markdown",
37+
"metadata": {},
38+
"source": [
39+
"## Match making"
40+
]
41+
},
42+
{
43+
"cell_type": "markdown",
44+
"metadata": {},
45+
"source": [
46+
"One of the tasks regular expressions are useful for is verifying whether a (collection of) string(s) matches a certain pattern."
47+
]
48+
},
49+
{
50+
"cell_type": "markdown",
51+
"metadata": {},
52+
"source": [
53+
"Example: for a list of file names, select only the ones that start with `dev_`, and end with `.txt`."
54+
]
55+
},
56+
{
57+
"cell_type": "code",
58+
"execution_count": null,
59+
"metadata": {
60+
"collapsed": false
61+
},
62+
"outputs": [],
63+
"source": [
"file_list = ['dev_counter.txt', 'dev_reset.txt', 'shm_counter.txt', 'dev_start.txt']\n",
"for file_name in file_list:\n",
"    if re.match(r'dev_.*\\.txt', file_name):\n",
"        print(file_name)"
68+
]
69+
},
70+
{
71+
"cell_type": "markdown",
72+
"metadata": {},
73+
"source": [
"A somewhat more complex example: select file names whose base name ends in digits and whose extension is either `.txt` or `.dat`."
75+
]
76+
},
77+
{
78+
"cell_type": "code",
79+
"execution_count": null,
80+
"metadata": {
81+
"collapsed": false
82+
},
83+
"outputs": [],
84+
"source": [
"file_list = ['exp_01.txt', 'exp.txt', 'exp_02.dat', 'exp.dat', 'exp05.dat', 'exp_03.jpg']\n",
"for file_name in file_list:\n",
"    if re.search(r'\\d+\\.(?:txt|dat)', file_name):\n",
"        print(file_name)"
89+
]
90+
},
91+
{
92+
"cell_type": "markdown",
93+
"metadata": {},
94+
"source": [
95+
"Is this really correct? Let's try something nasty."
96+
]
97+
},
98+
{
99+
"cell_type": "code",
100+
"execution_count": null,
101+
"metadata": {
102+
"collapsed": false
103+
},
104+
"outputs": [],
105+
"source": [
"if re.search(r'\\d+\\.(?:txt|dat)', 'exp_09.data'):\n",
"    print('Oops!')\n",
"else:\n",
"    print(\"Yay!\")"
110+
]
111+
},
112+
{
113+
"cell_type": "markdown",
114+
"metadata": {},
115+
"source": [
"Let's require that the strings end with either `.txt` or `.dat`, using the `$` anchor."
117+
]
118+
},
119+
{
120+
"cell_type": "code",
121+
"execution_count": null,
122+
"metadata": {
123+
"collapsed": false
124+
},
125+
"outputs": [],
126+
"source": [
"if re.search(r'\\d+\\.(?:txt|dat)$', 'exp_09.data'):\n",
"    print('Oops!')\n",
"else:\n",
"    print(\"Yay!\")"
131+
]
132+
},
133+
{
134+
"cell_type": "markdown",
135+
"metadata": {},
136+
"source": [
137+
"## Extracting stuff"
138+
]
139+
},
140+
{
141+
"cell_type": "markdown",
142+
"metadata": {},
143+
"source": [
144+
"Regular expressions can also be used to capture parts of a string while matching."
145+
]
146+
},
147+
{
148+
"cell_type": "markdown",
149+
"metadata": {},
150+
"source": [
151+
"Suppose we are only interested in the numbers in file names like `exp_01.dat`."
152+
]
153+
},
154+
{
155+
"cell_type": "code",
156+
"execution_count": null,
157+
"metadata": {
158+
"collapsed": false
159+
},
160+
"outputs": [],
161+
"source": [
"file_list = ['exp_01.dat', 'meta.txt', 'exp_02.dat', 'exp_10.dat', 'exp_05.dat', 'exp_03.jpg']\n",
"for file_name in file_list:\n",
"    match = re.search(r'exp_(\\d+)\\.dat', file_name)\n",
"    if match:\n",
"        print(match.group(1))"
167+
]
168+
},
169+
{
170+
"cell_type": "markdown",
171+
"metadata": {},
172+
"source": [
"Note the difference between grouping parentheses such as `(?:txt|dat)` and capturing parentheses such as `(\\d+)`. Capturing parentheses also group, but grouping (non-capturing) parentheses don't capture."
174+
]
175+
},
176+
{
177+
"cell_type": "markdown",
178+
"metadata": {},
179+
"source": [
180+
"## Substitution"
181+
]
182+
},
183+
{
184+
"cell_type": "markdown",
185+
"metadata": {},
186+
"source": [
187+
"Regular expressions can also be used to substitute parts of strings that match a given pattern. For instance, replace all extensions in file names by `.txt`."
188+
]
189+
},
190+
{
191+
"cell_type": "code",
192+
"execution_count": null,
193+
"metadata": {
194+
"collapsed": false
195+
},
196+
"outputs": [],
197+
"source": [
"file_list = ['exp_01.dat', 'exp_03.txt', 'exp_02.dat', 'exp_10.text']\n",
"for file_name in file_list:\n",
"    new_file_name = re.sub(r'\\.\\w+$', '.txt', file_name)\n",
"    print('{old:15s} -> {new}'.format(old=file_name, new=new_file_name))"
202+
]
203+
},
204+
{
205+
"cell_type": "markdown",
206+
"metadata": {},
207+
"source": [
"The substitution can in fact include parts of the string captured by the regular expression. For instance, we can replace a file name such as `exp_03.txt` by `03_exp.txt`, and `dev_05.dat` by `05_dev.dat`."
209+
]
210+
},
211+
{
212+
"cell_type": "code",
213+
"execution_count": null,
214+
"metadata": {
215+
"collapsed": false
216+
},
217+
"outputs": [],
218+
"source": [
"file_list = ['exp_01.dat', 'dev_03.txt', 'exp_02.txt', 'exp_10.text']\n",
"for file_name in file_list:\n",
"    new_file_name = re.sub(r'(\\w+)_(\\d+)\\.', r'\\2_\\1.', file_name)\n",
"    print('{old:15s} -> {new}'.format(old=file_name, new=new_file_name))"
223+
]
224+
}
225+
],
226+
"metadata": {
227+
"kernelspec": {
228+
"display_name": "Python 3",
229+
"language": "python",
230+
"name": "python3"
231+
},
232+
"language_info": {
233+
"codemirror_mode": {
234+
"name": "ipython",
235+
"version": 3
236+
},
237+
"file_extension": ".py",
238+
"mimetype": "text/x-python",
239+
"name": "python",
240+
"nbconvert_exporter": "python",
241+
"pygments_lexer": "ipython3",
242+
"version": "3.5.1"
243+
}
244+
},
245+
"nbformat": 4,
246+
"nbformat_minor": 0
247+
}
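
The `exp_09.data` pitfall in the notebook comes down to where a pattern is anchored. As a hedged sketch that is not part of the committed notebook, the following compares `re.match`, `re.search`, and `re.fullmatch` on the notebook's pattern; the helper name `matching_modes` and the test strings other than `exp_09.data` are assumptions made for illustration.

```python
import re


def matching_modes(pattern, text):
    """Report which of the three matching functions succeed (hypothetical helper)."""
    return {
        'match': re.match(pattern, text) is not None,          # anchored at the start only
        'search': re.search(pattern, text) is not None,        # may match anywhere in the string
        'fullmatch': re.fullmatch(pattern, text) is not None,  # must cover the whole string
    }


pattern = r'\d+\.(?:txt|dat)'
for name in ['exp_09.dat', 'exp_09.data', '09.dat']:
    print(name, matching_modes(pattern, name))
```

For `exp_09.data` only `re.search` succeeds, since the pattern is found in the middle of the string; for `09.dat` all three succeed. Appending `$` to the pattern, as the notebook does, is one way to rule out trailing characters when using `re.search`.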
