
Commit 54cd679 ("Updated introduction")
1 parent 0912162

File tree: 4 files changed (+377 / -203 lines)


02-introduction.rst (112 additions & 29 deletions)
@@ -1,38 +1,46 @@

Introduction
===============================================================================

Numpy is all about vectorization. If you are familiar with Python, this is the
main difficulty you'll face because you'll need to change your way of thinking
and your new friends (among others) are named "vectors", "arrays", "views" or
"ufuncs".


Simple example
--------------

Let's take a very simple example: a random walk.

One possible object oriented approach would be to define a `RandomWalker` class
and to write a `walk` method that would return the current position after each
(random) step. It's nice, it's readable, but it is slow:

**Object oriented approach**

.. code:: python

   import random

   class RandomWalker:
       def __init__(self):
           self.position = 0

       def walk(self, n):
           self.position = 0
           for i in range(n):
               yield self.position
               self.position += 2*random.randint(0, 1) - 1

   walker = RandomWalker()
   walk = [position for position in walker.walk(1000)]

Benchmarking gives us:

.. code:: pycon

   >>> from tools import timeit
   >>> walker = RandomWalker()
   >>> timeit("[position for position in walker.walk(n=10000)]", globals())
   10 loops, best of 3: 15.7 msec per loop

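The ``tools.timeit`` helper used in these benchmarks ships with the book's companion repository. As a rough, hypothetical stand-in (my own sketch, not the book's actual helper), the standard ``timeit`` module can produce the same kind of report:

```python
import timeit

def report_best(stmt, globals_dict, repeat=3):
    # Hypothetical stand-in for the book's `tools.timeit` helper.
    # `autorange` picks a loop count so one trial lasts at least 0.2 s;
    # we then keep the best of `repeat` trials, like `python -m timeit`.
    timer = timeit.Timer(stmt, globals=globals_dict)
    loops, _ = timer.autorange()
    best = min(timer.repeat(repeat=repeat, number=loops)) / loops
    print(f"{loops} loops, best of {repeat}: {best * 1e6:.3g} usec per loop")
    return best

best = report_best("sum(range(100))", globals())
```

The exact numbers will of course differ from machine to machine; only the relative timings between versions matter.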
**Functional approach**
@@ -47,23 +55,98 @@ each random steps.

       position = 0
       walk = [position]
       for i in range(n):
           position += 2*random.randint(0, 1)-1
           walk.append(position)
       return walk

   walk = random_walk(1000)

This new method saves some CPU cycles, but not that many: the function is
pretty much the same as in the object-oriented approach, and the few cycles we
saved probably come from the inner Python object-oriented machinery.

.. code:: pycon

   >>> from tools import timeit
   >>> timeit("random_walk(n=10000)", globals())
   10 loops, best of 3: 15.6 msec per loop

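As a quick sanity check (my own toy run, with the full function restated so the snippet is standalone), the functional version returns the starting position plus one position per step, and consecutive positions always differ by exactly 1:

```python
import random

def random_walk(n=1000):
    # Functional version: no class, just accumulate positions in a list.
    position = 0
    walk = [position]
    for i in range(n):
        position += 2*random.randint(0, 1) - 1
        walk.append(position)
    return walk

walk = random_walk(10)
print(len(walk))  # → 11: the starting position plus one position per step
```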
**Vectorized approach**

But we can do better using the `itertools
<https://docs.python.org/3.6/library/itertools.html>`_ Python module, which
offers a set of functions creating iterators for efficient looping. If we
observe that a random walk is an accumulation of steps, we can rewrite the
function as:

.. code:: python

   def random_walk_faster(n=1000):
       from itertools import accumulate
       steps = random.sample([1, -1]*n, n)
       return list(accumulate(steps))

In fact, we've just *vectorized* our function. Instead of looping to pick
sequential steps and add them to the current position, we first generate all
the steps at once and use the `accumulate
<https://docs.python.org/3.6/library/itertools.html#itertools.accumulate>`_
function to compute all the positions. We get rid of the loop, and this makes
things faster:

.. code:: pycon

   >>> from tools import timeit
   >>> timeit("random_walk_faster(n=10000)", globals())
   10 loops, best of 3: 8.21 msec per loop

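To make the role of ``accumulate`` concrete, here is a tiny hand-checked example (the step values below are made up for illustration): each output element is the running sum of the steps so far, i.e. the walker's position.

```python
from itertools import accumulate

steps = [1, -1, -1, 1, 1]
positions = list(accumulate(steps))  # running sum of the steps
print(positions)  # → [1, 0, -1, 0, 1]
```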
103+
We gained 50% of computation-time compared to the previous version which is
104+
already good, but this new version makes numpy vectorization super simple, we
105+
just have to translate it into numpy methods.
62106

.. code:: python

   import numpy as np

   def random_walk_fastest(n=1000):
       steps = 2*np.random.randint(0, 2, size=n) - 1
       return np.cumsum(steps)

Not too difficult, but we gained a factor of 500x using numpy:

.. code:: pycon

   >>> from tools import timeit
   >>> timeit("random_walk_fastest(n=10000)", globals())
   1000 loops, best of 3: 14 usec per loop

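``np.cumsum`` is the numpy counterpart of ``itertools.accumulate``: on a handful of toy steps (my example), it returns the same running-sum positions, computed in C rather than in the Python interpreter.

```python
import numpy as np

steps = np.array([1, -1, -1, 1, 1])
positions = np.cumsum(steps)  # cumulative sum = successive positions
print(positions.tolist())  # → [1, 0, -1, 0, 1]
```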
This book is about vectorization, be it at the level of code or of problems.
We'll see that this difference is important before looking at custom
vectorization.

Readability vs speed
--------------------

Before heading to the next chapter, I would like to warn you about a potential
problem you may encounter once you've become familiar with numpy. It is a very
powerful library and you can make wonders with it, but, most of the time, this
comes at the price of readability. If you don't comment your code at the time
of writing, you'll be unable to guess what a function is doing after a few
weeks (or even days). For example, can you tell what the two functions below
are doing? You can probably tell for the first one, but unlikely for the
second (or you don't need to read this book).

.. code:: python

   def function_1(seq, sub):
       return [i for i in range(len(seq) - len(sub) + 1) if seq[i:i+len(sub)] == sub]

   def function_2(seq, sub):
       target = np.dot(sub, sub)
       candidates = np.where(np.correlate(seq, sub, mode='valid') == target)[0]
       check = candidates[:, np.newaxis] + np.arange(len(sub))
       mask = np.all((np.take(seq, check) == sub), axis=-1)
       return candidates[mask]

As you may have guessed, the second function is the
vectorized-optimized-faster-numpy version of the first function. It is 10
times faster than the pure Python version, but it is hardly readable.
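To check that the two functions really do the same thing, here is a small hand-checked run (the sequence is my own toy data; the functions are restated so the snippet is self-contained). Both return the starting indices at which ``sub`` occurs inside ``seq``:

```python
import numpy as np

def function_1(seq, sub):
    # Readable pure-Python version: test every possible starting index.
    return [i for i in range(len(seq) - len(sub) + 1) if seq[i:i+len(sub)] == sub]

def function_2(seq, sub):
    # Vectorized version: candidate positions via correlation, then verify.
    target = np.dot(sub, sub)
    candidates = np.where(np.correlate(seq, sub, mode='valid') == target)[0]
    check = candidates[:, np.newaxis] + np.arange(len(sub))
    mask = np.all((np.take(seq, check) == sub), axis=-1)
    return candidates[mask]

seq = [1, 0, 1, 1, 0, 1, 1]
sub = [1, 1]
print(function_1(seq, sub))           # → [2, 5]
print(function_2(seq, sub).tolist())  # → [2, 5]
```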

06-custom-vectorization.rst (6 additions & 1 deletion)

@@ -442,7 +442,12 @@ Sources

   .. Double precision array
   .. ----------------------
   .. https://www.thasler.com/blog/blog/glsl-part2-emu
   .. http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
   .. T. J. Dekker, A floating point technique for extending the available
   ..   precision. Numerische Mathematik, 18(3):224–242, 1971.
   .. Andrew Thall, Extended-Precision Floating-Point Numbers for GPU
   ..   Computation. SIGGRAPH, 2006. http://andrewthall.org/papers/df64_qf128.pdf

   .. Single vs Double precision
   .. ++++++++++++++++++++++++++
