Introduction
===============================================================================

Numpy is all about vectorization. If you are familiar with Python, this is the
main difficulty you'll face because you'll need to change your way of thinking
and your new friends (among others) are named "vectors", "arrays", "views" or
"ufuncs".

Simple example
--------------

Let's take a very simple example: random walk.

One possible object-oriented approach would be to define a `RandomWalker` class
and to write a walk method that would return the current position after each
(random) step. It's nice, it's readable, but it is slow:

**Object oriented approach**

.. code:: python

   import random

   class RandomWalker:
       def __init__(self):
           self.position = 0

       def walk(self, n):
           self.position = 0
           for i in range(n):
               yield self.position
               self.position += 2*random.randint(0, 1) - 1

   walker = RandomWalker()
   walk = [position for position in walker.walk(1000)]

Benchmarking gives us:

.. code:: pycon

   >>> from tools import timeit
   >>> walker = RandomWalker()
   >>> timeit("[position for position in walker.walk(n=10000)]", globals())
   10 loops, best of 3: 15.7 msec per loop
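
The `timeit` function used in these benchmarks comes from the book's own
`tools` module, which is not shown in this chapter. A minimal stand-in built on
the standard-library `timeit` module might look like this (a sketch, not the
book's actual implementation):

.. code:: python

   import timeit as std_timeit

   def timeit(stmt, globals=None):
       # Measure best-of-3 execution time for `stmt`, mimicking the
       # classic ``python -m timeit`` output shown in the book.
       timer = std_timeit.Timer(stmt, globals=globals)
       number, _ = timer.autorange()  # choose a sensible loop count
       best = min(timer.repeat(repeat=3, number=number)) / number
       if best < 1e-3:
           print("%d loops, best of 3: %.3g usec per loop" % (number, best*1e6))
       else:
           print("%d loops, best of 3: %.3g msec per loop" % (number, best*1e3))

Passing ``globals()`` lets the timed statement see the functions and objects
defined in the current session.

```python
import timeit as std_timeit

def timeit(stmt, globals=None):
    # Measure best-of-3 execution time for `stmt`, mimicking the
    # classic ``python -m timeit`` output shown in the book.
    timer = std_timeit.Timer(stmt, globals=globals)
    number, _ = timer.autorange()  # choose a sensible loop count
    best = min(timer.repeat(repeat=3, number=number)) / number
    if best < 1e-3:
        print("%d loops, best of 3: %.3g usec per loop" % (number, best*1e6))
    else:
        print("%d loops, best of 3: %.3g msec per loop" % (number, best*1e3))
```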

**Functional approach**

An alternative approach could be to get rid of the class and to write a simple
function that computes and returns the positions after each random step:

.. code:: python

   def random_walk(n=1000):
       position = 0
       walk = [position]
       for i in range(n):
           position += 2*random.randint(0, 1) - 1
           walk.append(position)
       return walk

   walk = random_walk(1000)

This new method saves some CPU cycles, but not that many, because this function
is pretty much the same as in the object-oriented approach and the few cycles
we saved probably come from the inner Python object-oriented machinery.

.. code:: pycon

   >>> from tools import timeit
   >>> timeit("random_walk(n=10000)", globals())
   10 loops, best of 3: 15.6 msec per loop

**Vectorized approach**

But we can do better using the `itertools
<https://docs.python.org/3.6/library/itertools.html>`_ Python module that
offers a set of functions creating iterators for efficient looping. If we
observe that a random walk is an accumulation of steps, we can rewrite the
function as:

.. code:: python

   def random_walk_faster(n=1000):
       from itertools import accumulate
       steps = random.sample([1, -1]*n, n)
       return list(accumulate(steps))
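
To see what `accumulate` does, here is a tiny, hand-checkable example:

.. code:: python

   from itertools import accumulate

   # Each output element is the running sum of the steps seen so far,
   # which is exactly the walker's position after each step.
   steps = [1, -1, 1, 1, -1]
   positions = list(accumulate(steps))
   print(positions)  # → [1, 0, 1, 2, 1]

```python
from itertools import accumulate

# Each output element is the running sum of the steps seen so far,
# which is exactly the walker's position after each step.
steps = [1, -1, 1, 1, -1]
positions = list(accumulate(steps))
print(positions)  # → [1, 0, 1, 2, 1]
```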

In fact, we've just *vectorized* our function. Instead of looping to pick
sequential steps and add them to the current position, we first generate all
the steps at once and use the `accumulate
<https://docs.python.org/3.6/library/itertools.html#itertools.accumulate>`_
function to compute all the positions. We get rid of the loop and this makes
things faster:

.. code:: pycon

   >>> from tools import timeit
   >>> timeit("random_walk_faster(n=10000)", globals())
   10 loops, best of 3: 8.21 msec per loop

We gained almost 50% of computation time compared to the previous version,
which is already good. Even better, this new version makes numpy vectorization
super simple: we just have to translate it into numpy methods.

.. code:: python

   import numpy as np

   def random_walk_fastest(n=1000):
       steps = 2*np.random.randint(0, 2, size=n) - 1
       return np.cumsum(steps)

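As a quick sanity check (a self-contained sketch that restates the function, so
any name here beyond `random_walk_fastest` itself is illustrative), we can
verify that the walk has the requested length and moves by exactly ±1 at each
step:

.. code:: python

   import numpy as np

   def random_walk_fastest(n=1000):
       # Draw n steps of +1/-1 at once, then accumulate them with cumsum
       steps = 2*np.random.randint(0, 2, size=n) - 1
       return np.cumsum(steps)

   walk = random_walk_fastest(10000)
   assert len(walk) == 10000
   # np.diff recovers the steps, which must all be +1 or -1
   assert set(np.unique(np.diff(walk)).tolist()) <= {-1, 1}

```python
import numpy as np

def random_walk_fastest(n=1000):
    # Draw n steps of +1/-1 at once, then accumulate them with cumsum
    steps = 2*np.random.randint(0, 2, size=n) - 1
    return np.cumsum(steps)

walk = random_walk_fastest(10000)
assert len(walk) == 10000
# np.diff recovers the steps, which must all be +1 or -1
assert set(np.unique(np.diff(walk)).tolist()) <= {-1, 1}
```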
Not too difficult, but we gained a factor of 500 using numpy:

.. code:: pycon

   >>> from tools import timeit
   >>> timeit("random_walk_fastest(n=10000)", globals())
   1000 loops, best of 3: 14 usec per loop


This book is about vectorization, be it at the level of code or problem. We'll
see that this difference is important before looking at custom vectorization.

125+
126+ Readability vs speed
127+ --------------------
128+
129+ Before heading to the next chapter, I would like to warn you about a potential
130+ problem you may encounter once you'll have become familiar with numpy. It is a
131+ very powerful library and you can make wonders with it but, most of the time,
132+ this comes at the price of readability. If you don't comment your code at the
133+ time of writing, you'll be unable to guess what a function is doing after a few
134+ weeks (or even days). For example, can you tell what the two functions below
135+ are doing? Probably you can tell for the first one, but unlikely for the second
136+ (or you don't need to read this book).

.. code:: python

   def function_1(seq, sub):
       return [i for i in range(len(seq) - len(sub) + 1)
               if seq[i:i+len(sub)] == sub]

   def function_2(seq, sub):
       target = np.dot(sub, sub)
       candidates = np.where(np.correlate(seq, sub, mode='valid') == target)[0]
       check = candidates[:, np.newaxis] + np.arange(len(sub))
       mask = np.all((np.take(seq, check) == sub), axis=-1)
       return candidates[mask]

As you may have guessed, the second function is the
vectorized-optimized-faster-numpy version of the first function. It is 10
times faster than the pure Python version, but it is hardly readable.
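
To make the reveal concrete, here is a small self-contained check (both
functions restated so it runs standalone): each returns the starting indices
where `sub` occurs inside `seq`.

.. code:: python

   import numpy as np

   def function_1(seq, sub):
       return [i for i in range(len(seq) - len(sub) + 1)
               if seq[i:i+len(sub)] == sub]

   def function_2(seq, sub):
       target = np.dot(sub, sub)
       candidates = np.where(np.correlate(seq, sub, mode='valid') == target)[0]
       check = candidates[:, np.newaxis] + np.arange(len(sub))
       mask = np.all((np.take(seq, check) == sub), axis=-1)
       return candidates[mask]

   seq, sub = [1, 0, 1, 1, 0, 1, 1], [1, 1]
   print(function_1(seq, sub))           # → [2, 5]
   print(function_2(seq, sub).tolist())  # → [2, 5]

```python
import numpy as np

def function_1(seq, sub):
    return [i for i in range(len(seq) - len(sub) + 1)
            if seq[i:i+len(sub)] == sub]

def function_2(seq, sub):
    target = np.dot(sub, sub)
    candidates = np.where(np.correlate(seq, sub, mode='valid') == target)[0]
    check = candidates[:, np.newaxis] + np.arange(len(sub))
    mask = np.all((np.take(seq, check) == sub), axis=-1)
    return candidates[mask]

seq, sub = [1, 0, 1, 1, 0, 1, 1], [1, 1]
print(function_1(seq, sub))           # → [2, 5]
print(function_2(seq, sub).tolist())  # → [2, 5]
```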