Commit 445091a

Reorganization of the Beyond Numpy chapter
1 parent 59b1deb commit 445091a

4 files changed (+321, −19 lines)

book/beyond-numpy.rst

Lines changed: 308 additions & 13 deletions
@@ -7,14 +7,309 @@ Beyond Numpy

Back to Python
--------------

You've almost reached the end of the book and, hopefully, you've learned that
Numpy is a very versatile and powerful library. In the meantime, however, you
have to remember that Python is also quite a powerful language. In fact, in a
few specific cases, it might be more powerful than Numpy. Let's consider, for
example, an interesting exercise that has been proposed by Tucker Balch in his
`Coursera's Computational Investing
<https://www.coursera.org/learn/computational-investing>`_ course. The
exercise can be written as:

   Write the most succinct code possible to compute all "legal" allocations to
   4 stocks such that:

   * The allocations are in 1.0 chunks, and the allocations sum to 10.0
   * Only "pure" NumPy is allowed (no external libraries)
   * Can you do it without a "for"?

`Yaser Martinez <http://yasermartinez.com/blog/index.html>`_ collected the
different answers from the community and the proposed solutions yield
surprising results. But let's start with the most obvious Python solution:

.. code:: python

   def solution_1():
       # Brute force:
       # 14641 (= 11*11*11*11) iterations & tests
       Z = []
       for i in range(11):
           for j in range(11):
               for k in range(11):
                   for l in range(11):
                       if i + j + k + l == 10:
                           Z.append((i, j, k, l))
       return Z

This solution is the slowest one because it requires 4 loops and, more
importantly, it tests all 14641 combinations of 4 integers between 0 and 10 in
order to retain only the combinations whose sum is 10. We can of course get
rid of the 4 explicit loops using itertools, but the code remains slow:

.. code:: python

   import itertools as it

   def solution_2():
       # Itertools
       # 14641 (= 11*11*11*11) iterations & tests
       return [(i, j, k, l)
               for i, j, k, l in it.product(range(11), repeat=4)
               if i + j + k + l == 10]

One of the best solutions, proposed by Nick Popplas, takes advantage of the
fact that we can nest the loops intelligently such that each tuple is built
directly, without any test, as shown below:

.. code:: python

   def solution_3():
       return [(a, b, c, (10 - a - b - c))
               for a in range(11)
               for b in range(11 - a)
               for c in range(11 - a - b)]

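Before timing anything, we can sanity-check that the loop-free construction enumerates exactly the same allocations as the exhaustive search (this verification is mine, not part of the original exercise):

```python
import itertools as it

def solution_2():
    # Exhaustive search over all 11**4 candidate tuples
    return [t for t in it.product(range(11), repeat=4) if sum(t) == 10]

def solution_3():
    # Loop-free construction: the fourth component is forced, no test needed
    return [(a, b, c, 10 - a - b - c)
            for a in range(11)
            for b in range(11 - a)
            for c in range(11 - a - b)]

assert sorted(solution_2()) == sorted(solution_3())
print(len(solution_3()))  # 286 legal allocations
```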
The best Numpy solution, by Yaser Martinez, uses a different strategy with a
restricted set of tests:

.. code:: python

   def solution_4():
       X123 = np.indices((11, 11, 11)).reshape(3, 11*11*11)
       X4 = 10 - X123.sum(axis=0)
       return np.vstack((X123, X4)).T[X4 > -1]

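As a quick sanity check (mine, not part of the original benchmark), we can verify that the Numpy construction also yields the 286 legal allocations:

```python
import numpy as np

def solution_4():
    # All (x1, x2, x3) triplets, one per column
    X123 = np.indices((11, 11, 11)).reshape(3, 11*11*11)
    # The fourth component is forced by the sum constraint
    X4 = 10 - X123.sum(axis=0)
    # Keep only rows where the forced component is non-negative
    return np.vstack((X123, X4)).T[X4 > -1]

Z = solution_4()
print(Z.shape)  # (286, 4)
assert (Z.sum(axis=1) == 10).all() and (Z >= 0).all()
```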
If we benchmark these methods, we get:

.. code:: pycon

   >>> timeit("solution_1()", globals())
   100 loops, best of 3: 1.9 msec per loop
   >>> timeit("solution_2()", globals())
   100 loops, best of 3: 1.67 msec per loop
   >>> timeit("solution_3()", globals())
   1000 loops, best of 3: 60.4 usec per loop
   >>> timeit("solution_4()", globals())
   1000 loops, best of 3: 54.4 usec per loop
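The `timeit("...", globals())` calls above use a small helper whose definition is not shown in this chapter; with only the standard library, an equivalent measurement could be sketched as follows (the reporting format is my assumption, and the numbers will of course vary with your machine):

```python
from timeit import timeit

def solution_3():
    return [(a, b, c, 10 - a - b - c)
            for a in range(11)
            for b in range(11 - a)
            for c in range(11 - a - b)]

# Run the statement 1000 times and report the mean time per call
t = timeit("solution_3()", globals=globals(), number=1000)
print("%.1f usec per loop" % (1e6 * t / 1000))
```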

The Numpy solution is the fastest, but the pure Python solution is comparable.
Now, let me introduce a small modification to the Python solution:

.. code:: python

   def solution_3_bis():
       return ((a, b, c, (10 - a - b - c))
               for a in range(11)
               for b in range(11 - a)
               for c in range(11 - a - b))

If we benchmark it, we get:

.. code:: pycon

   >>> timeit("solution_3_bis()", globals())
   10000 loops, best of 3: 0.643 usec per loop

You read that right: we gained a factor of almost 100 just by replacing the
square brackets with parentheses. How is that possible? The explanation can be
found by looking at the type of the returned object:

.. code:: pycon

   >>> print(type(solution_3()))
   <class 'list'>
   >>> print(type(solution_3_bis()))
   <class 'generator'>

`solution_3_bis()` returns a generator that can be used to generate the full
list or to iterate over all the different elements. In any case, the huge
speedup comes from not instantiating the full list, and it is thus important
to ask yourself whether you need an actual instance of your result or whether
a simple generator might do the job.
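A word of caution on this comparison: creating a generator does essentially no work up front, and the tuples are only produced when the generator is consumed, as the sketch below (mine, not from the chapter) illustrates:

```python
# A generator is created instantly; the actual work happens on consumption.
def solution_3_bis():
    return ((a, b, c, 10 - a - b - c)
            for a in range(11)
            for b in range(11 - a)
            for c in range(11 - a - b))

gen = solution_3_bis()       # nothing is computed yet
first = next(gen)            # tuples are produced one at a time, on demand
rest = list(gen)             # consuming the rest materializes the full list
print(first, len(rest) + 1)  # (0, 0, 0, 10) 286
```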
Friends of Numpy
----------------

Beyond Numpy, there are several other Python packages that are worth a look
because they address similar yet different classes of problems using different
technologies (compilation, virtual machines, just-in-time compilation, GPUs,
compression, etc.). Depending on your specific problem and your hardware, one
package may be better suited than another. Let's illustrate their usage with a
very simple example where we want to compute an expression based on two float
vectors:

.. code:: python

   import numpy as np

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = 2*a + 3*b
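Before going through the packages, it may help to have a way of checking that an alternative implementation agrees with the plain NumPy baseline. Here is a minimal sketch (the `check` helper and the fixed seed are my assumptions, not part of the chapter):

```python
import numpy as np

def check(evaluate, n=1000):
    # Compare an alternative implementation against the NumPy baseline
    rng = np.random.default_rng(0)  # fixed seed, for reproducibility
    a = rng.uniform(0, 1, n).astype(np.float32)
    b = rng.uniform(0, 1, n).astype(np.float32)
    return np.allclose(evaluate(a, b), 2*a + 3*b)

print(check(lambda a, b: 2*a + 3*b))  # True
```

Each of the implementations below (NumExpr, Cython, Numba, …) could be passed to `check` in place of the lambda, provided the corresponding package is installed.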


NumExpr
+++++++

The `numexpr <https://github.com/pydata/numexpr/wiki/Numexpr-Users-Guide>`_
package supplies routines for the fast evaluation of array expressions,
elementwise, by using a vector-based virtual machine. It is comparable to
SciPy's weave package, but does not require a separate compile step for C or
C++ code.

.. code:: python

   import numpy as np
   import numexpr as ne

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = ne.evaluate("2*a + 3*b")


Cython
++++++

`Cython <http://cython.org>`_ is an optimising static compiler for both the
Python programming language and the extended Cython programming language
(based on Pyrex). It makes writing C extensions for Python as easy as Python
itself. Note that the code below uses Cython syntax and must therefore be
compiled as a Cython extension (typically from a `.pyx` file):

.. code:: python

   import numpy as np
   cimport numpy as np

   def evaluate(np.ndarray a, np.ndarray b):
       cdef int i
       cdef np.ndarray c = np.zeros_like(a)
       for i in range(a.size):
           c[i] = 2*a[i] + 3*b[i]
       return c

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = evaluate(a, b)


Numba
+++++

`Numba <http://numba.pydata.org>`_ gives you the power to speed up your
applications with high performance functions written directly in Python. With
a few annotations, array-oriented and math-heavy Python code can be
just-in-time compiled to native machine instructions, similar in performance
to C, C++ and Fortran, without having to switch languages or Python
interpreters.

.. code:: python

   from numba import jit
   import numpy as np

   @jit
   def evaluate(a, b):
       c = np.zeros_like(a)
       for i in range(a.size):
           c[i] = 2*a[i] + 3*b[i]
       return c

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = evaluate(a, b)


Theano
++++++

`Theano <http://www.deeplearning.net/software/theano/>`_ is a Python library
that allows you to define, optimize, and evaluate mathematical expressions
involving multi-dimensional arrays efficiently. Theano features tight
integration with NumPy, transparent use of a GPU, efficient symbolic
differentiation, speed and stability optimizations, dynamic C code generation
and extensive unit-testing and self-verification.

.. code:: python

   import numpy as np
   import theano.tensor as T
   from theano import function

   x = T.fvector('x')
   y = T.fvector('y')
   z = 2*x + 3*y
   f = function([x, y], z)

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = f(a, b)


PyCUDA
++++++

`PyCUDA <http://mathema.tician.de/software/pycuda>`_ lets you access Nvidia's
CUDA parallel computation API from Python.

.. code:: python

   import numpy as np
   import pycuda.autoinit
   import pycuda.driver as drv
   from pycuda.compiler import SourceModule

   mod = SourceModule("""
       __global__ void evaluate(float *c, float *a, float *b)
       {
           const int i = threadIdx.x;
           c[i] = 2*a[i] + 3*b[i];
       }
   """)

   evaluate = mod.get_function("evaluate")

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = np.zeros_like(a)

   # One thread per element (the array size must not exceed the device's
   # maximum number of threads per block)
   evaluate(drv.Out(c), drv.In(a), drv.In(b),
            block=(1000, 1, 1), grid=(1, 1))


PyOpenCL
++++++++

`PyOpenCL <http://mathema.tician.de/software/pyopencl>`_ lets you access GPUs
and other massively parallel compute devices from Python.

.. code:: python

   import numpy as np
   import pyopencl as cl

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = np.empty_like(a)

   ctx = cl.create_some_context()
   queue = cl.CommandQueue(ctx)

   mf = cl.mem_flags
   gpu_a = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
   gpu_b = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)

   evaluate = cl.Program(ctx, """
       __kernel void evaluate(__global const float *gpu_a,
                              __global const float *gpu_b,
                              __global       float *gpu_c)
       {
           int gid = get_global_id(0);
           gpu_c[gid] = 2*gpu_a[gid] + 3*gpu_b[gid];
       }
   """).build()

   gpu_c = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)
   evaluate.evaluate(queue, a.shape, None, gpu_a, gpu_b, gpu_c)
   cl.enqueue_copy(queue, c, gpu_c)


Scipy & friends
---------------

Here is a very short list of packages that are well-maintained, well tested
and may simplify your scientific life (depending on your domain). There are of
@@ -57,13 +352,13 @@ some spare time. For an extensive list, have a look at the `Awesome python list

Conclusion
----------

Numpy is a very versatile library, but that does not mean you have to use it
in every situation. In this chapter, we've seen some alternatives (including
Python itself) that are worth a look. As always, the choice belongs to you.
You have to consider what is the best solution for you in terms of development
time, computation time and maintenance effort. On the one hand, if you design
your own solution, you'll have to test it and to maintain it, but in exchange,
you'll be free to design it the way you want. On the other hand, if you decide
to rely on a third-party package, you'll save time in development and benefit
from community support, even though you might have to adapt the package to
your specific needs. The choice is up to you.

book/preface.rst

Lines changed: 3 additions & 3 deletions

@@ -50,15 +50,15 @@ The sources are available from https://github.com/rougier/numpy-book.


Pre-requisites
++++++++++++++

This is not a Python beginner guide and you should have an intermediate level
in Python and ideally a beginner level in Numpy. If this is not the case, have
a look at the bibliography_ for a curated list of resources.


Conventions
+++++++++++

We will use usual naming conventions. If not stated explicitly, each script
should import numpy, scipy and matplotlib as:

@@ -86,7 +86,7 @@ Matplotlib 1.5.3
=========== =========

How to contribute
+++++++++++++++++

If you want to contribute to this book, you can do it by:

book/problem-vectorization.rst

Lines changed: 4 additions & 2 deletions

@@ -173,7 +173,6 @@ Path finding

   propagated values from the end-point of the maze (bottom-right). Path is
   found by ascending gradient from the goal.

.. image:: ../data/maze.png
   :width: 100%

@@ -187,10 +186,13 @@ Bellman-Ford method

Sources
+++++++

* `maze-build.py <../code/maze-build.py>`_
* `maze-python.py <../code/maze-numpy.py>`_
* `maze-numpy.py <../code/maze-numpy.py>`_

References
++++++++++

Smoke simulation
----------------
