Beyond Numpy
============

Back to Python
--------------

You've almost reached the end of the book and, hopefully, you've learned that
Numpy is a very versatile and powerful library. In the meantime, however, you
have to remember that Python is also quite a powerful tool. In fact, in a few
specific cases, it might be even more powerful than Numpy. Let's consider, for
example, an interesting exercise that has been proposed by Tucker Balch in his
`Coursera's Computational Investing
<https://www.coursera.org/learn/computational-investing>`_ course. The exercise
can be written as:

Write the most succinct code possible to compute all "legal" allocations to 4
stocks such that:

* The allocations are in 1.0 chunks, and the allocations sum to 10.0
* Only "pure" NumPy is allowed (no external libraries)
* Can you do it without a "for"?

`Yaser Martinez <http://yasermartinez.com/blog/index.html>`_ collected the
different answers from the community and the proposed solutions yield
surprising results. But let's start with the most obvious Python solution:

.. code:: python

   def solution_1():
       # Brute force
       # 14641 (=11*11*11*11) iterations & tests
       Z = []
       for i in range(11):
           for j in range(11):
               for k in range(11):
                   for l in range(11):
                       if i + j + k + l == 10:
                           Z.append((i, j, k, l))
       return Z

This is the slowest solution because it requires four loops and, more
importantly, it tests all 14641 (=11*11*11*11) combinations of 4 integers
between 0 and 10 in order to retain only the combinations whose sum is 10. We
can of course get rid of the four loops using itertools, but the code remains
slow:

.. code:: python

   import itertools as it

   def solution_2():
       # Itertools
       # 14641 (=11*11*11*11) iterations & tests
       return [(i, j, k, l)
               for i, j, k, l in it.product(range(11), repeat=4)
               if i + j + k + l == 10]
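
A different way to avoid the test entirely is the classic "stars and bars"
argument: an allocation of 10 unit chunks to 4 stocks corresponds to a choice
of 3 separator positions among 13 slots. This is a sketch under that
combinatorial assumption, not one of the community answers collected for the
exercise:

.. code:: python

   import itertools as it

   def solution_2_bis():
       # Stars and bars: 10 unit chunks ("stars") plus 3 separators
       # ("bars") occupy 13 slots; each choice of 3 bar positions
       # b1 < b2 < b3 encodes one allocation, so no test is needed.
       return [(b1, b2 - b1 - 1, b3 - b2 - 1, 12 - b3)
               for b1, b2, b3 in it.combinations(range(13), 3)]

Since every choice of separators is legal, this enumerates exactly the valid
allocations and nothing more.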

One of the best solutions, proposed by Nick Popplas, takes advantage of the
fact that we can write intelligent nested loops that directly build each
tuple, without any test, as shown below:

.. code:: python

   def solution_3():
       return [(a, b, c, (10 - a - b - c))
               for a in range(11)
               for b in range(11 - a)
               for c in range(11 - a - b)]

The best Numpy solution, by Yaser Martinez, uses a different strategy with a
restricted set of tests:

.. code:: python

   import numpy as np

   def solution_4():
       X123 = np.indices((11, 11, 11)).reshape(3, 11*11*11)
       X4 = 10 - X123.sum(axis=0)
       return np.vstack((X123, X4)).T[X4 > -1]

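Before benchmarking, a quick sanity check (not part of the original exercise)
is to verify that the different strategies enumerate exactly the same
allocations. Restating two of them for self-containment:

.. code:: python

   import itertools as it
   import numpy as np

   def solution_2():
       # Filtering approach
       return [(i, j, k, l)
               for i, j, k, l in it.product(range(11), repeat=4)
               if i + j + k + l == 10]

   def solution_4():
       # Vectorized approach
       X123 = np.indices((11, 11, 11)).reshape(3, 11*11*11)
       X4 = 10 - X123.sum(axis=0)
       return np.vstack((X123, X4)).T[X4 > -1]

   # Both must produce the same set of allocations
   assert set(solution_2()) == set(map(tuple, solution_4()))
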
If we benchmark these methods, we get:

.. code:: pycon

   >>> timeit("solution_1()", globals())
   100 loops, best of 3: 1.9 msec per loop
   >>> timeit("solution_2()", globals())
   100 loops, best of 3: 1.67 msec per loop
   >>> timeit("solution_3()", globals())
   1000 loops, best of 3: 60.4 usec per loop
   >>> timeit("solution_4()", globals())
   1000 loops, best of 3: 54.4 usec per loop

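Note that the `timeit` used in these sessions is not the standard library's
`timeit.timeit` (whose signature takes a `globals` keyword argument); it is a
small convenience wrapper. A minimal sketch of such a wrapper, with the name
and output format chosen to match the transcripts above, could be:

.. code:: python

   from timeit import Timer

   def timeit(stmt, globals=None):
       # Pick a loop count so that one measurement is long enough
       # to be meaningful, then report the best of three runs.
       timer = Timer(stmt, globals=globals)
       number, _ = timer.autorange()
       best = min(timer.repeat(repeat=3, number=number)) / number
       # Choose a human-friendly time unit
       for unit, scale in [("sec", 1), ("msec", 1e-3),
                           ("usec", 1e-6), ("nsec", 1e-9)]:
           if best >= scale:
               break
       print("%d loops, best of 3: %.3g %s per loop"
             % (number, best / scale, unit))

`Timer.autorange` requires Python 3.6 or later; on earlier versions a manual
calibration loop would be needed instead.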
The Numpy solution is the fastest, but the pure Python solution is comparable.
Now, let me introduce a small modification to the Python solution:

.. code:: python

   def solution_3_bis():
       return ((a, b, c, (10 - a - b - c))
               for a in range(11)
               for b in range(11 - a)
               for c in range(11 - a - b))

If we benchmark it, we get:

.. code:: pycon

   >>> timeit("solution_3_bis()", globals())
   10000 loops, best of 3: 0.643 usec per loop

You read that right: we have gained a factor of 100 just by replacing square
brackets with parentheses. How is that possible? The explanation can be found
by looking at the type of the returned object:

.. code:: pycon

   >>> print(type(solution_3()))
   <class 'list'>
   >>> print(type(solution_3_bis()))
   <class 'generator'>

`solution_3_bis()` returns a generator that can be used to generate the full
list or to iterate over all the different elements. In any case, the huge
speedup comes from not instantiating the full list, and it is thus important
to ask yourself whether you need an actual instance of your result or whether
a simple generator might do the job.
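
To make the distinction concrete, here is a short illustration (using the same
comprehension) of how a generator defers the work until elements are actually
requested:

.. code:: python

   def solution_3_bis():
       return ((a, b, c, (10 - a - b - c))
               for a in range(11)
               for b in range(11 - a)
               for c in range(11 - a - b))

   gen = solution_3_bis()   # instantaneous: nothing is computed yet
   first = next(gen)        # elements are produced one at a time...
   rest = list(gen)         # ...or materialized all at once if needed
   print(first, len(rest))  # (0, 0, 0, 10) 285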


Friends of Numpy
----------------

Beyond Numpy, there are several other Python packages that are worth a look
because they address similar yet different classes of problems using different
technologies (compilation, virtual machines, just-in-time compilation, GPUs,
compression, etc.). Depending on your specific problem and your hardware, one
package may be better suited than another. Let's illustrate their usage using
a very simple example where we want to compute an expression based on two
float vectors:

.. code:: python

   import numpy as np

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = 2*a + 3*b
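
Part of what the packages below optimize away is that, conceptually, each
operation in this one-liner produces an intermediate array: `2*a` and `3*b`
are each computed into a temporary before the final sum (recent Numpy versions
can sometimes elide such temporaries, but the general point stands). A small
illustration of that decomposition, and of how an in-place operation can save
one temporary:

.. code:: python

   import numpy as np

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)

   # What "2*a + 3*b" does behind the scenes:
   t1 = 2 * a       # first temporary array
   t2 = 3 * b       # second temporary array
   c = t1 + t2      # final result

   # In-place addition avoids allocating a third array:
   d = 2 * a
   d += 3 * b
   assert np.allclose(c, d)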


NumExpr
+++++++

The `numexpr <https://github.com/pydata/numexpr/wiki/Numexpr-Users-Guide>`_
package supplies routines for the fast evaluation of array expressions
elementwise, using a vector-based virtual machine. It is comparable to SciPy's
weave package, but does not require a separate compile step for C or C++ code.

.. code:: python

   import numpy as np
   import numexpr as ne

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = ne.evaluate("2*a + 3*b")


Cython
++++++

`Cython <http://cython.org>`_ is an optimising static compiler for both the
Python programming language and the extended Cython programming language
(based on Pyrex). It makes writing C extensions for Python as easy as Python
itself.

.. code:: python

   # Cython code, to be compiled as a .pyx extension module
   import numpy as np
   cimport numpy as np

   def evaluate(np.ndarray a, np.ndarray b):
       cdef int i
       cdef np.ndarray c = np.zeros_like(a)
       for i in range(a.size):
           c[i] = 2*a[i] + 3*b[i]
       return c

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = evaluate(a, b)


Numba
+++++

`Numba <http://numba.pydata.org>`_ gives you the power to speed up your
applications with high performance functions written directly in Python. With
a few annotations, array-oriented and math-heavy Python code can be
just-in-time compiled to native machine instructions, similar in performance
to C, C++ and Fortran, without having to switch languages or Python
interpreters.

.. code:: python

   from numba import jit
   import numpy as np

   @jit
   def evaluate(a, b):
       c = np.zeros_like(a)
       for i in range(a.size):
           c[i] = 2*a[i] + 3*b[i]
       return c

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = evaluate(a, b)


Theano
++++++

`Theano <http://www.deeplearning.net/software/theano/>`_ is a Python library
that allows you to define, optimize, and evaluate mathematical expressions
involving multi-dimensional arrays efficiently. Theano features tight
integration with NumPy, transparent use of a GPU, efficient symbolic
differentiation, speed and stability optimizations, dynamic C code generation
and extensive unit-testing and self-verification.

.. code:: python

   import numpy as np
   import theano.tensor as T
   from theano import function

   x = T.fvector('x')
   y = T.fvector('y')
   z = 2*x + 3*y
   f = function([x, y], z)

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = f(a, b)


PyCUDA
++++++

`PyCUDA <http://mathema.tician.de/software/pycuda>`_ lets you access Nvidia's
CUDA parallel computation API from Python.

.. code:: python

   import numpy as np
   import pycuda.autoinit
   import pycuda.driver as drv
   from pycuda.compiler import SourceModule

   mod = SourceModule("""
       __global__ void evaluate(float *c, float *a, float *b)
       {
           const int i = threadIdx.x;
           c[i] = 2*a[i] + 3*b[i];
       }
   """)

   evaluate = mod.get_function("evaluate")

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = np.zeros_like(a)

   # One thread per element: the block size must cover the whole vector
   evaluate(drv.Out(c), drv.In(a), drv.In(b),
            block=(1000, 1, 1), grid=(1, 1))


PyOpenCL
++++++++

`PyOpenCL <http://mathema.tician.de/software/pyopencl>`_ lets you access GPUs
and other massively parallel compute devices from Python.

.. code:: python

   import numpy as np
   import pyopencl as cl

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = np.empty_like(a)

   ctx = cl.create_some_context()
   queue = cl.CommandQueue(ctx)

   mf = cl.mem_flags
   gpu_a = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
   gpu_b = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)

   evaluate = cl.Program(ctx, """
       __kernel void evaluate(__global const float *gpu_a,
                              __global const float *gpu_b,
                              __global       float *gpu_c)
       {
           int gid = get_global_id(0);
           gpu_c[gid] = 2*gpu_a[gid] + 3*gpu_b[gid];
       }
   """).build()

   gpu_c = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)
   evaluate.evaluate(queue, a.shape, None, gpu_a, gpu_b, gpu_c)
   cl.enqueue_copy(queue, c, gpu_c)


Scipy & friends
---------------

Here is a very short list of packages that are well-maintained, well tested and
may simplify your scientific life (depending on your domain). There are of
some spare time. For an extensive list, have a look at the `Awesome python list
Conclusion
----------

Numpy is a very versatile library, but that does not mean you have to use it
in every situation. In this chapter, we've seen some alternatives (including
Python itself) that are worth a look. As always, the choice belongs to you.
You have to consider what is the best solution for you in terms of development
time, computation time and maintenance effort. On the one hand, if you design
your own solution, you'll have to test it and maintain it but, in exchange,
you'll be free to design it the way you want. On the other hand, if you decide
to rely on a third-party package, you'll save time in development and benefit
from community support, even though you might have to adapt the package to
your specific needs. The choice is up to you.