11.. _cython :
22
3- Cython
4- ======
3+ Extending Python with Cython
4+ ============================
55
66.. questions ::
77
8- - Q1
9- - Q2
8+ - How does runtime performance of Python compare to languages like C, C++
9+ or Fortran?
10+ - How do we use code written in other languages from within Python? In what
11+ situations is this useful?
12+
13+
1014.. objectives ::
1115
12- - O1
13- - O2
16+ - Understand how compiled extension modules can speed up code execution.
17+ - Understand the basics of Cython.
1418
1519
1620.. callout ::
@@ -23,10 +27,13 @@ Cython
2327 with `conda install cython `.
2428
2529
30+ Python and performance
31+ ----------------------
32+
2633Interpreted languages like Python are rather slow to execute compared to
27- languages like C or Fortran that are compiled to machine code before execution.
28- Python in particular is both strongly typed and dynamically typed: this means
29- that all variables have a type that matters for operations that
34+ languages like C or Fortran that are compiled to machine code ahead of
35+ execution. Python in particular is both strongly typed and dynamically typed:
36+ this means that all variables have a type that matters for operations that
3037can be performed on the variable, and that the type is determined only during
3138runtime by the Python interpreter. The interpreter does a lot of
3239"unboxing" of variable types when performing operations, and this comes with
@@ -60,8 +67,20 @@ Scientific programs often include computationally expensive sections (e.g.
6067simulations of any kind). So how do we make Python execute our code faster in
6168these situations? Well that's the neat part: we don't! Instead, we write the
6269performance critical parts in a faster language and make them usable from
63- Python. This is called extending Python, and usually involves writing C-code
64- with Python-specific boilerplate and compiling this as a shared library.
70+ Python.
71+
72+ This is called extending Python, and usually involves writing C-code
73+ with Python-specific boilerplate and compiling this as a shared library, which
74+ in this context is called a **Python extension module **.
75+ Most scientific Python libraries (Numpy, Scipy etc) do exactly this: their
76+ computationally intensive parts are either written in a compiled language,
77+ or they call an external library written in such a language.
78+
79+ When working on your own Python project, you may find that there is a C
80+ library that does exactly what you need, but it doesn't provide a Python
81+ interface. Or you may have computationally intensive code that doesn't
82+ vectorize nicely for Numpy. In cases like these it can be useful to write
83+ your own extension modules that you then import into your Python code.
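As an illustration of code that does not vectorize nicely, consider a recurrence in which every step depends on the result of the previous one. The sketch below is a hypothetical example: the function name ``logistic_map`` and its parameters are assumptions, not part of the course material. Loops of this kind are typical candidates for moving into an extension module.

```python
def logistic_map(x0, r, n_steps):
    """Iterate x -> r * x * (1 - x) for n_steps and return the final value.

    Hypothetical example: each iteration needs the previous value of x,
    so the loop cannot be replaced by a single whole-array operation.
    """
    x = x0
    for _ in range(n_steps):
        x = r * x * (1.0 - x)
    return x


if __name__ == "__main__":
    # x = 0.5 is a fixed point of the map when r = 2
    print(logistic_map(0.5, 2.0, 10))
```

In pure Python every iteration goes through the interpreter, which is the overhead an extension module is meant to remove.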
6584
6685Here we discuss one popular approach for extending Python with compiled code:
6786using a tool called Cython.
7392that can be processed with the Cython compiler to produce optimized code.
7493Cython is designed to provide C-like performance for code that is mostly
7594written in Python by adding only a few C-like declarations to existing
76- Python code. As such, Cython provides the best of the both worlds:
95+ Python code. As such, Cython aims to provide the best of both worlds:
7796the good programmer productivity of Python together with the high performance
7897of C. Cython also makes it easy to interact with external C/C++ code.
7998
@@ -147,15 +166,14 @@ Cythonized before use.
147166 instead use an established build tool like **setuptools ** to handle the
148167 Cythonization during the project's build phase. More info is available on
149168 the `Cython documentation <https://cython.readthedocs.io/en/latest/src/userguide/source_files_and_compilation.html#compilation >`__.
150- See also the course page on packaging. (TODO: link.)
169+ See also the :doc:`course page on packaging <packaging>`.
151170
152171
153172Using Cython with Jupyter
154173-------------------------
155174
156- Jupyter has an `extension <https://cython.readthedocs.io/en/latest/src/quickstart/build.html#using-the-jupyter-notebook> `
157- for supporting Cython compilation directly inside notebooks, assuming your
158- environment has Cython installed.
175+ Jupyter supports Cython compilation directly inside notebooks via `an extension <https://cython.readthedocs.io/en/latest/src/quickstart/build.html#using-the-jupyter-notebook >`__,
176+ assuming your environment has Cython installed.
159177
160178We first load the Cython extension, e.g. in the very first cell: ::
161179
@@ -214,11 +232,11 @@ Cythonize as before:
214232 Import this into Python and confirm that it works as expected with integers.
215233However, if passing floating-point numbers the function is forced to interpret
216234the inputs as integers before performing the addition. For example,
217- **add(1.2 , 2.7) ** would return 3. This happens because there is an automatic
218- conversion from the input Python objects (floating point numbers) to the
219- declared C-types when calling the Cythonized function from Python.
220- Similarly the returned C variable is converted to a corresponding Python
221- object.
235+ **add(1.4 , 2.7) ** would return 3. This happens because there is an automatic
236+ conversion from the input Python objects to the
237+ declared C-types, in this case integers, when calling the Cythonized function
238+ from Python. Similarly the returned C variable is converted to a corresponding
239+ Python object.
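The truncation described above can be mimicked in pure Python. This is only a sketch: ``add_as_c_ints`` is a hypothetical stand-in that uses ``int()`` to imitate the coercion of the arguments to C integers. It does not call any Cython code, and the exact conversion behaviour may depend on the Cython version in use.

```python
def add_as_c_ints(x, y):
    """Pure-Python imitation of adding two arguments declared as C ints."""
    # int() discards the fractional part (truncation toward zero),
    # standing in for the conversion to the declared C type.
    xi = int(x)
    yi = int(y)
    # The integer result is handed back as an ordinary Python int.
    return xi + yi


if __name__ == "__main__":
    print(add_as_c_ints(1, 2))      # integers pass through unchanged
    print(add_as_c_ints(1.4, 2.7))  # 1 + 2: the fractional parts are lost
```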
222240
223241To make the function work with floats we'd declare the types to be
224242either **float** (32-bit) or **double** (64-bit) instead of **int**.
@@ -247,11 +265,11 @@ Using Numpy arrays with Cython
247265
248266Cython has built-in support for Numpy arrays.
249267
250- As discussed in the Numpy lectures (TODO: LINK) , Numpy arrays provide great performance
251- for vectorized operations. In contrast, thing like ** for **-loops over Numpy
252- arrays should be avoided because of interpreting overhead inherent to Python
253- **for **-loops. There is also overhead from accessing individual elements of
254- Numpy arrays.
268+ As discussed in the :doc:`Numpy lectures <numpy-advanced>`, Numpy arrays provide
269+ great performance for vectorized operations. In contrast, things like
270+ **for**-loops over Numpy arrays should be avoided because of the interpreter
271+ overhead inherent to Python **for**-loops. There is also overhead from
272+ accessing individual elements of Numpy arrays.
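To make the contrast concrete, here is a minimal sketch that fills a 2D array in two ways: with explicit Python loops and with a vectorized expression. The function names and the fill rule ``i + j`` are assumptions chosen for illustration, not taken from the lesson code.

```python
import numpy as np


def fill_with_loops(n):
    """Fill an n-by-n array element by element (slow in pure Python)."""
    data = np.empty((n, n), dtype=np.int32)
    for i in range(n):
        for j in range(n):
            # Each assignment goes through the interpreter and
            # through Numpy's element-access machinery.
            data[i, j] = i + j
    return data


def fill_vectorized(n):
    """Build the same array with one broadcasted addition."""
    idx = np.arange(n, dtype=np.int32)
    # A column vector plus a row vector broadcasts to an (n, n) array.
    return idx[:, None] + idx[None, :]


if __name__ == "__main__":
    print(np.array_equal(fill_with_loops(4), fill_vectorized(4)))
```

Both functions produce the same array. The loop version pays the per-iteration and per-element overhead mentioned above, while the vectorized version does the work inside Numpy's compiled code.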
255273
256274With Cython we can bypass both restrictions and write efficient loops over
257275Numpy arrays. Consider e.g. a double loop that sets values of a 2D array:
@@ -280,7 +298,7 @@ nice, but we are still bottlenecked by array lookups and assignments, i.e. the
280298We can get a huge speedup by adding a static type declaration for the Numpy
281299array, and for the other variables too while we are at it. To do this we must
282300import compile-time information about the Numpy module using the
283- Cython-specific ` cimport ` keyword, then use Cython's Numpy interface to
301+ Cython-specific **cimport** keyword, then use Cython's Numpy interface to
284302declare the array's datatype and dimensions:
285303
286304.. code :: python
@@ -291,7 +309,7 @@ declare the array's datatype and dimensions:
291309 def fast_looper (int N ):
292310 """ """
293311
294- # Static declaration: 2D array of integers
312+ # Type declaration: 2D array of 32-bit integers
295313 cdef cnp.ndarray[cnp.int32_t, ndim=2 ] data
296314 data = np.empty((N, N), dtype = np.int32)
297315
@@ -313,7 +331,7 @@ pure Python implementation!
313331 in Python distributions. This usually works out of the box for Jupyter
314332 notebooks. However, if using the command line `cythonize ` tool you may need
315333 to manually set include paths so that the C compiler knows where to find the
316- headers. Refer to `the docs <https://cython.readthedocs.io/en/latest/src/userguide/numpy_tutorial.html#compilation>__ `
334+ headers. Refer to `the docs <https://cython.readthedocs.io/en/latest/src/userguide/numpy_tutorial.html#compilation >`__
317335 for more details.
318336
319337.. callout ::
@@ -378,8 +396,7 @@ Further reading
378396---------------
379397
380398- Newer usage of Numpy arrays (memory views) https://cython.readthedocs.io/en/latest/src/userguide/numpy_tutorial.html#numpy-tutorial
381- - cpdef keyword for functions
382-
399+ - TODO
383400
384401
385402Summary