Commit e561111

More about Python extensions in general + fix links
1 parent 91a2f85 commit e561111

1 file changed: +48 -31 lines changed

content/cython.rst

Lines changed: 48 additions & 31 deletions
@@ -1,16 +1,20 @@
 .. _cython:

-Cython
-======
+Extending Python with Cython
+============================

 .. questions::

-   - Q1
-   - Q2
+   - How does runtime performance of Python compare to languages like C, C++
+     or Fortran?
+   - How do we use code written in other languages from within Python? In what
+     situations is this useful?
+
+
 .. objectives::

-   - O1
-   - O2
+   - Understand how compiled extension modules can speed up code execution.
+   - Understand the basics of Cython.


 .. callout::
@@ -23,10 +27,13 @@ Cython
 with `conda install cython`.


+Python and performance
+----------------------
+
 Interpreted languages like Python are rather slow to execute compared to
-languages like C or Fortran that are compiled to machine code before execution.
-Python in particular is both strongly typed and dynamically typed: this means
-that all variables have a type that matters for operations that
+languages like C or Fortran that are compiled to machine code ahead of
+execution. Python in particular is both strongly typed and dynamically typed:
+this means that all variables have a type that matters for operations that
 can be performed on the variable, and that the type is determined only during
 runtime by the Python interpreter. The interpreter does a lot of
 "unboxing" of variable types when performing operations, and this comes with
@@ -60,8 +67,20 @@ Scientific programs often include computationally expensive sections (e.g.
 simulations of any kind). So how do we make Python execute our code faster in
 these situations? Well that's the neat part: we don't! Instead, we write the
 performance critical parts in a faster language and make them usable from
-Python. This is called extending Python, and usually involves writing C-code
-with Python-specific boilerplate and compiling this as a shared library.
+Python.
+
+This is called extending Python, and usually involves writing C-code
+with Python-specific boilerplate and compiling this as a shared library, which
+in this context is called a **Python extension module**.
+Most scientific Python libraries (Numpy, Scipy etc.) do exactly this: their
+computationally intensive parts are either written in a compiled language,
+or they call an external library written in such a language.
+
+When working on your own Python project, you may find that there is a C
+library that does exactly what you need, but it doesn't provide a Python
+interface. Or you may have computationally intensive code that doesn't
+vectorize nicely for Numpy. In cases like these it can be useful to write
+your own extension modules that you then import into your Python code.

 Here we discuss one popular approach for extending Python with compiled code:
 using a tool called Cython.
@@ -73,7 +92,7 @@ Cython
 that can be processed with the Cython compiler to produce optimized code.
 Cython is designed to provide C-like performance for code that is mostly
 written in Python by adding only a few C-like declarations to existing
-Python code. As such, Cython provides the best of the both worlds:
+Python code. As such, Cython aims to provide the best of both worlds:
 the good programmer productivity of Python together with the high performance
 of C. Cython also makes it easy to interact with external C/C++ code.

@@ -147,15 +166,14 @@ Cythonized before use.
 instead use an established build tool like **setuptools** to handle the
 Cythonization during the project's build phase. More info is available on
 the `Cython documentation <https://cython.readthedocs.io/en/latest/src/userguide/source_files_and_compilation.html#compilation>`__.
-See also the course page on packaging. (TODO: link.)
+See also the :doc:`course page on packaging <packaging>`.


 Using Cython with Jupyter
 -------------------------

-Jupyter has an `extension <https://cython.readthedocs.io/en/latest/src/quickstart/build.html#using-the-jupyter-notebook>`
-for supporting Cython compilation directly inside notebooks, assuming your
-environment has Cython installed.
+Jupyter supports Cython compilation directly inside notebooks via `an extension <https://cython.readthedocs.io/en/latest/src/quickstart/build.html#using-the-jupyter-notebook>`__,
+assuming your environment has Cython installed.

 We first load the Cython extension, e.g. in the very first cell: ::

@@ -214,11 +232,11 @@ Cythonize as before:
 Import this into Python and confirm that it works as expected with integers.
 However, if passing floating-point numbers the function is forced to interpret
 the inputs as integers before performing the addition. For example,
-**add(1.2, 2.7)** would return 3. This happens because there is an automatic
-conversion from the input Python objects (floating point numbers) to the
-declared C-types when calling the Cythonized function from Python.
-Similarly the returned C variable is converted to a corresponding Python
-object.
+**add(1.4, 2.7)** would return 3. This happens because there is an automatic
+conversion from the input Python objects to the
+declared C-types, in this case integers, when calling the Cythonized function
+from Python. Similarly the returned C variable is converted to a corresponding
+Python object.

 To make the function work with floats we'd instead declare the types to be
 either **float** (32-bit) or **double** (64-bit) type instead of **int**.
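
Reviewer note on the hunk above: the conversion it describes can be imitated in plain Python. This is only a rough sketch, not Cython code. The helper name `add_as_c_ints` is hypothetical, and it assumes the conversion truncates toward zero as a C cast from double to int would. How a real Cythonized function treats float inputs may depend on the Cython version.

```python
def add_as_c_ints(x, y):
    """Hypothetical helper: imitate an add() whose arguments are declared as C ints."""
    # Assumption: the conversion drops the fractional part (truncation toward zero).
    # Python's int() does exactly that for floats.
    x_int = int(x)
    y_int = int(y)
    return x_int + y_int


# Integers pass through unchanged.
print(add_as_c_ints(1, 2))

# Floats lose their fractional part before the addition: 1 + 2.
print(add_as_c_ints(1.4, 2.7))
```

Under this assumption both calls print 3, which matches the value the revised text gives for **add(1.4, 2.7)**.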
@@ -247,11 +265,11 @@ Using Numpy arrays with Cython

 Cython has built-in support for Numpy arrays.

-As discussed in the Numpy lectures (TODO: LINK), Numpy arrays provide great performance
-for vectorized operations. In contrast, thing like **for**-loops over Numpy
-arrays should be avoided because of interpreting overhead inherent to Python
-**for**-loops. There is also overhead from accessing individual elements of
-Numpy arrays.
+As discussed in the :doc:`Numpy lectures <numpy-advanced>`, Numpy arrays provide
+great performance for vectorized operations. In contrast, things like
+**for**-loops over Numpy arrays should be avoided because of interpreting
+overhead inherent to Python **for**-loops. There is also overhead from
+accessing individual elements of Numpy arrays.

 With Cython we can bypass both restrictions and write efficient loops over
 Numpy arrays. Consider e.g. a double loop that sets values of a 2D array:
@@ -280,7 +298,7 @@ nice, but we are still bottlenecked by array lookups and assignments, i.e. the
 We can get a huge speedup by adding a static type declaration for the Numpy
 array, and for the other variables too while we are at it. To do this we must
 import compile-time information about the Numpy module using the
-Cython-specific `cimport` keyword, then use Cython's Numpy interface to
+Cython-specific **cimport** keyword, then use Cython's Numpy interface to
 declare the array's datatype and dimensions:

 .. code:: python
@@ -291,7 +309,7 @@ declare the array's datatype and dimensions:
     def fast_looper(int N):
         """"""

-        # Static declaration: 2D array of integers
+        # Type declaration: 2D array of 32-bit integers
         cdef cnp.ndarray[cnp.int32_t, ndim=2] data
         data = np.empty((N, N), dtype=np.int32)

@@ -313,7 +331,7 @@ pure Python implementation!
 in Python distributions. This usually works out of the box for Jupyter
 notebooks. However, if using the command line `cythonize` tool you may need
 to manually set include paths so the C compiler knows where to find the
-headers. Refer to `the docs <https://cython.readthedocs.io/en/latest/src/userguide/numpy_tutorial.html#compilation>__`
+headers. Refer to `the docs <https://cython.readthedocs.io/en/latest/src/userguide/numpy_tutorial.html#compilation>`__
 for more details.

 .. callout::
@@ -378,8 +396,7 @@ Further reading
 ---------------

 - Newer usage of Numpy arrays (memory views) https://cython.readthedocs.io/en/latest/src/userguide/numpy_tutorial.html#numpy-tutorial
-- cpdef keyword for functions
-
+- TODO


 Summary
