Commit 5b5260e

Pandas: ensure numexpr
1 parent 20d6f55 commit 5b5260e

1 file changed: +15 -3 lines changed

content/pandas.rst

Lines changed: 15 additions & 3 deletions
@@ -530,10 +530,20 @@ Exercises 3
 Beyond the basics
 -----------------
 
-Larger DataFrame operations might be faster using :func:`~pandas.eval` with string expressions, `see
-<https://jakevdp.github.io/PythonDataScienceHandbook/03.12-performance-eval-and-query.html>`__::
+Faster expression evaluation with :func:`~pandas.eval`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Larger DataFrame operations might be faster using :func:`~pandas.eval` with string expressions (`see
+here <https://pandas.pydata.org/docs/user_guide/enhancingperf.html#eval-performance-comparison>`__).
+To do so, we start by installing ``numexpr``, a Python library that optimizes such expressions::
+
+    %conda install numexpr
+
+You may need to restart the kernel in Jupyter for this to take effect. Then::
 
     import pandas as pd
+    import numpy as np
+
     # Make some really big dataframes
     nrows, ncols = 100000, 100
     rng = np.random.RandomState(42)
@@ -547,9 +557,11 @@ Adding dataframes the pythonic way yields::
 
 And by using :func:`~pandas.eval`::
 
-    %timeit pd.eval('df1 + df2 + df3 + df4')
+    %timeit pd.eval('df1 + df2 + df3 + df4', engine='numexpr')
     # 40ms
 
+Assigning columns with :meth:`~pandas.DataFrame.apply`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 We can assign function return lists as dataframe columns::

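A minimal sketch of how the ``engine='numexpr'`` change in this commit could be sanity-checked, assuming ``pandas`` and ``numpy`` are installed. The smaller frame sizes, the names ``expr`` and ``engine_used``, and the fallback to ``engine='python'`` are illustrative assumptions and not part of the commit:

```python
import numpy as np
import pandas as pd

# Smaller frames than in the lesson, only to keep this check quick
rng = np.random.RandomState(42)
df1, df2, df3, df4 = (pd.DataFrame(rng.rand(1000, 10)) for _ in range(4))

expr = "df1 + df2 + df3 + df4"

# Requesting engine='numexpr' explicitly raises ImportError when numexpr
# is not available; fall back to the pure-Python engine in that case.
try:
    result = pd.eval(expr, engine="numexpr")
    engine_used = "numexpr"
except ImportError:
    result = pd.eval(expr, engine="python")
    engine_used = "python"

# The string expression should agree with ordinary DataFrame addition
expected = df1 + df2 + df3 + df4
matches = np.allclose(expected.to_numpy(), result.to_numpy())
print(f"engine used: {engine_used}, results match: {matches}")
```

Passing ``engine='numexpr'`` explicitly makes pandas fail loudly when ``numexpr`` is missing instead of quietly using another engine, which is presumably why the commit adds the install step before the timing example.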