Commit 7330177

Merge pull request #228 from AaltoSciComp/thor/updates
updates to pandas and parallel
2 parents c259bb8 + 9861d3d commit 7330177


2 files changed: 45 additions, 2 deletions


content/pandas.rst

Lines changed: 43 additions & 2 deletions
@@ -333,9 +333,18 @@ Time series superpowers

 An introduction of pandas wouldn't be complete without mention of its
 special abilities to handle time series. To show just a few examples,
-we will use a new dataset of Nobel prize laureates::
+we will use a new dataset of Nobel prize laureates available through
+an API of the Nobel prize organisation at
+http://api.nobelprize.org/v1/laureate.csv .

-    nobel = pd.read_csv("http://api.nobelprize.org/v1/laureate.csv")
+Unfortunately this API does not allow "non-browser requests", so
+:meth:`pd.read_csv` will not work. We can either open the above link in
+a browser and download the file, or use the JupyterLab interface by clicking
+"File" and "Open from URL", and then save the CSV file to disk.
+
+We can then load and explore the data::
+
+    nobel = pd.read_csv("laureate.csv")
     nobel.head()

 This dataset has three columns for time, "born"/"died" and "year".
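
As an alternative to the manual download described in this hunk, a script could try sending a browser-like ``User-Agent`` header. This is only a sketch: it assumes the server rejects requests based on that header alone, which the commit does not confirm. The helper names ``build_request`` and ``download_csv`` and the header value are made up for illustration.

```python
import shutil
import urllib.request

NOBEL_URL = "http://api.nobelprize.org/v1/laureate.csv"  # URL from the tutorial text


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request carrying a browser-like User-Agent (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download_csv(url=NOBEL_URL, path="laureate.csv"):
    """Save the CSV to disk; assumes the server accepts this User-Agent."""
    with urllib.request.urlopen(build_request(url)) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path


# download_csv()  # needs network access and may still be refused by the server
# nobel = pd.read_csv("laureate.csv")
```

If the request is still refused, the browser or JupyterLab download described in the hunk remains the fallback.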
@@ -428,6 +437,37 @@ Exercises 3
 sns.catplot(x="bornCountry", col="category", data=subset_physchem, kind="count");


+.. solution::
+
+    We use the :meth:`describe` method:
+
+    ::
+
+        nobel.bornCountryCode.describe()
+        # count     956
+        # unique     81
+        # top        US
+        # freq      287
+
+    We see that the US has received the largest number of Nobel prizes,
+    and 81 countries are represented.
+
+    To calculate the age at which laureates receive their prize, we need
+    to ensure that the "year" and "born" columns are in datetime format::
+
+        nobel["born"] = pd.to_datetime(nobel["born"], errors ='coerce')
+        nobel["year"] = pd.to_datetime(nobel["year"], format="%Y")
+
+    Then we add a column with the age at which Nobel prize was received
+    and plot a histogram::
+
+        nobel["age_nobel"] = round((nobel["year"] - nobel["born"]).dt.days / 365, 1)
+        nobel.hist(column="age_nobel", bins=25, figsize=(8,10), rwidth=0.9)
+
+    We can print names of all laureates from a given country, e.g.::
+
+        nobel[nobel["country"] == "Sweden"].loc[:, "firstname":"surname"]
+
 Beyond the basics
 -----------------
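
The age calculation added in the solution hunk can be tried without the downloaded file. The sketch below uses a three-row toy DataFrame in place of ``laureate.csv``. The rows are illustrative only, and the ``0000-00-00`` placeholder for a missing birth date is an assumption about how such values appear in the real data.

```python
import pandas as pd

# Toy rows for illustration only (not read from laureate.csv); the column
# names mirror the ones used in the solution.
nobel = pd.DataFrame(
    {
        "firstname": ["Marie", "Albert", "Example Organisation"],
        "born": ["1867-11-07", "1879-03-14", "0000-00-00"],  # last value: assumed placeholder
        "year": [1903, 1921, 1950],
    }
)

# Unparseable dates become NaT instead of raising an error.
nobel["born"] = pd.to_datetime(nobel["born"], errors="coerce")
# Casting to str first makes the "%Y" format apply unambiguously.
nobel["year"] = pd.to_datetime(nobel["year"].astype(str), format="%Y")

# Age in years, approximating a year as 365 days as the solution does.
nobel["age_nobel"] = ((nobel["year"] - nobel["born"]).dt.days / 365).round(1)
print(nobel[["firstname", "age_nobel"]])
```

Rows whose birth date could not be parsed end up with a missing age, which ``hist`` leaves out of the plot.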

@@ -439,6 +479,7 @@ Larger DataFrame operations might be faster using :obj:`~pandas.eval()` with str
     rng = np.random.RandomState(42)
     df1, df2, df3, df4 = (pd.DataFrame(rng.rand(nrows, ncols))
                           for i in range(4))
+
 Adding dataframes the pythonic way yields::

     %timeit df1 + df2 + df3 + df4
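
For context on what this section of the tutorial compares, here is a minimal sketch of the same sum written with operators and with ``pd.eval``. The values of ``nrows`` and ``ncols`` are assumptions, since the hunk does not show them. Passing the frames through ``local_dict`` is a choice made here for robustness, not necessarily what the tutorial does.

```python
import numpy as np
import pandas as pd

nrows, ncols = 1000, 10  # assumed sizes; the hunk does not show the tutorial's values
rng = np.random.RandomState(42)
df1, df2, df3, df4 = (pd.DataFrame(rng.rand(nrows, ncols)) for i in range(4))

# Plain operator chaining builds an intermediate DataFrame for each "+".
plain_sum = df1 + df2 + df3 + df4

# pd.eval evaluates the whole expression from a string; the frames are passed
# explicitly through local_dict so name lookup does not depend on the call site.
frames = {"df1": df1, "df2": df2, "df3": df3, "df4": df4}
eval_sum = pd.eval("df1 + df2 + df3 + df4", local_dict=frames)

print(np.allclose(plain_sum, eval_sum))
```

Whether ``pd.eval`` is actually faster depends on the frame sizes and on whether the optional ``numexpr`` package is installed, so timing both variants with ``%timeit`` is the way to check.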

content/parallel.rst

Lines changed: 2 additions & 0 deletions
@@ -513,3 +513,5 @@ See also
   you the full control you need.
 - Combining vectorized functions (NumPy, Scipy, pandas, etc.) with
   the parallel strategies listed here will get you very far.
+- Another popular framework similar to `multiprocessing` is
+  `joblib <https://joblib.readthedocs.io/en/latest/>`__.
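
To give a feel for the joblib framework mentioned in the added bullet, here is a minimal sketch using its ``Parallel`` and ``delayed`` helpers. It assumes joblib is installed; the input list and ``n_jobs=2`` are arbitrary choices for illustration and do not come from the commit.

```python
import math

from joblib import Parallel, delayed

# Apply math.sqrt to each input with two workers (n_jobs=2 is an arbitrary
# choice for this sketch); results come back in input order.
roots = Parallel(n_jobs=2)(delayed(math.sqrt)(x) for x in [1, 4, 9, 16])
print(roots)
```

The pattern is close to ``multiprocessing.Pool.map``: ``delayed`` wraps each call and ``Parallel`` distributes them across workers.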
