@@ -333,9 +333,18 @@ Time series superpowers
333333
334334An introduction of pandas wouldn't be complete without mention of its
335335special abilities to handle time series. To show just a few examples,
336- we will use a new dataset of Nobel prize laureates::
336+ we will use a new dataset of Nobel prize laureates available through
337+ the API of the Nobel prize organisation at
338+ http://api.nobelprize.org/v1/laureate.csv.
337339
338- nobel = pd.read_csv("http://api.nobelprize.org/v1/laureate.csv")
340+ Unfortunately, this API does not accept "non-browser" requests, so passing
341+ the URL directly to :meth:`pd.read_csv` will not work. Instead, we can open the
342+ link above in a browser and download the file, or, in JupyterLab, choose
343+ "File" and then "Open from URL", and save the CSV file to disk.
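If we would rather script the download, one option is to send a browser-like ``User-Agent`` header ourselves. This is a minimal sketch. It assumes the server's check looks only at that header, which we have not verified. ``download_csv`` and ``BROWSER_LIKE_UA`` are hypothetical names, and the header value is only an example.

```python
import urllib.request
from pathlib import Path

# Hypothetical example header; whether the Nobel prize API accepts it
# is an assumption, not something we have verified.
BROWSER_LIKE_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def download_csv(url, dest, user_agent=BROWSER_LIKE_UA):
    """Fetch ``url`` with a custom User-Agent and save it to ``dest``.

    Hypothetical helper for illustration; returns the path written.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        data = response.read()  # raw bytes of the CSV file
    dest = Path(dest)
    dest.write_bytes(data)
    return dest
```

The call would be ``download_csv("http://api.nobelprize.org/v1/laureate.csv", "laureate.csv")``, after which ``pd.read_csv("laureate.csv")`` works as usual. With pandas 1.2 or newer, the same header can instead be passed straight to ``pd.read_csv`` via ``storage_options={"User-Agent": ...}``, which pandas forwards as request headers for HTTP(S) URLs.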
344+
345+ We can then load and explore the data::
346+
347+ nobel = pd.read_csv("laureate.csv")
339348 nobel.head()
340349
341350This dataset has three columns for time, "born"/"died" and "year".
@@ -428,6 +437,37 @@ Exercises 3
428437 sns.catplot(x="bornCountry", col="category", data=subset_physchem, kind="count");
429438
430439
440+ .. solution::
441+
442+ We use the :meth: `describe ` method:
443+
444+ ::
445+
446+ nobel.bornCountryCode.describe()
447+ # count 956
448+ # unique 81
449+ # top US
450+ # freq 287
451+
452+ We see that the most frequent country of birth is the US (287 entries),
453+ and that 81 different countries of birth are represented.
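``describe`` only reports the single most frequent value. To see the full ranking of countries, ``value_counts`` can be used instead. Here is a small sketch on made-up country codes (not the real data):

```python
import pandas as pd

# Made-up stand-in for the "bornCountryCode" column; None marks a missing value.
codes = pd.Series(["US", "GB", "US", "DE", "US", "GB", None],
                  name="bornCountryCode")

# For non-numeric data, describe() gives count (non-missing), unique, top, freq.
summary = codes.describe()

# value_counts() gives the full ranking, most frequent first (missing values dropped).
ranking = codes.value_counts()
```

On the real dataset the equivalent would be ``nobel["bornCountryCode"].value_counts().head(10)``.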
454+
455+ To calculate the age at which laureates receive their prize, we need
456+ to ensure that the "year" and "born" columns are in datetime format::
457+
458+ nobel["born"] = pd.to_datetime(nobel["born"], errors="coerce")
459+ nobel["year"] = pd.to_datetime(nobel["year"], format="%Y")
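The effect of ``errors="coerce"`` is easiest to see on a tiny, made-up frame. Values that cannot be parsed as dates become ``NaT`` instead of raising an error. The placeholder string ``"0000-00-00"`` below is our own invention for this sketch. Casting ``year`` to ``str`` before applying ``format="%Y"`` is a defensive choice here, not something the solution requires.

```python
import pandas as pd

# Made-up rows; "0000-00-00" is a placeholder that is not a valid date.
df = pd.DataFrame({
    "born": ["1867-11-07", "0000-00-00", "1879-03-14"],
    "year": [1903, 1917, 1921],
})

# Unparseable strings become NaT instead of raising an error.
df["born"] = pd.to_datetime(df["born"], errors="coerce")

# Parse the prize year; each value becomes 1 January of that year.
df["year"] = pd.to_datetime(df["year"].astype(str), format="%Y")
```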
460+
461+ Then we add a column with the age at which the Nobel prize was received
462+ and plot a histogram::
463+
464+ nobel["age_nobel"] = round((nobel["year"] - nobel["born"]).dt.days / 365.25, 1)
465+ nobel.hist(column="age_nobel", bins=25, figsize=(8, 10), rwidth=0.9)
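Because ``year`` was parsed with ``format="%Y"``, each prize date is 1 January of that year, so the computed age is the age on that day. Dividing the day count by 365 also ignores leap days, while dividing by 365.25 averages them in. The sketch below uses one made-up birth date to show the difference. It also shows that a missing birth date turns into a missing age.

```python
import pandas as pd

# Made-up data: one known birth date and one missing one.
born = pd.to_datetime(pd.Series(["1867-11-07", None]))
year = pd.to_datetime(pd.Series(["1903", "1903"]), format="%Y")

# Timedelta -> whole days; the NaT row becomes NaN.
elapsed_days = (year - born).dt.days

age_365 = (elapsed_days / 365).round(1)      # ignores leap days
age_36525 = (elapsed_days / 365.25).round(1)  # averages leap days in
```

For this birth date the two divisors give ages that differ by 0.1 years after rounding.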
466+
467+ We can also print the names of all laureates from a given country, for example Sweden::
468+
469+ nobel.loc[nobel["country"] == "Sweden", "firstname":"surname"]
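``.loc`` accepts a boolean row mask and a column slice in a single call. Label slices in ``.loc`` include both endpoints, so ``"surname"`` is part of the result. Here is a sketch on made-up rows that contain only a few of the columns. The names and the column order are assumptions for illustration.

```python
import pandas as pd

# Made-up rows with a subset of the laureate columns.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "firstname": ["Ada", "Ben", "Cleo"],
    "surname": ["Andersson", "Brown", "Carlsson"],
    "country": ["Sweden", "USA", "Sweden"],
})

# Rows where country == "Sweden"; columns from "firstname" through
# "surname", both endpoints included.
swedes = df.loc[df["country"] == "Sweden", "firstname":"surname"]
```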
470+
431471Beyond the basics
432472-----------------
433473
@@ -439,6 +479,7 @@ Larger DataFrame operations might be faster using :obj:`~pandas.eval()` with str
439479 rng = np.random.RandomState(42)
440480 df1, df2, df3, df4 = (pd.DataFrame(rng.rand(nrows, ncols))
441481 for i in range(4))
482+
442483Adding dataframes the pythonic way yields::
443484
444485 %timeit df1 + df2 + df3 + df4
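For comparison, :obj:`~pandas.eval()` takes the same expression as a string. This sketch uses smaller, arbitrarily sized frames and only checks that both approaches give the same result. Whether ``eval`` is actually faster depends on the frame size and on whether the optional ``numexpr`` package is installed.

```python
import numpy as np
import pandas as pd

# Arbitrary, smaller sizes for illustration only.
nrows, ncols = 1000, 100
rng = np.random.RandomState(42)
df1, df2, df3, df4 = (pd.DataFrame(rng.rand(nrows, ncols)) for _ in range(4))

# Plain Python operators: each "+" builds an intermediate DataFrame.
plain = df1 + df2 + df3 + df4

# pd.eval parses the string and looks up df1..df4 in the calling scope;
# it uses the numexpr engine when that package is available.
evaluated = pd.eval("df1 + df2 + df3 + df4")
```

In IPython, ``%timeit pd.eval("df1 + df2 + df3 + df4")`` can then be compared against the plain version above.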