Commit 12d62c2

content/pandas: more intersphinx
1 parent f98f8b2 commit 12d62c2

1 file changed: +18 -13 lines changed

content/pandas.rst

Lines changed: 18 additions & 13 deletions
@@ -54,7 +54,10 @@ print some summary statistics of its numerical data::
 Ok, so we have information on passenger names, survival (0 or 1), age,
 ticket fare, number of siblings/spouses, etc. With the summary statistics we see that the average age is 29.7 years, maximum ticket price is 512 USD, 38\% of passengers survived, etc.
 
-Let's say we're interested in the survival probability of different age groups. With two one-liners, we can find the average age of those who survived or didn't survive, and plot corresponding histograms of the age distribution::
+Let's say we're interested in the survival probability of different
+age groups. With two one-liners, we can find the average age of those
+who survived or didn't survive, and plot corresponding histograms of
+the age distribution (:meth:`pandas.DataFrame.groupby`, :meth:`pandas.DataFrame.hist`)::
 
 print(titanic.groupby("Survived")["Age"].mean())
 
@@ -89,7 +92,7 @@ What's in a dataframe?
 
 As we saw above, pandas dataframes are a powerful tool for working with tabular data.
 A pandas
-`DataFrame object <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame>`__
+:class:`pandas.DataFrame`
 is composed of rows and columns:
 
 .. image:: img/pandas/01_table_dataframe.svg
@@ -111,7 +114,7 @@ and reading the titanic.csv datafile into a dataframe if needed, see above)::
 titanic.Age # same as above
 type(titanic["Age"])
 
-The columns have names. Here's how to get them::
+The columns have names. Here's how to get them (:attr:`~pandas.DataFrame.columns`)::
 
 titanic.columns
 
@@ -121,7 +124,9 @@ However, the rows also have names! This is what Pandas calls the :obj:`~pandas.D
 
 We saw above how to select a single column, but there are many ways of
 selecting (and setting) single or multiple rows, columns and values. We can
-refer to columns and rows either by number or by their name::
+refer to columns and rows either by number or by their name
+(:attr:`~pandas.DataFrame.loc`, :attr:`~pandas.DataFrame.iloc`,
+:attr:`~pandas.DataFrame.at`, :attr:`~pandas.DataFrame.iat`)::
 
 titanic.loc['Lam, Mr. Ali',"Age"] # select single value by row and column
 titanic.loc[:'Lam, Mr. Ali',"Name":"Age"] # slice the dataframe by row and column *names*
@@ -228,7 +233,7 @@ For a detailed exposition of data tidying, have a look at
 Working with dataframes
 -----------------------
 
-We saw above how we can read in data into a dataframe using the :obj:`~pandas.read_csv` method.
+We saw above how we can read in data into a dataframe using the :func:`~pandas.read_csv` function.
 Pandas also understands multiple other formats, for example using :obj:`~pandas.read_excel`,
 :obj:`~pandas.read_hdf`, :obj:`~pandas.read_json`, etc. (and corresponding methods to write to file:
 :obj:`~pandas.DataFrame.to_csv`, :obj:`~pandas.DataFrame.to_excel`, :obj:`~pandas.DataFrame.to_hdf`, :obj:`~pandas.DataFrame.to_json`, etc.)
@@ -337,7 +342,7 @@ an API of the Nobel prize organisation at
 http://api.nobelprize.org/v1/laureate.csv .
 
 Unfortunately this API does not allow "non-browser requests", so
-:meth:`pd.read_csv` will not work. We can either open the above link in
+:obj:`pandas.read_csv` will not work. We can either open the above link in
 a browser and download the file, or use the JupyterLab interface by clicking
 "File" and "Open from URL", and then save the CSV file to disk.
 
@@ -366,11 +371,11 @@ to one decimal::
 
 nobel["lifespan"] = round((nobel["died"] - nobel["born"]).dt.days / 365, 1)
 
-and then plot a histogram of lifespans::
+and then plot a :meth:`histogram <pandas.DataFrame.hist>` of lifespans::
 
 nobel.hist(column='lifespan', bins=25, figsize=(8,10), rwidth=0.9)
 
-Finally, let's see one more example of an informative plot
+Finally, let's see one more example of an informative plot (:meth:`~pandas.DataFrame.boxplot`)
 produced by a single line of code::
 
 nobel.boxplot(column="lifespan", by="category")
@@ -397,8 +402,8 @@ Exercises 3
 countries = np.array([COUNTRY1, COUNTRY2, COUNTRY3, COUNTRY4])
 subset = nobel.loc[nobel['bornCountry'].isin(countries)]
 
-- Use ``groupby`` to compute how many nobel prizes each country received in
-each category. The ``size()`` method tells us how many rows, hence nobel
+- Use :meth:`~pandas.DataFrame.groupby` to compute how many nobel prizes each country received in
+each category. The :meth:`~pandas.core.groupby.GroupBy.size` method tells us how many rows, hence nobel
 prizes, are in each group::
 
 nobel.groupby(['bornCountry', 'category']).size()
@@ -408,7 +413,7 @@ Exercises 3
 - First add a column “number” to the nobel dataframe containing 1’s
 (to enable the counting below).
 
-- Then create the pivot table::
+- Then create the :meth:`~pandas.DataFrame.pivot_table`::
 
 table = subset.pivot_table(values="number", index="bornCountry", columns="category", aggfunc=np.sum)
 
@@ -470,7 +475,7 @@ Exercises 3
 Beyond the basics
 -----------------
 
-Larger DataFrame operations might be faster using :obj:`~pandas.eval()` with string expressions, `see
+Larger DataFrame operations might be faster using :func:`~pandas.eval` with string expressions, `see
 <https://jakevdp.github.io/PythonDataScienceHandbook/03.12-performance-eval-and-query.html>`__::
 
 import pandas as pd
@@ -484,7 +489,7 @@ Adding dataframes the pythonic way yields::
 %timeit df1 + df2 + df3 + df4
 # 80ms
 
-And by using :obj:`~pandas.eval()`::
+And by using :func:`~pandas.eval`::
 
 %timeit pd.eval('df1 + df2 + df3 + df4')
 # 40ms
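
Note on the roles added in this commit: cross-references such as :class:`pandas.DataFrame` and :func:`~pandas.read_csv` only resolve to links when Sphinx's intersphinx extension is enabled and has a mapping for pandas. The project's ``conf.py`` is not part of this diff, so the following is only a minimal sketch of what such a configuration commonly looks like. The mapping key ``"pandas"`` and the URL are assumptions, not taken from the repository.

```python
# conf.py (sketch only; the project's real configuration is not shown in this commit)

# Enable Sphinx's built-in intersphinx extension so that roles such as
# :class:`pandas.DataFrame` can link to externally hosted documentation.
extensions = [
    "sphinx.ext.intersphinx",
]

# Map a project name to the base URL of its documentation.
# The second tuple element is None, which tells Sphinx to fetch the
# objects.inv inventory from the default location under that URL.
intersphinx_mapping = {
    "pandas": ("https://pandas.pydata.org/docs/", None),  # assumed URL
}
```

With a mapping like this in place, a target that is missing from the pandas inventory would typically show up as an unresolved reference during the build rather than as a link.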
