
Commit 4a67b26

bugfixing index
1 parent f3e4dfc commit 4a67b26

File tree: 8 files changed, +54 −54 lines changed


source/classification1.md

Lines changed: 6 additions & 6 deletions
@@ -183,7 +183,7 @@ total set of variables per image in this data set is:
 
 +++
 
-```{index} pandas.DataFrame; info
+```{index} DataFrame; info
 ```
 
 Below we use the `info` method to preview the data frame. This method can
@@ -195,7 +195,7 @@ as well as their data types and the number of non-missing entries.
 cancer.info()
 ```
 
-```{index} pandas.Series; unique
+```{index} Series; unique
 ```
 
 From the summary of the data above, we can see that `Class` is of type `object`.
@@ -213,7 +213,7 @@ method. The `replace` method takes one argument: a dictionary that maps
 previous values to desired new values.
 We will verify the result using the `unique` method.
 
-```{index} pandas.Series; replace
+```{index} Series; replace
 ```
 
 ```{code-cell} ipython3
@@ -227,7 +227,7 @@ cancer["Class"].unique()
 
 ### Exploring the cancer data
 
-```{index} pandas.DataFrame; groupby, pandas.Series;size
+```{index} DataFrame; groupby, Series;size
 ```
 
 ```{code-cell} ipython3
@@ -256,7 +256,7 @@ tumor observations.
 100 * cancer.groupby("Class").size() / cancer.shape[0]
 ```
 
-```{index} pandas.Series; value_counts
+```{index} Series; value_counts
 ```
 
 The `pandas` package also has a more convenient specialized `value_counts` method for
@@ -1607,7 +1607,7 @@ Imbalanced data with background color indicating the decision of the classifier
 
 +++
 
-```{index} oversampling, pandas.DataFrame; sample
+```{index} oversampling, DataFrame; sample
 ```
 
 Despite the simplicity of the problem, solving it in a statistically sound manner is actually
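The index entries touched in this file point at the pandas calls used in the classification chapter (`info`, `unique`, `replace`, `groupby`/`size`, `value_counts`, and `sample` for oversampling). As a rough, self-contained sketch of how those calls fit together — the tiny `cancer` data frame below is a made-up stand-in for the chapter's real data set:

```python
import pandas as pd

# Made-up stand-in for the chapter's cancer data frame (illustrative values only).
cancer = pd.DataFrame({
    "Class": ["M", "B", "B", "M", "B"],
    "Radius": [17.9, 12.3, 11.4, 20.2, 13.1],
})

cancer.info()             # column types and counts of non-missing entries
cancer["Class"].unique()  # distinct labels before relabeling

# replace takes a dictionary mapping previous values to desired new values.
cancer["Class"] = cancer["Class"].replace({"M": "Malignant", "B": "Benign"})
cancer["Class"].unique()

# Class proportions, via groupby/size and via the specialized value_counts.
100 * cancer.groupby("Class").size() / cancer.shape[0]
cancer["Class"].value_counts(normalize=True)

# Oversampling sketch: resample the rarer class with replacement.
malignant = cancer[cancer["Class"] == "Malignant"]
pd.concat([cancer, malignant.sample(n=3, replace=True, random_state=1)])
```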

source/clustering.md

Lines changed: 1 addition & 1 deletion
@@ -308,7 +308,7 @@ have.
 clus = penguins_clustered[penguins_clustered["cluster"] == 0][["bill_length_standardized", "flipper_length_standardized"]]
 ```
 
-```{index} see: within-cluster sum-of-squared-distances; WSSD
+```{index} see: within-cluster sum of squared distances; WSSD
 ```
 
 ```{index} WSSD
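The renamed entry here is just the index cross-reference for WSSD (within-cluster sum of squared distances). For reference, a minimal sketch of the quantity itself, computed on a made-up stand-in for the `clus` subset shown above (column names follow the chapter; the values are illustrative):

```python
import pandas as pd

# Made-up stand-in for one cluster's standardized measurements.
clus = pd.DataFrame({
    "bill_length_standardized": [0.2, 0.5, -0.1],
    "flipper_length_standardized": [1.1, 0.8, 0.9],
})

# WSSD: squared distance of each point to the cluster centroid,
# summed over all points in the cluster.
centroid = clus.mean()
wssd = ((clus - centroid) ** 2).sum().sum()
print(wssd)
```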

source/inference.md

Lines changed: 10 additions & 10 deletions
@@ -168,7 +168,7 @@ We can find the proportion of listings for each room type
 by using the `value_counts` function with the `normalize` parameter
 as we did in previous chapters.
 
-```{index} pandas.DataFrame; [], pandas.DataFrame; value_counts
+```{index} DataFrame; [], DataFrame; value_counts
 ```
 
 ```{code-cell} ipython3
@@ -187,13 +187,13 @@ value, {glue:text}`population_proportion`, is the population parameter. Remember
 parameter value is usually unknown in real data analysis problems, as it is
 typically not possible to make measurements for an entire population.
 
-```{index} pandas.DataFrame; sample, seed;numpy.random.seed
+```{index} DataFrame; sample, seed;numpy.random.seed
 ```
 
 Instead, perhaps we can approximate it with a small subset of data!
 To investigate this idea, let's try randomly selecting 40 listings (*i.e.,* taking a random sample of
 size 40 from our population), and computing the proportion for that sample.
-We will use the `sample` method of the `pandas.DataFrame`
+We will use the `sample` method of the `DataFrame`
 object to take the sample. The argument `n` of `sample` is the size of the sample to take
 and since we are starting to use randomness here,
 we are also setting the random seed via numpy to make the results reproducible.
@@ -213,7 +213,7 @@ airbnb.sample(n=40)["room_type"].value_counts(normalize=True)
 glue("sample_1_proportion", "{:.3f}".format(airbnb.sample(n=40, random_state=155)["room_type"].value_counts(normalize=True)["Entire home/apt"]))
 ```
 
-```{index} pandas.DataFrame; value_counts
+```{index} DataFrame; value_counts
 ```
 
 Here we see that the proportion of entire home/apartment listings in this
@@ -248,7 +248,7 @@ commonly refer to as $n$) from a population is called
 a **sampling distribution**. The sampling distribution will help us see how much we would
 expect our sample proportions from this population to vary for samples of size 40.
 
-```{index} pandas.DataFrame; sample
+```{index} DataFrame; sample
 ```
 
 We again use the `sample` to take samples of size 40 from our
@@ -284,7 +284,7 @@ to compute the number of qualified observations in each sample; finally compute
 Both the first and last few entries of the resulting data frame are printed
 below to show that we end up with 20,000 point estimates, one for each of the 20,000 samples.
 
-```{index} pandas.DataFrame;groupby, pandas.DataFrame;reset_index
+```{index} DataFrame;groupby, DataFrame;reset_index
 ```
 
 ```{code-cell} ipython3
@@ -479,7 +479,7 @@ The price per night of all Airbnb rentals in Vancouver, BC
 is \${glue:text}`population_mean`, on average. This value is our
 population parameter since we are calculating it using the population data.
 
-```{index} pandas.DataFrame; sample
+```{index} DataFrame; sample
 ```
 
 Now suppose we did not have access to the population data (which is usually the
@@ -987,7 +987,7 @@ mean of the sample is \${glue:text}`estimate_mean`.
 Remember, in practice, we usually only have this one sample from the population. So
 this sample and estimate are the only data we can work with.
 
-```{index} bootstrap; in Python, pandas.DataFrame; sample (bootstrap)
+```{index} bootstrap; in Python, DataFrame; sample (bootstrap)
 ```
 
 We now perform steps 1–5 listed above to generate a single bootstrap
@@ -1106,7 +1106,7 @@ generate a bootstrap distribution of these point estimates. The bootstrap
 distribution ({numref}`fig:11-bootstrapping5`) suggests how we might expect
 our point estimate to behave if we take multiple samples.
 
-```{index} pandas.DataFrame;reset_index, pandas.DataFrame;rename, pandas.DataFrame;groupby, pandas.Series;mean
+```{index} DataFrame;reset_index, DataFrame;rename, DataFrame;groupby, Series;mean
 ```
 
 ```{code-cell} ipython3
@@ -1252,7 +1252,7 @@ Quantiles are expressed in proportions rather than percentages,
 so the 2.5th and 97.5th percentiles
 would be the 0.025 and 0.975 quantiles, respectively.
 
-```{index} pandas.DataFrame; [], pandas.DataFrame;quantile
+```{index} DataFrame; [], DataFrame;quantile
 ```
 
 ```{index} percentile
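The entries updated in this file all concern the sampling and bootstrapping workflow (`sample`, `value_counts`, `groupby`/`reset_index`, `quantile`). A condensed, self-contained sketch of that workflow on a made-up stand-in for the chapter's `airbnb` data frame (prices and room types are randomly generated, not real listings):

```python
import numpy as np
import pandas as pd

np.random.seed(1)  # the chapter sets the seed via numpy for reproducibility

# Made-up stand-in for the chapter's airbnb data frame.
airbnb = pd.DataFrame({
    "room_type": np.random.choice(["Entire home/apt", "Private room"], size=500),
    "price": np.random.exponential(scale=150, size=500),
})

# Population proportion of each room type, then one random sample of size 40.
airbnb["room_type"].value_counts(normalize=True)
sample = airbnb.sample(n=40)
sample["room_type"].value_counts(normalize=True)

# One bootstrap sample: resample the sample with replacement, same size,
# and compute the point estimate (here, the mean price).
sample.sample(frac=1, replace=True)["price"].mean()

# Bootstrap distribution of the mean and a 95% percentile interval.
boot_means = pd.Series(
    [sample.sample(frac=1, replace=True)["price"].mean() for _ in range(1000)]
)
boot_means.quantile([0.025, 0.975])
```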

source/intro.md

Lines changed: 7 additions & 7 deletions
@@ -437,13 +437,13 @@ can_lang
 
 ## Creating subsets of data frames with `[]` & `loc[]`
 
-```{index} see: []; pandas.DataFrame
+```{index} see: []; DataFrame
 ```
 
-```{index} see: loc[]; pandas.DataFrame
+```{index} see: loc[]; DataFrame
 ```
 
-```{index} pandas.DataFrame; [], pandas.DataFrame; loc[], selecting columns
+```{index} DataFrame; [], DataFrame; loc[], selecting columns
 ```
 
 Now that we've loaded our data into Python, we can start wrangling the data to
@@ -475,7 +475,7 @@ high-level categories of languages, which include "Aboriginal languages",
 our question we want to filter our data set so we restrict our attention
 to only those languages in the "Aboriginal languages" category.
 
-```{index} pandas.DataFrame; [], filtering rows, logical statement, logical operator; equivalency (==), string
+```{index} DataFrame; [], filtering rows, logical statement, logical operator; equivalency (==), string
 ```
 
 We can use the `[]` operation to obtain the subset of rows with desired values
@@ -521,7 +521,7 @@ can_lang[can_lang["category"] == "Aboriginal languages"]
 ### Using `[]` to select columns
 
 
-```{index} pandas.DataFrame; [], selecting columns
+```{index} DataFrame; [], selecting columns
 ```
 
 We can also use the `[]` operation to select columns from a data frame.
@@ -551,7 +551,7 @@ can_lang[["language", "mother_tongue"]]
 
 ### Using `loc[]` to filter rows and select columns
 
-```{index} pandas.DataFrame; loc[], selecting columns
+```{index} DataFrame; loc[], selecting columns
 ```
 
 The `[]` operation is only used when you want to filter rows *or* select columns;
@@ -612,7 +612,7 @@ So it looks like the `loc[]` operation gave us the result we wanted!
 
 ## Using `sort_values` and `head` to select rows by ordered values
 
-```{index} pandas.DataFrame; sort_values, pandas.DataFrame; head
+```{index} DataFrame; sort_values, DataFrame; head
 ```
 
 We have used the `[]` and `loc[]` operations on a data frame to obtain a table
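The entries in this file track the basic subsetting operations introduced in the intro chapter (`[]`, `loc[]`, `sort_values`, `head`). A minimal sketch of each, on a made-up stand-in for the chapter's `can_lang` data frame:

```python
import pandas as pd

# Made-up stand-in for the chapter's can_lang data frame.
can_lang = pd.DataFrame({
    "category": ["Aboriginal languages", "Official languages", "Aboriginal languages"],
    "language": ["Inuktitut", "French", "Cree"],
    "mother_tongue": [35000, 7100000, 86000],
})

# Filter rows with [] and a logical comparison.
can_lang[can_lang["category"] == "Aboriginal languages"]

# Select columns with [].
can_lang[["language", "mother_tongue"]]

# Filter rows *and* select columns in one step with loc[].
can_lang.loc[
    can_lang["category"] == "Aboriginal languages",
    ["language", "mother_tongue"],
]

# Order by a column and keep the top rows.
can_lang.sort_values(by="mother_tongue", ascending=False).head(2)
```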

source/reading.md

Lines changed: 3 additions & 3 deletions
@@ -407,7 +407,7 @@ canlang_data = pd.read_csv(
 canlang_data
 ```
 
-```{index} pandas.DataFrame; rename, pandas
+```{index} DataFrame; rename, pandas
 ```
 
 It is best to rename your columns manually in this scenario. The current column names
@@ -790,7 +790,7 @@ that we need for analysis; we do eventually need to call `execute`.
 For example, `ibis` does not provide the `tail` function to look at the last
 rows in a database, even though `pandas` does.
 
-```{index} pandas.DataFrame; tail
+```{index} DataFrame; tail
 ```
 
 ```{code-cell} ipython3
@@ -951,7 +951,7 @@ Databases are beneficial in a large-scale setting:
 
 ## Writing data from Python to a `.csv` file
 
-```{index} write function; to_csv, pandas.DataFrame; to_csv
+```{index} write function; to_csv, DataFrame; to_csv
 ```
 
 At the middle and end of a data analysis, we often want to write a data frame
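The entries here cover `rename`, `tail`, and `to_csv` from the reading/writing chapter. A minimal sketch, using a made-up stand-in for `canlang_data`:

```python
import pandas as pd

# Made-up stand-in for the chapter's canlang_data, with awkward column names.
canlang_data = pd.DataFrame({
    "cat": ["Official languages", "Aboriginal languages"],
    "lang": ["English", "Cree"],
})

# Rename columns manually with a dictionary of old -> new names.
canlang_data = canlang_data.rename(columns={"cat": "category", "lang": "language"})

# Peek at the last rows (pandas provides tail; the chapter notes that ibis does not).
canlang_data.tail(1)

# Write the data frame out to a .csv file, omitting the row index.
canlang_data.to_csv("canlang_clean.csv", index=False)
```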

source/regression1.md

Lines changed: 2 additions & 2 deletions
@@ -233,7 +233,7 @@ how well it predicts house sale price. This subsample is taken to allow us to
 illustrate the mechanics of K-NN regression with a few data points; later in
 this chapter we will use all the data.
 
-```{index} pandas.DataFrame; sample
+```{index} DataFrame; sample
 ```
 
 To take a small random sample of size 30, we'll use the
@@ -287,7 +287,7 @@ Scatter plot of price (USD) versus house size (square feet) with vertical line i
 
 +++
 
-```{index} pandas.DataFrame; abs, pandas.DataFrame; nsmallest
+```{index} DataFrame; abs, DataFrame; nsmallest
 ```
 
 We will employ the same intuition from {numref}`Chapters %s <classification1>` and {numref}`%s <classification2>`, and use the
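The two entries updated here (`sample`, plus `abs`/`nsmallest`) are the pieces of the K-NN regression walkthrough. A rough sketch of the idea on a made-up housing data frame (sizes and prices are illustrative, not the chapter's data):

```python
import pandas as pd

# Made-up stand-in for the chapter's housing data.
housing = pd.DataFrame({
    "sqft": [1000, 1500, 1800, 2100, 2500, 900, 1200, 3000],
    "price": [180000, 250000, 300000, 340000, 420000, 150000, 210000, 520000],
})

# A small random subsample, as the chapter does to illustrate the mechanics.
small_sample = housing.sample(n=5, random_state=10)

# Nearest neighbors by house size: absolute distance to 2,000 sq ft,
# then keep the rows with the smallest distances.
small_sample["dist"] = (small_sample["sqft"] - 2000).abs()
nearest = small_sample.nsmallest(3, "dist")

# K-NN regression prediction: average price of those neighbors.
nearest["price"].mean()
```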

source/viz.md

Lines changed: 3 additions & 3 deletions
@@ -718,7 +718,7 @@ in the magnitude of these two numbers!
 We can confirm that the two points in the upper right-hand corner correspond
 to Canada's two official languages by filtering the data:
 
-```{index} pandas.DataFrame; loc[]
+```{index} DataFrame; loc[]
 ```
 
 ```{code-cell} ipython3
@@ -848,7 +848,7 @@ using `_` so that it is easier to read;
 this does not affect how Python interprets the number
 and is just added for readability.
 
-```{index} pandas.DataFrame; column assignment, pandas.DataFrame; []
+```{index} DataFrame; column assignment, DataFrame; []
 ```
 
 ```{code-cell} ipython3
@@ -1228,7 +1228,7 @@ as `sort_values` followed by `head`, but are slightly more efficient because the
 In general, it is good to use more specialized functions when they are available!
 ```
 
-```{index} pandas.DataFrame; nlargest, pandas.DataFrame; nsmallest
+```{index} DataFrame; nlargest, DataFrame; nsmallest
 ```
 
 ```{code-cell} ipython3
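The entries in this file touch `loc[]` filtering, column assignment, and `nlargest`/`nsmallest` from the visualization chapter. A minimal sketch on a made-up language data frame (counts are illustrative):

```python
import pandas as pd

# Made-up stand-in for the chapter's language data.
can_lang = pd.DataFrame({
    "language": ["English", "French", "Cree", "Inuktitut"],
    "mother_tongue": [19_000_000, 7_000_000, 86_000, 35_000],
})

# Filter with loc[] to confirm which rows sit in the upper right of the plot.
can_lang.loc[can_lang["mother_tongue"] > 1_000_000]

# Assign a new column; underscores in numeric literals are only for readability.
can_lang["mother_tongue_percent"] = (
    100 * can_lang["mother_tongue"] / can_lang["mother_tongue"].sum()
)

# nlargest/nsmallest behave like sort_values followed by head, but more directly.
can_lang.nlargest(2, "mother_tongue")
can_lang.nsmallest(2, "mother_tongue")
```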
