Commit e9bbc01: bugfixing index

1 parent 4a67b26

File tree: 6 files changed, 26 additions, 17 deletions
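All of the edits in this commit touch MyST Markdown `{index}` directives. For reference, a brief sketch of the Sphinx index-entry syntax involved (the entry names below are illustrative, not taken from the diff): a bare `entry; subentry` pair is treated as a `single` entry filed under a subentry, while a `see:` entry creates a cross-reference with no page locator. The `double:` prefix removed below does not appear among Sphinx's documented entry types (`single`, `pair`, `triple`, `see`, `seealso`), which is presumably why it is replaced here.

````markdown
% A "single" entry filed as "scikit-learn > Pipeline"; bare entries
% default to this form.
```{index} scikit-learn; Pipeline
```

% A cross-reference: the index shows "Pipeline, see scikit-learn"
% with no page locator of its own.
```{index} see: Pipeline; scikit-learn
```
````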

source/classification1.md

Lines changed: 7 additions & 4 deletions

````diff
@@ -1059,10 +1059,13 @@ predictors (colored by diagnosis) for both the unstandardized data we just
 loaded, and the standardized version of that same data. But first, we need to
 standardize the `unscaled_cancer` data set with `scikit-learn`.
 
-```{index} Pipeline, scikit-learn; make_column_transformer
+```{index} see: Pipeline; scikit-learn
 ```
 
-```{index} double: scikit-learn; Pipeline
+```{index} see: make_column_transformer; scikit-learn
+```
+
+```{index} scikit-learn;Pipeline, scikit-learn; make_column_transformer
 ```
 
 The `scikit-learn` framework provides a collection of *preprocessors* used to manipulate
@@ -1091,10 +1094,10 @@ preprocessor
 ```{index} scikit-learn; make_column_transformer, scikit-learn; StandardScaler
 ```
 
-```{index} StandardScaler
+```{index} see: StandardScaler; scikit-learn
 ```
 
-```{index} scikit-learn; fit, scikit-learn; make_column_selector
+```{index} scikit-learn; fit, scikit-learn; make_column_selector, scikit-learn; StandardScaler
 ```
 
 You can see that the preprocessor includes a single standardization step
````
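The context lines above describe building the book's preprocessor. A minimal runnable sketch of that pattern, with a small stand-in for the `unscaled_cancer` data set (the rows are invented; `Area` and `Smoothness` are assumed column names):

```python
# Standardize the numeric predictors with a column transformer; the label
# column is left untransformed.
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import StandardScaler

unscaled_cancer = pd.DataFrame({
    "Class": ["Malignant", "Benign", "Benign"],
    "Area": [1001.0, 1326.0, 1203.0],
    "Smoothness": [0.1184, 0.0847, 0.1096],
})

preprocessor = make_column_transformer(
    (StandardScaler(), ["Area", "Smoothness"]),
)
preprocessor.fit(unscaled_cancer)
print(preprocessor.transform(unscaled_cancer))
```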

source/classification2.md

Lines changed: 6 additions & 6 deletions

````diff
@@ -531,7 +531,7 @@ glue("cancer_train_nrow", "{:d}".format(len(cancer_train)))
 glue("cancer_test_nrow", "{:d}".format(len(cancer_test)))
 ```
 
-```{index} pandas.DataFrame; info
+```{index} DataFrame; info
 ```
 
 We can see from the `info` method above that the training set contains {glue:text}`cancer_train_nrow` observations,
@@ -540,7 +540,7 @@ a train / test split of 75% / 25%, as desired. Recall from {numref}`Chapter %s <
 that we use the `info` method to preview the number of rows, the variable names, their data types, and
 missing entries of a data frame.
 
-```{index} pandas.Series; value_counts
+```{index} Series; value_counts
 ```
 
 We can use the `value_counts` method with the `normalize` argument set to `True`
@@ -572,7 +572,7 @@ training and test data sets.
 
 +++
 
-```{index} Pipeline, make_column_transformer, StandardScaler
+```{index} scikit-learn; Pipeline, scikit-learn; make_column_transformer, scikit-learn; StandardScaler
 ```
 
 Fortunately, `scikit-learn` helps us handle this properly as long as we wrap our
@@ -1047,7 +1047,7 @@ cv_5_df
 ```{index} see: sem;standard error
 ```
 
-```{index} standard error, pandas.DataFrame;agg
+```{index} standard error, DataFrame;agg
 ```
 
 The validation scores we are interested in are contained in the `test_score` column.
@@ -1564,7 +1564,7 @@ us automatically. To make predictions and assess the estimated accuracy of the b
 `score` and `predict` methods of the fit `GridSearchCV` object. We can then pass those predictions to
 the `precision`, `recall`, and `crosstab` functions to assess the estimated precision and recall, and print a confusion matrix.
 
-```{index} predict, score, precision_score, recall_score, crosstab
+```{index} scikit-learn;predict, scikit-learn;score, scikit-learn;precision_score, scikit-learn;recall_score, crosstab
 ```
 
 ```{code-cell} ipython3
@@ -1670,7 +1670,7 @@ Overview of K-NN classification.
 
 +++
 
-```{index} scikit-learn, pipeline, cross-validation, K-nearest neighbors; classification, classification
+```{index} scikit-learn, Pipeline, cross-validation, K-nearest neighbors; classification, classification
 ```
 
 The overall workflow for performing K-nearest neighbors classification using `scikit-learn` is as follows:
````
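The context lines in this file reference the book's train/test workflow. A sketch of the split and the class-balance check they describe, on synthetic data (the column names and class proportions are assumptions):

```python
# 75% / 25% train/test split, then inspect the training set and its
# class proportions, as the surrounding context describes.
import pandas as pd
from sklearn.model_selection import train_test_split

cancer = pd.DataFrame({
    "Class": ["Malignant"] * 40 + ["Benign"] * 60,
    "Perimeter": range(100),
})

cancer_train, cancer_test = train_test_split(
    cancer, train_size=0.75, stratify=cancer["Class"], random_state=1
)

cancer_train.info()  # rows, variable names, dtypes, missing entries
print(cancer_train["Class"].value_counts(normalize=True))  # class balance
```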

source/clustering.md

Lines changed: 10 additions & 4 deletions

````diff
@@ -795,6 +795,9 @@ Total WSSD for K clusters ranging from 1 to 9.
 ```{index} K-means, scikit-learn; KMeans
 ```
 
+```{index} see: KMeans; scikit-learn
+```
+
 We can perform K-means in Python using a workflow similar to those
 in the earlier classification and regression chapters.
 Returning to the original (unstandardized) `penguins` data,
@@ -807,7 +810,7 @@ To address this problem, we typically standardize our data before clustering,
 which ensures that each variable has a mean of 0 and standard deviation of 1.
 The `StandardScaler` function in `scikit-learn` can be used to do this.
 
-```{index} scikit-learn; StandardScaler, standardization;K-means, K-means;standardization
+```{index} scikit-learn; StandardScaler, scikit-learn;KMeans, standardization;K-means, K-means;standardization
 ```
 
 ```{code-cell} ipython3
@@ -829,14 +832,17 @@ To indicate that we are performing K-means clustering, we will create a `KMeans`
 model object. It takes at
 least one argument: the number of clusters `n_clusters`, which we set to 3.
 
+```{index} KMeans;n_clusters
+```
+
 ```{code-cell} ipython3
 from sklearn.cluster import KMeans
 
 kmeans = KMeans(n_clusters=3)
 kmeans
 ```
 
-```{index} scikit-learn;Pipeline, scikit-learn;fit
+```{index} scikit-learn;make_pipeline, scikit-learn;Pipeline, scikit-learn;fit
 ```
 
 To actually run the K-means clustering, we combine the preprocessor and model object
@@ -852,7 +858,7 @@ penguin_clust.fit(penguins)
 penguin_clust
 ```
 
-```{index} KMeans; labels_, KMeans; inertia_, KMeans; cluster_centers_, , KMeans; predict
+```{index} KMeans; labels_, KMeans; inertia_
 ```
 
 The fit `KMeans` object&mdash;which is the second item in the
@@ -907,7 +913,7 @@ The data colored by the cluster assignments returned by K-means.
 ```{index} WSSD; total, KMeans; inertia_
 ```
 
-```{index} see: WSSD; K-means inertia_
+```{index} see: WSSD; KMeans
 ```
 
 As mentioned above,
````
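The clustering hunks reference the standardize-then-cluster workflow. A minimal sketch, assuming a small numeric `penguins` frame (values invented):

```python
# Standardize, then run K-means with 3 clusters inside one pipeline.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline

penguins = pd.DataFrame({
    "bill_length_mm": [39.1, 39.5, 40.3, 46.5, 50.0, 51.3],
    "flipper_length_mm": [181.0, 186.0, 195.0, 213.0, 220.0, 222.0],
})

penguin_clust = make_pipeline(StandardScaler(), KMeans(n_clusters=3, n_init=10))
penguin_clust.fit(penguins)

kmeans = penguin_clust[-1]  # the fit KMeans step of the pipeline
print(kmeans.labels_)       # cluster assignment for each row
print(kmeans.inertia_)      # total within-cluster sum of squared distances
```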

source/reading.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -1331,7 +1331,7 @@ argument&mdash;the URL of the page to scrape&mdash;and will return a list of
 data frames corresponding to all the tables it finds at that URL. We can see
 below that `read_html` found 17 tables on the Wikipedia page for Canada.
 
-```{index} read function; read_html, read_html
+```{index} read function; read_html
 ```
 
 ```python
````
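A sketch of the `read_html` behavior the context describes (`pandas.read_html` needs an HTML parser such as `lxml` installed, and the live page's table count may have drifted from the 17 quoted):

```python
import pandas as pd

# read_html returns a list of data frames, one per table found at the URL.
canada_wiki = pd.read_html("https://en.wikipedia.org/wiki/Canada")
print(len(canada_wiki))
print(canada_wiki[0].head())
```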

source/regression1.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -391,7 +391,7 @@ about what the data must look like for it to work.
 
 ## Training, evaluating, and tuning the model
 
-```{index} training data, test data
+```{index} training set, test set
 ```
 
 As usual, we must start by putting some test data away in a lock box
````

source/viz.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -1963,7 +1963,7 @@ bad, while raster images eventually start to look "pixelated."
 ```{index} PDF
 ```
 
-```{index} see: portable document dormat; PDF
+```{index} see: portable document format; PDF
 ```
 
 ```{note}
````
