Commit e9bbc01: bugfixing index

1 parent 4a67b26

File tree: 6 files changed, 26 additions, 17 deletions
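All of the edits in this commit touch MyST Markdown `{index}` directives. For reference, a brief sketch of the Sphinx index-entry syntax involved (the entry names below are illustrative, not taken from the diff): a bare `entry; subentry` pair is treated as a `single` entry filed under a subentry, while a `see:` entry creates a cross-reference with no page locator. The `double:` prefix removed below does not appear among Sphinx's documented entry types (`single`, `pair`, `triple`, `see`, `seealso`), which is presumably why it is replaced here.

````markdown
% A "single" entry filed as "scikit-learn > Pipeline"; bare entries
% default to this form.
```{index} scikit-learn; Pipeline
```

% A cross-reference: the index shows "Pipeline, see scikit-learn"
% with no page locator of its own.
```{index} see: Pipeline; scikit-learn
```
````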

source/classification1.md

Lines changed: 7 additions & 4 deletions

````diff
@@ -1059,10 +1059,13 @@ predictors (colored by diagnosis) for both the unstandardized data we just
 loaded, and the standardized version of that same data. But first, we need to
 standardize the `unscaled_cancer` data set with `scikit-learn`.
 
-```{index} Pipeline, scikit-learn; make_column_transformer
+```{index} see: Pipeline; scikit-learn
 ```
 
-```{index} double: scikit-learn; Pipeline
+```{index} see: make_column_transformer; scikit-learn
+```
+
+```{index} scikit-learn;Pipeline, scikit-learn; make_column_transformer
 ```
 
 The `scikit-learn` framework provides a collection of *preprocessors* used to manipulate
@@ -1091,10 +1094,10 @@ preprocessor
 ```{index} scikit-learn; make_column_transformer, scikit-learn; StandardScaler
 ```
 
-```{index} StandardScaler
+```{index} see: StandardScaler; scikit-learn
 ```
 
-```{index} scikit-learn; fit, scikit-learn; make_column_selector
+```{index} scikit-learn; fit, scikit-learn; make_column_selector, scikit-learn; StandardScaler
 ```
 
 You can see that the preprocessor includes a single standardization step
````
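The context lines above describe building the book's preprocessor. A minimal runnable sketch of that pattern, with a small stand-in for the `unscaled_cancer` data set (the rows are invented; `Area` and `Smoothness` are assumed column names):

```python
# Standardize the numeric predictors with a column transformer; the label
# column is left untransformed.
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import StandardScaler

unscaled_cancer = pd.DataFrame({
    "Class": ["Malignant", "Benign", "Benign"],
    "Area": [1001.0, 1326.0, 1203.0],
    "Smoothness": [0.1184, 0.0847, 0.1096],
})

preprocessor = make_column_transformer(
    (StandardScaler(), ["Area", "Smoothness"]),
)
preprocessor.fit(unscaled_cancer)
print(preprocessor.transform(unscaled_cancer))
```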

source/classification2.md

Lines changed: 6 additions & 6 deletions

````diff
@@ -531,7 +531,7 @@ glue("cancer_train_nrow", "{:d}".format(len(cancer_train)))
 glue("cancer_test_nrow", "{:d}".format(len(cancer_test)))
 ```
 
-```{index} pandas.DataFrame; info
+```{index} DataFrame; info
 ```
 
 We can see from the `info` method above that the training set contains {glue:text}`cancer_train_nrow` observations,
@@ -540,7 +540,7 @@ a train / test split of 75% / 25%, as desired. Recall from {numref}`Chapter %s <
 that we use the `info` method to preview the number of rows, the variable names, their data types, and
 missing entries of a data frame.
 
-```{index} pandas.Series; value_counts
+```{index} Series; value_counts
 ```
 
 We can use the `value_counts` method with the `normalize` argument set to `True`
@@ -572,7 +572,7 @@ training and test data sets.
 
 +++
 
-```{index} Pipeline, make_column_transformer, StandardScaler
+```{index} scikit-learn; Pipeline, scikit-learn; make_column_transformer, scikit-learn; StandardScaler
 ```
 
 Fortunately, `scikit-learn` helps us handle this properly as long as we wrap our
@@ -1047,7 +1047,7 @@ cv_5_df
 ```{index} see: sem;standard error
 ```
 
-```{index} standard error, pandas.DataFrame;agg
+```{index} standard error, DataFrame;agg
 ```
 
 The validation scores we are interested in are contained in the `test_score` column.
@@ -1564,7 +1564,7 @@ us automatically. To make predictions and assess the estimated accuracy of the b
 `score` and `predict` methods of the fit `GridSearchCV` object. We can then pass those predictions to
 the `precision`, `recall`, and `crosstab` functions to assess the estimated precision and recall, and print a confusion matrix.
 
-```{index} predict, score, precision_score, recall_score, crosstab
+```{index} scikit-learn;predict, scikit-learn;score, scikit-learn;precision_score, scikit-learn;recall_score, crosstab
 ```
 
 ```{code-cell} ipython3
@@ -1670,7 +1670,7 @@ Overview of K-NN classification.
 
 +++
 
-```{index} scikit-learn, pipeline, cross-validation, K-nearest neighbors; classification, classification
+```{index} scikit-learn, Pipeline, cross-validation, K-nearest neighbors; classification, classification
 ```
 
 The overall workflow for performing K-nearest neighbors classification using `scikit-learn` is as follows:
````
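The context lines in this file reference the book's train/test workflow. A sketch of the split and the class-balance check they describe, on synthetic data (the column names and class proportions are assumptions):

```python
# 75% / 25% train/test split, then inspect the training set and its
# class proportions, as the surrounding context describes.
import pandas as pd
from sklearn.model_selection import train_test_split

cancer = pd.DataFrame({
    "Class": ["Malignant"] * 40 + ["Benign"] * 60,
    "Perimeter": range(100),
})

cancer_train, cancer_test = train_test_split(
    cancer, train_size=0.75, stratify=cancer["Class"], random_state=1
)

cancer_train.info()  # rows, variable names, dtypes, missing entries
print(cancer_train["Class"].value_counts(normalize=True))  # class balance
```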

source/clustering.md

Lines changed: 10 additions & 4 deletions

````diff
@@ -795,6 +795,9 @@ Total WSSD for K clusters ranging from 1 to 9.
 ```{index} K-means, scikit-learn; KMeans
 ```
 
+```{index} see: KMeans; scikit-learn
+```
+
 We can perform K-means in Python using a workflow similar to those
 in the earlier classification and regression chapters.
 Returning to the original (unstandardized) `penguins` data,
@@ -807,7 +810,7 @@ To address this problem, we typically standardize our data before clustering,
 which ensures that each variable has a mean of 0 and standard deviation of 1.
 The `StandardScaler` function in `scikit-learn` can be used to do this.
 
-```{index} scikit-learn; StandardScaler, standardization;K-means, K-means;standardization
+```{index} scikit-learn; StandardScaler, scikit-learn;KMeans, standardization;K-means, K-means;standardization
 ```
 
 ```{code-cell} ipython3
@@ -829,14 +832,17 @@ To indicate that we are performing K-means clustering, we will create a `KMeans`
 model object. It takes at
 least one argument: the number of clusters `n_clusters`, which we set to 3.
 
+```{index} KMeans;n_clusters
+```
+
 ```{code-cell} ipython3
 from sklearn.cluster import KMeans
 
 kmeans = KMeans(n_clusters=3)
 kmeans
 ```
 
-```{index} scikit-learn;Pipeline, scikit-learn;fit
+```{index} scikit-learn;make_pipeline, scikit-learn;Pipeline, scikit-learn;fit
 ```
 
 To actually run the K-means clustering, we combine the preprocessor and model object
@@ -852,7 +858,7 @@ penguin_clust.fit(penguins)
 penguin_clust
 ```
 
-```{index} KMeans; labels_, KMeans; inertia_, KMeans; cluster_centers_, , KMeans; predict
+```{index} KMeans; labels_, KMeans; inertia_
 ```
 
 The fit `KMeans` object&mdash;which is the second item in the
@@ -907,7 +913,7 @@ The data colored by the cluster assignments returned by K-means.
 ```{index} WSSD; total, KMeans; inertia_
 ```
 
-```{index} see: WSSD; K-means inertia_
+```{index} see: WSSD; KMeans
 ```
 
 As mentioned above,
````
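The clustering hunks reference the standardize-then-cluster workflow. A minimal sketch, assuming a small numeric `penguins` frame (values invented):

```python
# Standardize, then run K-means with 3 clusters inside one pipeline.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline

penguins = pd.DataFrame({
    "bill_length_mm": [39.1, 39.5, 40.3, 46.5, 50.0, 51.3],
    "flipper_length_mm": [181.0, 186.0, 195.0, 213.0, 220.0, 222.0],
})

penguin_clust = make_pipeline(StandardScaler(), KMeans(n_clusters=3, n_init=10))
penguin_clust.fit(penguins)

kmeans = penguin_clust[-1]  # the fit KMeans step of the pipeline
print(kmeans.labels_)       # cluster assignment for each row
print(kmeans.inertia_)      # total within-cluster sum of squared distances
```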

source/reading.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -1331,7 +1331,7 @@ argument&mdash;the URL of the page to scrape&mdash;and will return a list of
 data frames corresponding to all the tables it finds at that URL. We can see
 below that `read_html` found 17 tables on the Wikipedia page for Canada.
 
-```{index} read function; read_html, read_html
+```{index} read function; read_html
 ```
 
 ```python
````
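A sketch of the `read_html` behavior the context describes (`pandas.read_html` needs an HTML parser such as `lxml` installed, and the live page's table count may have drifted from the 17 quoted):

```python
import pandas as pd

# read_html returns a list of data frames, one per table found at the URL.
canada_wiki = pd.read_html("https://en.wikipedia.org/wiki/Canada")
print(len(canada_wiki))
print(canada_wiki[0].head())
```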

source/regression1.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -391,7 +391,7 @@ about what the data must look like for it to work.
 
 ## Training, evaluating, and tuning the model
 
-```{index} training data, test data
+```{index} training set, test set
 ```
 
 As usual, we must start by putting some test data away in a lock box
````

source/viz.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -1963,7 +1963,7 @@ bad, while raster images eventually start to look "pixelated."
 ```{index} PDF
 ```
 
-```{index} see: portable document dormat; PDF
+```{index} see: portable document format; PDF
 ```
 
 ```{note}
````
