
Commit 4a67b26

bugfixing index
1 parent f3e4dfc commit 4a67b26

File tree: 8 files changed, +54 −54 lines changed


source/classification1.md

Lines changed: 6 additions & 6 deletions
@@ -183,7 +183,7 @@ total set of variables per image in this data set is:
 
 +++
 
-```{index} pandas.DataFrame; info
+```{index} DataFrame; info
 ```
 
 Below we use the `info` method to preview the data frame. This method can
@@ -195,7 +195,7 @@ as well as their data types and the number of non-missing entries.
 cancer.info()
 ```
 
-```{index} pandas.Series; unique
+```{index} Series; unique
 ```
 
 From the summary of the data above, we can see that `Class` is of type `object`.
@@ -213,7 +213,7 @@ method. The `replace` method takes one argument: a dictionary that maps
 previous values to desired new values.
 We will verify the result using the `unique` method.
 
-```{index} pandas.Series; replace
+```{index} Series; replace
 ```
 
 ```{code-cell} ipython3
@@ -227,7 +227,7 @@ cancer["Class"].unique()
 
 ### Exploring the cancer data
 
-```{index} pandas.DataFrame; groupby, pandas.Series;size
+```{index} DataFrame; groupby, Series;size
 ```
 
 ```{code-cell} ipython3
@@ -256,7 +256,7 @@ tumor observations.
 100 * cancer.groupby("Class").size() / cancer.shape[0]
 ```
 
-```{index} pandas.Series; value_counts
+```{index} Series; value_counts
 ```
 
 The `pandas` package also has a more convenient specialized `value_counts` method for
@@ -1607,7 +1607,7 @@ Imbalanced data with background color indicating the decision of the classifier
 
 +++
 
-```{index} oversampling, pandas.DataFrame; sample
+```{index} oversampling, DataFrame; sample
 ```
 
 Despite the simplicity of the problem, solving it in a statistically sound manner is actually
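The index entries touched in this file point at the pandas calls used in the classification chapter (`info`, `unique`, `replace`, `groupby`/`size`, `value_counts`, and `sample` for oversampling). As a rough, self-contained sketch of how those calls fit together — the tiny `cancer` data frame below is a made-up stand-in for the chapter's real data set:

```python
import pandas as pd

# Made-up stand-in for the chapter's cancer data frame (illustrative values only).
cancer = pd.DataFrame({
    "Class": ["M", "B", "B", "M", "B"],
    "Radius": [17.9, 12.3, 11.4, 20.2, 13.1],
})

cancer.info()             # column types and counts of non-missing entries
cancer["Class"].unique()  # distinct labels before relabeling

# replace takes a dictionary mapping previous values to desired new values.
cancer["Class"] = cancer["Class"].replace({"M": "Malignant", "B": "Benign"})
cancer["Class"].unique()

# Class proportions, via groupby/size and via the specialized value_counts.
100 * cancer.groupby("Class").size() / cancer.shape[0]
cancer["Class"].value_counts(normalize=True)

# Oversampling sketch: resample the rarer class with replacement.
malignant = cancer[cancer["Class"] == "Malignant"]
pd.concat([cancer, malignant.sample(n=3, replace=True, random_state=1)])
```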

source/clustering.md

Lines changed: 1 addition & 1 deletion
@@ -308,7 +308,7 @@ have.
 clus = penguins_clustered[penguins_clustered["cluster"] == 0][["bill_length_standardized", "flipper_length_standardized"]]
 ```
 
-```{index} see: within-cluster sum-of-squared-distances; WSSD
+```{index} see: within-cluster sum of squared distances; WSSD
 ```
 
 ```{index} WSSD
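The renamed entry here is just the index cross-reference for WSSD (within-cluster sum of squared distances). For reference, a minimal sketch of the quantity itself, computed on a made-up stand-in for the `clus` subset shown above (column names follow the chapter; the values are illustrative):

```python
import pandas as pd

# Made-up stand-in for one cluster's standardized measurements.
clus = pd.DataFrame({
    "bill_length_standardized": [0.2, 0.5, -0.1],
    "flipper_length_standardized": [1.1, 0.8, 0.9],
})

# WSSD: squared distance of each point to the cluster centroid,
# summed over all points in the cluster.
centroid = clus.mean()
wssd = ((clus - centroid) ** 2).sum().sum()
print(wssd)
```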

source/inference.md

Lines changed: 10 additions & 10 deletions
@@ -168,7 +168,7 @@ We can find the proportion of listings for each room type
 by using the `value_counts` function with the `normalize` parameter
 as we did in previous chapters.
 
-```{index} pandas.DataFrame; [], pandas.DataFrame; value_counts
+```{index} DataFrame; [], DataFrame; value_counts
 ```
 
 ```{code-cell} ipython3
@@ -187,13 +187,13 @@ value, {glue:text}`population_proportion`, is the population parameter. Remember
 parameter value is usually unknown in real data analysis problems, as it is
 typically not possible to make measurements for an entire population.
 
-```{index} pandas.DataFrame; sample, seed;numpy.random.seed
+```{index} DataFrame; sample, seed;numpy.random.seed
 ```
 
 Instead, perhaps we can approximate it with a small subset of data!
 To investigate this idea, let's try randomly selecting 40 listings (*i.e.,* taking a random sample of
 size 40 from our population), and computing the proportion for that sample.
-We will use the `sample` method of the `pandas.DataFrame`
+We will use the `sample` method of the `DataFrame`
 object to take the sample. The argument `n` of `sample` is the size of the sample to take
 and since we are starting to use randomness here,
 we are also setting the random seed via numpy to make the results reproducible.
@@ -213,7 +213,7 @@ airbnb.sample(n=40)["room_type"].value_counts(normalize=True)
 glue("sample_1_proportion", "{:.3f}".format(airbnb.sample(n=40, random_state=155)["room_type"].value_counts(normalize=True)["Entire home/apt"]))
 ```
 
-```{index} pandas.DataFrame; value_counts
+```{index} DataFrame; value_counts
 ```
 
 Here we see that the proportion of entire home/apartment listings in this
@@ -248,7 +248,7 @@ commonly refer to as $n$) from a population is called
 a **sampling distribution**. The sampling distribution will help us see how much we would
 expect our sample proportions from this population to vary for samples of size 40.
 
-```{index} pandas.DataFrame; sample
+```{index} DataFrame; sample
 ```
 
 We again use the `sample` to take samples of size 40 from our
@@ -284,7 +284,7 @@ to compute the number of qualified observations in each sample; finally compute
 Both the first and last few entries of the resulting data frame are printed
 below to show that we end up with 20,000 point estimates, one for each of the 20,000 samples.
 
-```{index} pandas.DataFrame;groupby, pandas.DataFrame;reset_index
+```{index} DataFrame;groupby, DataFrame;reset_index
 ```
 
 ```{code-cell} ipython3
@@ -479,7 +479,7 @@ The price per night of all Airbnb rentals in Vancouver, BC
 is \${glue:text}`population_mean`, on average. This value is our
 population parameter since we are calculating it using the population data.
 
-```{index} pandas.DataFrame; sample
+```{index} DataFrame; sample
 ```
 
 Now suppose we did not have access to the population data (which is usually the
@@ -987,7 +987,7 @@ mean of the sample is \${glue:text}`estimate_mean`.
 Remember, in practice, we usually only have this one sample from the population. So
 this sample and estimate are the only data we can work with.
 
-```{index} bootstrap; in Python, pandas.DataFrame; sample (bootstrap)
+```{index} bootstrap; in Python, DataFrame; sample (bootstrap)
 ```
 
 We now perform steps 1–5 listed above to generate a single bootstrap
@@ -1106,7 +1106,7 @@ generate a bootstrap distribution of these point estimates. The bootstrap
 distribution ({numref}`fig:11-bootstrapping5`) suggests how we might expect
 our point estimate to behave if we take multiple samples.
 
-```{index} pandas.DataFrame;reset_index, pandas.DataFrame;rename, pandas.DataFrame;groupby, pandas.Series;mean
+```{index} DataFrame;reset_index, DataFrame;rename, DataFrame;groupby, Series;mean
 ```
 
 ```{code-cell} ipython3
@@ -1252,7 +1252,7 @@ Quantiles are expressed in proportions rather than percentages,
 so the 2.5th and 97.5th percentiles
 would be the 0.025 and 0.975 quantiles, respectively.
 
-```{index} pandas.DataFrame; [], pandas.DataFrame;quantile
+```{index} DataFrame; [], DataFrame;quantile
 ```
 
 ```{index} percentile
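The entries updated in this file all concern the sampling and bootstrapping workflow (`sample`, `value_counts`, `groupby`/`reset_index`, `quantile`). A condensed, self-contained sketch of that workflow on a made-up stand-in for the chapter's `airbnb` data frame (prices and room types are randomly generated, not real listings):

```python
import numpy as np
import pandas as pd

np.random.seed(1)  # the chapter sets the seed via numpy for reproducibility

# Made-up stand-in for the chapter's airbnb data frame.
airbnb = pd.DataFrame({
    "room_type": np.random.choice(["Entire home/apt", "Private room"], size=500),
    "price": np.random.exponential(scale=150, size=500),
})

# Population proportion of each room type, then one random sample of size 40.
airbnb["room_type"].value_counts(normalize=True)
sample = airbnb.sample(n=40)
sample["room_type"].value_counts(normalize=True)

# One bootstrap sample: resample the sample with replacement, same size,
# and compute the point estimate (here, the mean price).
sample.sample(frac=1, replace=True)["price"].mean()

# Bootstrap distribution of the mean and a 95% percentile interval.
boot_means = pd.Series(
    [sample.sample(frac=1, replace=True)["price"].mean() for _ in range(1000)]
)
boot_means.quantile([0.025, 0.975])
```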

source/intro.md

Lines changed: 7 additions & 7 deletions
@@ -437,13 +437,13 @@ can_lang
 
 ## Creating subsets of data frames with `[]` & `loc[]`
 
-```{index} see: []; pandas.DataFrame
+```{index} see: []; DataFrame
 ```
 
-```{index} see: loc[]; pandas.DataFrame
+```{index} see: loc[]; DataFrame
 ```
 
-```{index} pandas.DataFrame; [], pandas.DataFrame; loc[], selecting columns
+```{index} DataFrame; [], DataFrame; loc[], selecting columns
 ```
 
 Now that we've loaded our data into Python, we can start wrangling the data to
@@ -475,7 +475,7 @@ high-level categories of languages, which include "Aboriginal languages",
 our question we want to filter our data set so we restrict our attention
 to only those languages in the "Aboriginal languages" category.
 
-```{index} pandas.DataFrame; [], filtering rows, logical statement, logical operator; equivalency (==), string
+```{index} DataFrame; [], filtering rows, logical statement, logical operator; equivalency (==), string
 ```
 
 We can use the `[]` operation to obtain the subset of rows with desired values
@@ -521,7 +521,7 @@ can_lang[can_lang["category"] == "Aboriginal languages"]
 ### Using `[]` to select columns
 
 
-```{index} pandas.DataFrame; [], selecting columns
+```{index} DataFrame; [], selecting columns
 ```
 
 We can also use the `[]` operation to select columns from a data frame.
@@ -551,7 +551,7 @@ can_lang[["language", "mother_tongue"]]
 
 ### Using `loc[]` to filter rows and select columns
 
-```{index} pandas.DataFrame; loc[], selecting columns
+```{index} DataFrame; loc[], selecting columns
 ```
 
 The `[]` operation is only used when you want to filter rows *or* select columns;
@@ -612,7 +612,7 @@ So it looks like the `loc[]` operation gave us the result we wanted!
 
 ## Using `sort_values` and `head` to select rows by ordered values
 
-```{index} pandas.DataFrame; sort_values, pandas.DataFrame; head
+```{index} DataFrame; sort_values, DataFrame; head
 ```
 
 We have used the `[]` and `loc[]` operations on a data frame to obtain a table
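The entries in this file track the basic subsetting operations introduced in the intro chapter (`[]`, `loc[]`, `sort_values`, `head`). A minimal sketch of each, on a made-up stand-in for the chapter's `can_lang` data frame:

```python
import pandas as pd

# Made-up stand-in for the chapter's can_lang data frame.
can_lang = pd.DataFrame({
    "category": ["Aboriginal languages", "Official languages", "Aboriginal languages"],
    "language": ["Inuktitut", "French", "Cree"],
    "mother_tongue": [35000, 7100000, 86000],
})

# Filter rows with [] and a logical comparison.
can_lang[can_lang["category"] == "Aboriginal languages"]

# Select columns with [].
can_lang[["language", "mother_tongue"]]

# Filter rows *and* select columns in one step with loc[].
can_lang.loc[
    can_lang["category"] == "Aboriginal languages",
    ["language", "mother_tongue"],
]

# Order by a column and keep the top rows.
can_lang.sort_values(by="mother_tongue", ascending=False).head(2)
```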

source/reading.md

Lines changed: 3 additions & 3 deletions
@@ -407,7 +407,7 @@ canlang_data = pd.read_csv(
 canlang_data
 ```
 
-```{index} pandas.DataFrame; rename, pandas
+```{index} DataFrame; rename, pandas
 ```
 
 It is best to rename your columns manually in this scenario. The current column names
@@ -790,7 +790,7 @@ that we need for analysis; we do eventually need to call `execute`.
 For example, `ibis` does not provide the `tail` function to look at the last
 rows in a database, even though `pandas` does.
 
-```{index} pandas.DataFrame; tail
+```{index} DataFrame; tail
 ```
 
 ```{code-cell} ipython3
@@ -951,7 +951,7 @@ Databases are beneficial in a large-scale setting:
 
 ## Writing data from Python to a `.csv` file
 
-```{index} write function; to_csv, pandas.DataFrame; to_csv
+```{index} write function; to_csv, DataFrame; to_csv
 ```
 
 At the middle and end of a data analysis, we often want to write a data frame
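The entries here cover `rename`, `tail`, and `to_csv` from the reading/writing chapter. A minimal sketch, using a made-up stand-in for `canlang_data`:

```python
import pandas as pd

# Made-up stand-in for the chapter's canlang_data, with awkward column names.
canlang_data = pd.DataFrame({
    "cat": ["Official languages", "Aboriginal languages"],
    "lang": ["English", "Cree"],
})

# Rename columns manually with a dictionary of old -> new names.
canlang_data = canlang_data.rename(columns={"cat": "category", "lang": "language"})

# Peek at the last rows (pandas provides tail; the chapter notes that ibis does not).
canlang_data.tail(1)

# Write the data frame out to a .csv file, omitting the row index.
canlang_data.to_csv("canlang_clean.csv", index=False)
```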

source/regression1.md

Lines changed: 2 additions & 2 deletions
@@ -233,7 +233,7 @@ how well it predicts house sale price. This subsample is taken to allow us to
 illustrate the mechanics of K-NN regression with a few data points; later in
 this chapter we will use all the data.
 
-```{index} pandas.DataFrame; sample
+```{index} DataFrame; sample
 ```
 
 To take a small random sample of size 30, we'll use the
@@ -287,7 +287,7 @@ Scatter plot of price (USD) versus house size (square feet) with vertical line i
 
 +++
 
-```{index} pandas.DataFrame; abs, pandas.DataFrame; nsmallest
+```{index} DataFrame; abs, DataFrame; nsmallest
 ```
 
 We will employ the same intuition from {numref}`Chapters %s <classification1>` and {numref}`%s <classification2>`, and use the
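The two entries updated here (`sample`, plus `abs`/`nsmallest`) are the pieces of the K-NN regression walkthrough. A rough sketch of the idea on a made-up housing data frame (sizes and prices are illustrative, not the chapter's data):

```python
import pandas as pd

# Made-up stand-in for the chapter's housing data.
housing = pd.DataFrame({
    "sqft": [1000, 1500, 1800, 2100, 2500, 900, 1200, 3000],
    "price": [180000, 250000, 300000, 340000, 420000, 150000, 210000, 520000],
})

# A small random subsample, as the chapter does to illustrate the mechanics.
small_sample = housing.sample(n=5, random_state=10)

# Nearest neighbors by house size: absolute distance to 2,000 sq ft,
# then keep the rows with the smallest distances.
small_sample["dist"] = (small_sample["sqft"] - 2000).abs()
nearest = small_sample.nsmallest(3, "dist")

# K-NN regression prediction: average price of those neighbors.
nearest["price"].mean()
```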

source/viz.md

Lines changed: 3 additions & 3 deletions
@@ -718,7 +718,7 @@ in the magnitude of these two numbers!
 We can confirm that the two points in the upper right-hand corner correspond
 to Canada's two official languages by filtering the data:
 
-```{index} pandas.DataFrame; loc[]
+```{index} DataFrame; loc[]
 ```
 
 ```{code-cell} ipython3
@@ -848,7 +848,7 @@ using `_` so that it is easier to read;
 this does not affect how Python interprets the number
 and is just added for readability.
 
-```{index} pandas.DataFrame; column assignment, pandas.DataFrame; []
+```{index} DataFrame; column assignment, DataFrame; []
 ```
 
 ```{code-cell} ipython3
@@ -1228,7 +1228,7 @@ as `sort_values` followed by `head`, but are slightly more efficient because the
 In general, it is good to use more specialized functions when they are available!
 ```
 
-```{index} pandas.DataFrame; nlargest, pandas.DataFrame; nsmallest
+```{index} DataFrame; nlargest, DataFrame; nsmallest
 ```
 
 ```{code-cell} ipython3
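The entries in this file touch `loc[]` filtering, column assignment, and `nlargest`/`nsmallest` from the visualization chapter. A minimal sketch on a made-up language data frame (counts are illustrative):

```python
import pandas as pd

# Made-up stand-in for the chapter's language data.
can_lang = pd.DataFrame({
    "language": ["English", "French", "Cree", "Inuktitut"],
    "mother_tongue": [19_000_000, 7_000_000, 86_000, 35_000],
})

# Filter with loc[] to confirm which rows sit in the upper right of the plot.
can_lang.loc[can_lang["mother_tongue"] > 1_000_000]

# Assign a new column; underscores in numeric literals are only for readability.
can_lang["mother_tongue_percent"] = (
    100 * can_lang["mother_tongue"] / can_lang["mother_tongue"].sum()
)

# nlargest/nsmallest behave like sort_values followed by head, but more directly.
can_lang.nlargest(2, "mother_tongue")
can_lang.nsmallest(2, "mother_tongue")
```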
