Fix spacing between words, pages, etc.

leem44 · leem44 · commit f9649b1aec84 · 2022-04-17T21:16:38.000-07:00
diff --git a/classification2.Rmd b/classification2.Rmd
@@ -120,7 +120,7 @@ in the analysis, would we not get a different result each time?
 The trick is that in R&mdash;and other programming languages&mdash;randomness 
 is not actually random! Instead, R uses a *random number generator* that
 produces a sequence of numbers that
-are completely determined by a \index{seed} \index{random seed|see{seed}}
+are completely determined by a\index{seed} \index{random seed|see{seed}}
  *seed value*. Once you set the seed value 
 using the \index{seed!set.seed} `set.seed` function, everything after that point may *look* random,
 but is actually totally reproducible. As long as you pick the same seed
@@ -157,6 +157,7 @@ set.seed(1)
 random_numbers <- sample(0:9, 10, replace=TRUE)
 random_numbers
 
+set.seed(1)
 random_numbers <- sample(0:9, 10, replace=TRUE)
 random_numbers
 ```
@@ -170,6 +171,7 @@ set.seed(4235)
 random_numbers <- sample(0:9, 10, replace=TRUE)
 random_numbers
 
+set.seed(4235)
 random_numbers <- sample(0:9, 10, replace=TRUE)
 random_numbers
 ```
@@ -323,7 +325,7 @@ our test data does not influence any aspect of our model training. Once we have
 created the standardization preprocessor, we can then apply it separately to both the
 training and test data sets.
 
-Fortunately, the `recipe` framework from `tidymodels` helps us handle \index{recipe}\index{recipe!step\_scale}\index{recipe!step\_center}
+Fortunately, the `recipe` framework from `tidymodels` helps us handle\index{recipe}\index{recipe!step\_scale}\index{recipe!step\_center}
 this properly. Below we construct and prepare the recipe using only the training
 data (due to `data = cancer_train` in the first line).
 
@@ -411,7 +413,6 @@ the table of predicted labels and correct labels, using the `conf_mat` function:
 ```{r 06-confusionmat}
 confusion <- cancer_test_predictions |>
              conf_mat(truth = Class, estimate = .pred_class)
-
 confusion
 ```
 
@@ -497,7 +498,7 @@ for the application.
 ## Tuning the classifier
 
 The vast majority of predictive models in statistics and machine learning have
-*parameters*. A *parameter* \index{parameter}\index{tuning parameter|see{parameter}}
+*parameters*. A *parameter*\index{parameter}\index{tuning parameter|see{parameter}}
 is a number you have to pick in advance that determines
 some aspect of how the model behaves. For example, in the $K$-nearest neighbors
 classification algorithm, $K$ is a parameter that we have to pick
@@ -663,7 +664,7 @@ cancer_vfold <- vfold_cv(cancer_train, v = 5, strata = Class)
 cancer_vfold
 ```
 
-Then, when we create our data analysis workflow, we use the `fit_resamples` function \index{cross-validation!fit\_resamples}\index{tidymodels!fit\_resamples}
+Then, when we create our data analysis workflow, we use the `fit_resamples` function\index{cross-validation!fit\_resamples}\index{tidymodels!fit\_resamples}
 instead of the `fit` function for training. This runs cross-validation on each
 train/validation split. 
 
@@ -689,7 +690,7 @@ knn_fit <- workflow() |>
 knn_fit
 ```
 
-The `collect_metrics` \index{tidymodels!collect\_metrics}\index{cross-validation!collect\_metrics} function is used to aggregate the *mean* and *standard error*
+The `collect_metrics`\index{tidymodels!collect\_metrics}\index{cross-validation!collect\_metrics} function is used to aggregate the *mean* and *standard error*
 of the classifier's validation accuracy across the folds. You will find results
 related to the accuracy in the row with `accuracy` listed under the `.metric` column. 
 You should consider the mean (`mean`) to be the estimated accuracy, while the standard