Commit e02391a

Merge pull request #516 from UBC-DSCI/slice-min-max
slice_min / slice_max
2 parents 9124b80 + 17f49c4 commit e02391a

3 files changed: +16 -10 lines changed


source/classification1.Rmd

Lines changed: 3 additions & 4 deletions
@@ -460,6 +460,7 @@ the $K=5$ neighbors that are nearest to our new point.
 You will see in the `mutate` \index{mutate} step below, we compute the straight-line
 distance using the formula above: we square the differences between the two observations' perimeter
 and concavity coordinates, add the squared differences, and then take the square root.
+In order to find the $K=5$ nearest neighbors, we will use the `slice_min` function. \index{slice\_min}
 
 ```{r 05-multiknn-1, echo = FALSE, fig.height = 3.5, fig.width = 4.5, fig.pos = "H", out.extra="", fig.cap="Scatter plot of concavity versus perimeter with new observation represented as a red diamond."}
 perim_concav <- bind_rows(cancer,
@@ -499,8 +500,7 @@ cancer |>
 select(ID, Perimeter, Concavity, Class) |>
 mutate(dist_from_new = sqrt((Perimeter - new_obs_Perimeter)^2 +
 (Concavity - new_obs_Concavity)^2)) |>
-arrange(dist_from_new) |>
-slice(1:5) # take the first 5 rows
+slice_min(dist_from_new, n = 5) # take the 5 rows of minimum distance
 ```
 
 In Table \@ref(tab:05-multiknn-mathtable) we show in mathematical detail how
@@ -590,8 +590,7 @@ cancer |>
 mutate(dist_from_new = sqrt((Perimeter - new_obs_Perimeter)^2 +
 (Concavity - new_obs_Concavity)^2 +
 (Symmetry - new_obs_Symmetry)^2)) |>
-arrange(dist_from_new) |>
-slice(1:5) # take the first 5 rows
+slice_min(dist_from_new, n = 5) # take the 5 rows of minimum distance
 ```
 
 Based on $K=5$ nearest neighbors with these three predictors, we would classify
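
As a minimal sketch of what the two `slice_min` hunks above compute, the following runs the same `mutate` then `slice_min` pipeline on a small made-up data frame. The data frame `toy_cancer`, its values, and the choice of `n = 3` are assumptions for illustration only; they are not taken from the book's `cancer` data.

```r
library(dplyr)

# Hypothetical toy data: values are invented for illustration
toy_cancer <- data.frame(
  ID = 1:6,
  Perimeter = c(0.2, 1.5, -0.3, 0.9, 2.1, 0.4),
  Concavity = c(3.1, 2.8, -0.5, 3.6, 4.0, 0.1),
  Class = c("M", "M", "B", "M", "M", "B")
)

# Hypothetical coordinates of the new observation
new_obs_Perimeter <- 0
new_obs_Concavity <- 3.5

nearest <- toy_cancer |>
  # straight-line (Euclidean) distance from each row to the new observation
  mutate(dist_from_new = sqrt((Perimeter - new_obs_Perimeter)^2 +
                              (Concavity - new_obs_Concavity)^2)) |>
  # keep the 3 rows with the smallest distance, ordered nearest first
  slice_min(dist_from_new, n = 3)

nearest
```

With these made-up values there are no tied distances, so the result has exactly three rows, sorted from nearest to farthest.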

source/regression1.Rmd

Lines changed: 2 additions & 3 deletions
@@ -233,13 +233,12 @@ sale price might be.
 For the example shown in Figure \@ref(fig:07-small-eda-regr),
 we find and label the 5 nearest neighbors to our observation
 of a house that is 2,000 square feet.
-\index{mutate}\index{slice}\index{arrange}\index{abs}
+\index{mutate}\index{slice\_min}\index{abs}
 
 ```{r 07-find-k3}
 nearest_neighbors <- small_sacramento |>
 mutate(diff = abs(2000 - sqft)) |>
-arrange(diff) |>
-slice(1:5) #subset the first 5 rows
+slice_min(diff, n = 5)
 
 nearest_neighbors
 ```

source/viz.Rmd

Lines changed: 11 additions & 3 deletions
@@ -922,10 +922,18 @@ are hard to distinguish, and the names of the landmasses are obscuring each
 other as they have been squished into too little space. But remember that the
 question we asked was only about the largest landmasses; let's make the plot a
 little bit clearer by keeping only the largest 12 landmasses. We do this using
-the `slice_max` function. Then to give the labels enough
+the `slice_max` function: the `order_by` argument is the name of the column we
+want to use for comparing which is largest, and the `n` argument specifies how many
+rows to keep. Then to give the labels enough
 space, we'll use horizontal bars instead of vertical ones. We do this by
-swapping the `x` and `y` variables:
-\index{slice\_max}
+swapping the `x` and `y` variables.\index{slice\_max}\index{slice\_min}
+
+> **Note:** Recall that in Chapter \@ref(intro), we used `arrange` followed by `slice` to
+> obtain the ten rows with the largest values of a variable. We could have instead used
+> the `slice_max` function for this purpose. The `slice_max` and `slice_min` functions
+> achieve the same goal as `arrange` followed by `slice`, but are slightly more efficient
+> because they are specialized for this purpose. In general, it is good to use more specialized
+> functions when they are available!
 
 ```{r 03-data-islands-bar-2, warning=FALSE, message=FALSE, fig.width=5, fig.height=2.75, fig.align = "center", fig.pos = "H", out.extra="", fig.cap = "Bar plot of size for Earth's largest 12 landmasses."}
 islands_top12 <- slice_max(islands_df, order_by = size, n = 12)
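
One detail worth checking when reviewing this swap: `slice_max` and `slice_min` keep tied rows by default (`with_ties = TRUE`), so they can return more than `n` rows, whereas `arrange` followed by `slice(1:n)` always returns at most `n`. The sketch below illustrates the difference on a made-up data frame; `landmasses` and its values are hypothetical and are not the book's `islands_df`.

```r
library(dplyr)

# Hypothetical data with a tie at the third-largest size
landmasses <- data.frame(
  landmass = c("A", "B", "C", "D", "E"),
  size = c(50, 40, 30, 30, 10)
)

# Old pattern from the diff: sort descending, then take the first 3 rows
top3_arrange <- landmasses |>
  arrange(desc(size)) |>
  slice(1:3)

# New pattern: ties at the cutoff are kept by default, giving 4 rows here
top3_ties <- slice_max(landmasses, order_by = size, n = 3)

# with_ties = FALSE drops the extra tied row, giving exactly 3 rows
top3_exact <- slice_max(landmasses, order_by = size, n = 3, with_ties = FALSE)

nrow(top3_arrange)
nrow(top3_ties)
nrow(top3_exact)
```

When the ordering column has no ties at the cutoff, the old and new patterns select the same rows. If an exact row count matters, passing `with_ties = FALSE` is one way to preserve the earlier behavior.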
