File: authors.Rmd (+1 -1: 1 addition, 1 deletion)
@@ -6,4 +6,4 @@ Tiffany Timbers is an Assistant Professor of Teaching in the Department of Stati
 Trevor Campbell is an Assistant Professor in the Department of Statistics at the University of British Columbia. His research focuses on automated, scalable Bayesian inference algorithms, Bayesian nonparametrics, streaming data, and Bayesian theory. He was previously a postdoctoral associate advised by Tamara Broderick in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and Institute for Data, Systems, and Society (IDSS) at MIT, a Ph.D. candidate under Jonathan How in the Laboratory for Information and Decision Systems (LIDS) at MIT, and before that he was in the Engineering Science program at the University of Toronto.


-Melissa Lee is an Assistant Professor of Teaching in the Department of Statistics at the University of British Columbia. With a focus on teaching, she develops curriculum for undergraduate statistics and data science courses. She enjoys using student-centered approaches, developing and assessing open educational resources, and promoting equity, diversity, and inclusion initiatives.
+Melissa Lee is an Assistant Professor of Teaching in the Department of Statistics at the University of British Columbia. She teaches and develops curriculum for undergraduate statistics and data science courses. Her work focuses on student-centered approaches to teaching, developing and assessing open educational resources, and promoting equity, diversity, and inclusion initiatives.
File: classification1.Rmd (+4 -6: 4 additions, 6 deletions)
@@ -7,6 +7,9 @@ library(knitr)

 knitr::opts_chunk$set(echo = TRUE,
                       fig.align = "center")
+options(knitr.table.format = function() {
+  if (knitr::is_latex_output()) 'latex' else 'pandoc'
+})
 ```

 ## Overview
@@ -565,7 +568,7 @@ Based on $K=5$ nearest neighbors with these three predictors we would classify t
 Figure \@ref(fig:05-more) shows what the data look like when we visualize them
 as a 3-dimensional scatter with lines from the new observation to its five nearest neighbors.
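
The neighbor search behind this figure can be sketched in base R. This is a minimal illustration under stated assumptions, not the book's own code: the data are randomly generated stand-ins, and `train`, `new_obs`, and `predicted` are hypothetical names. Only the three predictor names come from the `attrs` vector in the chunk.

```r
# Hypothetical standardized training data (random stand-in values,
# not the data set used in the book).
set.seed(1)
train <- data.frame(
  Perimeter = rnorm(20),
  Concavity = rnorm(20),
  Symmetry  = rnorm(20),
  Class     = factor(sample(c("B", "M"), 20, replace = TRUE))
)

# Hypothetical new observation, assumed to be on the standardized scale already.
new_obs <- c(Perimeter = 0.2, Concavity = -0.1, Symmetry = 0.5)

attrs <- c("Perimeter", "Concavity", "Symmetry")

# Euclidean distance from the new observation to every training point:
# subtract new_obs from each column, square, sum across columns, take the root.
diffs <- sweep(as.matrix(train[, attrs]), 2, new_obs[attrs])
dists <- sqrt(rowSums(diffs^2))

# Row indices of the K = 5 nearest neighbors, then a majority vote on their labels.
k <- 5
nn_idx <- order(dists)[1:k]
votes <- table(train$Class[nn_idx])
predicted <- names(votes)[which.max(votes)]
```

With an odd `k` and two classes the vote cannot tie, which is one reason to prefer odd values of $K$ in a two-class problem.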

-```{r 05-more, echo = FALSE, message = FALSE, fig.cap = "3D scatter plot of the standardized symmetry, concavity, and perimeter variables.", fig.retina=2}
+```{r 05-more, echo = FALSE, message = FALSE, fig.cap = "3D scatter plot of the standardized symmetry, concavity, and perimeter variables. Note that in general we recommend against using 3D visualizations; here we show the data in 3D only to illustrate what higher dimensions and nearest neighbors look like, for learning purposes.", fig.retina=2, out.width="80%"}
 attrs <- c("Perimeter", "Concavity", "Symmetry")

 # create new scaled obs and get NNs
@@ -638,11 +641,6 @@ if(!is_latex_output()){
 }
 ```

-*Click and drag the plot above to rotate it, and scroll to zoom. Note that in
-general we recommend against using 3D visualizations; here we show the data in
-3D only to illustrate what "higher dimensions" and "nearest neighbors" look like,
-for learning purposes.*
-
 ### Summary of $K$-nearest neighbors algorithm

 In order to classify a new observation using a $K$-nearest neighbor classifier, we have to:
```{r 10-toy-kmeans-iter, echo = FALSE, warning = FALSE, fig.height = 16, fig.width = 8, fig.cap = "First four iterations of K-means clustering on the `penguin_data` example data set. Each row corresponds to an iteration, where the left column depicts the center update, and the right column depicts the reassignment of data to clusters. Cluster centers are indicated by larger points that are outlined in black."}
Figure \@ref(fig:10-toy-kmeans-bad-iter) shows what the iterations of K-means would look like with the unlucky random initialization shown in Figure \@ref(fig:10-toy-kmeans-bad-init).
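
The effect of an unlucky initialization can be reproduced with base R's `kmeans`. This is a hedged sketch on synthetic data, not the `penguin_data` example from the book; `good_init` and `bad_init` are hypothetical starting centers chosen by hand for illustration.

```r
# Synthetic data: three well-separated groups of 20 points each
# (a stand-in; the book's `penguin_data` is not reproduced here).
set.seed(2)
x <- rbind(
  matrix(rnorm(40, mean = 0,  sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5,  sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 10, sd = 0.3), ncol = 2)
)

# A "lucky" initialization: one starting center near each group.
good_init <- rbind(c(0, 0), c(5, 5), c(10, 10))

# An "unlucky" initialization: two starting centers inside the first group
# and a single center stranded between the other two groups.
bad_init <- rbind(c(-0.2, -0.2), c(0.2, 0.2), c(7.5, 7.5))

# algorithm = "Lloyd" alternates center updates and reassignments.
good_fit <- kmeans(x, centers = good_init, algorithm = "Lloyd")
bad_fit  <- kmeans(x, centers = bad_init,  algorithm = "Lloyd")

# Total within-cluster sum of squared distances: smaller is better.
c(good = good_fit$tot.withinss, bad = bad_fit$tot.withinss)
```

The unlucky start should settle on a worse solution, typically with one group split in two and the other two groups merged. One common remedy is to pass `nstart` to `kmeans` so that several random initializations are tried and the best result is kept.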
 | Descriptive | A question that asks about summarized characteristics of a data set without interpretation (i.e., report a fact). | How many people live in each province and territory in Canada? |
 | Exploratory | A question that asks if there are patterns, trends, or relationships within a single data set. Often used to propose hypotheses for future study. | Does political party voting change with indicators of wealth in a set of data collected on 2,000 people living in Canada? |
 | Predictive | A question that asks about predicting measurements or labels for individuals (people or things). The focus is on what things predict some outcome, but not what causes the outcome. | What political party will someone vote for in the next Canadian election? |
@@ -253,7 +253,9 @@ file satisfies everything else that the `read_csv` function expects in the defau
 use-case. Figure \@ref(fig:img-read-csv) describes how we use the `read_csv`
 > time, a single expression in R must be contained in a single line of code.
 > However, there *are* a small number of situations in which you can have a
 > single R expression span multiple lines. Above is one such case: here, R knows that a line cannot
-> end with a `+` symbol, \index{plussymb@$+$} and so it keeps reading the next line to figure out
+> end with a `+` symbol, \index{aaaplussymb@$+$|see{ggplot (add layer)}} and so it keeps reading the next line to figure out
 > what the right hand side of the `+` symbol should be. We could, of course,
 > put all of the added layers on one line of code, but splitting them across
 > multiple lines helps a lot with code readability. \index{multi-line expression}
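
The rule described in this note can be seen with plain arithmetic, without loading any plotting package. The variable names `total` and `partial` below are illustrative only.

```r
# A line that ends with `+` is an incomplete expression, so R keeps reading.
total <- 1 +
  2 +
  3

# If the `+` starts the next line instead, the first line is already a
# complete expression, so R evaluates the two lines separately.
partial <- 1
+ 2
```

Here `total` is 6, while `partial` is 1 because `+ 2` was evaluated as its own expression. The same reasoning explains why each `ggplot` layer is added with a trailing `+` rather than a leading one.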
@@ -591,7 +602,7 @@ were, according to the 2016 Canadian census, and how many people speak each of th
 instance, we can see that the Aboriginal language most often reported was Cree
 n.o.s. with over 60,000 Canadian residents reporting it as their mother tongue.

-> "n.o.s." means "not otherwise specified", so Cree n.o.s. refers to
+> **Note:** "n.o.s." means "not otherwise specified", so Cree n.o.s. refers to
 > individuals who reported Cree as their mother tongue. In this data set, the
 > Cree languages include the following categories: Cree n.o.s., Swampy Cree,
 > Plains Cree, Woods Cree, and a 'Cree not included elsewhere' category (which
@@ -609,7 +620,7 @@ grey to white to improve the contrast. We have also actually skipped the
 in the `ggplot` function, you don't actually need to `select` the columns in advance
 when creating a visualization. And finally, we provided *comments* next to
 many of the lines of code below using the
-hash symbol `#`. When R sees a `#` sign, \index{comment} \index{commentsymb@\#|see{comment}} it
+hash symbol `#`. When R sees a `#` sign, \index{comment} \index{aaacommentsymb@\#|see{comment}} it
 will ignore all of the text that
 comes after the symbol on that line. So you can use comments to explain lines
 of code for others, and perhaps more importantly, your future self!
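
A short illustration of how R treats the `#` symbol; the variable names are made up for this example.

```r
# A full-line comment: R ignores this entire line.
radius <- 2  # a trailing comment: R ignores everything after the `#`

# A `#` inside a quoted string is part of the string, not a comment.
label <- "item #1"

area <- pi * radius^2
```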
@@ -650,7 +661,7 @@ There are many R functions in the `tidyverse` package (and beyond!), and
 nobody can be expected to remember what every one of them does
 nor all of the arguments we have to give them. Fortunately R provides
 the `?` symbol, which
-\index{questionmark@? symbol|see{documentation}}
+\index{aaaquestionmark@?|see{documentation}}
 \index{help|see{documentation}}
 \index{documentation} provides an easy way to pull up the documentation for
 most functions quickly. To use the `?` symbol to access documentation, you
@@ -672,6 +683,6 @@ documentation like that shown in Figure \@ref(fig:01-help). But do keep in mind
 is not written to *teach* you about a function; it is just there as a reference to *remind*
 you about the different arguments and usage of functions that you have already learned about elsewhere.
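
As a small sketch of the lookup itself, the example below uses base R's `mean` rather than `filter`, so that no extra packages need to be loaded. The names `doc` and `found` are illustrative.

```r
# `?mean` is shorthand for `help("mean")`; both look up the documentation page.
doc <- help("mean")

# The lookup returns the location(s) of the matching help pages; printing `doc`
# (or typing `?mean` at the console) displays the page itself.
found <- length(doc) > 0
```

Looking up `filter` works the same way once the package that provides it has been loaded.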

-```{r 01-help, echo = FALSE, message = FALSE, warning = FALSE, fig.cap = "The documentation for the `filter` function, including a high-level description, a list of arguments and their meanings, and more.", fig.retina = 2}
+```{r 01-help, echo = FALSE, message = FALSE, warning = FALSE, fig.cap = "The documentation for the `filter` function, including a high-level description, a list of arguments and their meanings, and more.", fig.retina = 2, out.width="100%"}