Commit 7676fb0

Merge pull request #248 from UBC-DSCI/dev
Merge latest work in dev into master
2 parents: f13a044 + cf9d44a

283 files changed, with 7,790 additions and 12,748 deletions

README.md

Lines changed: 136 additions & 9 deletions
@@ -18,13 +18,22 @@ We provide instructions for both methods here.

 ### Without RStudio

-Once you are done editing, navigate to the repository root folder and run
+To build the **html version** of the book, navigate to the repository root folder and run
 ```
 ./build.sh
 ```
 from the command line. This command automatically spawns a docker container
 with the `ubcdsci/intro-to-ds` image, runs the script `build.R` from within the container,
-and then stops the container.
+and then stops the container. It may ask you for a password; this is the password for the
+`sudo` command on your computer. Typically this is just your usual computer user account password.
+But if your setup doesn't require you to use `sudo` to start a docker container, you can just
+open `build.sh` and delete the word `sudo` at the start of the script.
+
+To build the **PDF version** of the book, instead run
+```
+./pdfbuild.sh
+```
+The same comments regarding passwords and `sudo` as above apply here.
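
For orientation, a wrapper of the kind this README passage describes could look roughly like the sketch below. It is not the repository's actual `build.sh`. The mount path is borrowed from the `docker run` example elsewhere in this README, the `-w` working directory and the `Rscript` call are assumptions, and pairing `_pdfbuild.R` with the PDF build is a guess based on the file added in this commit. The function only prints the command (a dry run) and never starts a container.

```shell
#!/bin/sh
# Hypothetical sketch of a build wrapper; the repository's real build.sh may differ.

IMAGE="ubcdsci/intro-to-ds"                        # image named in the README
MOUNT="/home/rstudio/introduction-to-datascience"  # assumed mount point inside the container

# Print the docker command that would run one R script inside the container.
#   $1: R script to run, e.g. build.R
#   $2: "sudo" (default) or "nosudo" for setups that do not need sudo
build_cmd() {
  script="$1"
  mode="${2:-sudo}"
  prefix=""
  if [ "$mode" = "sudo" ]; then
    prefix="sudo "
  fi
  # --rm removes the container once the script finishes.
  printf '%sdocker run --rm -v "%s:%s" -w "%s" %s Rscript %s\n' \
    "$prefix" "$PWD" "$MOUNT" "$MOUNT" "$IMAGE" "$script"
}

# Dry run: print the commands instead of executing them.
build_cmd build.R
build_cmd _pdfbuild.R nosudo
```

Passing `nosudo` corresponds to the README's advice to delete the word `sudo` on machines where Docker runs without it.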

 ### With RStudio

@@ -48,23 +57,141 @@ and then stops the container.
 - for the username enter `rstudio`
 - for the password enter `password` (or whatever you may have changed it to in the `docker run` command above)

-> Note, if you prefer not to use RStudio, but a plain text editor instead (i.e., vim) the see [these docs](#usage-without-rstudio) below.
-
 3. Finally, you can render the book by running the following R code in the R console:
 ```
 bookdown::render_book('index.Rmd', 'bookdown::gitbook')
 ```

-### Updating the textbook data
-Data sets are collected and curated by `data/retrieve_data.ipynb`. To run that notebook in the Docker container type the following in the terminal:
+## Style Guide

+#### General
+- **80 character line limit!** This is necessary to make git diffs useful
+- numbers in text should be english words ("four common mistakes" not "4 common mistakes") unless there are units (40km, not forty km)
+- use Oxford commas ("a, b, and c" not "a, b and c")
+- "subset" should not be used as a verb
+- functions in text should not have parentheses (`read_csv` not `read_csv()`)
+- remove all references to "course" and "student"; replace with "reader" or "you" where necessary
+- make sure we have permission to use all external resources that we use
+- remove all references to "clicking on things" in the HTML version of the book (e.g. "click this link to ...")
+- When we introduce a new term, use `**bolding**` to typeset it (but only the first introduction of the term)
+
+#### Code blocks
+- Use the knitr label format `##-[name with only alphanumeric + hyphens]` where
+the `##` is the 2-digit chapter number, e.g. `03-test-name` for a label `test-name` in chapter 3
+- Make sure to get syntax highlighting by specifying the language in each code block:
+<pre>
+```r
+code
+```
+</pre>
+not
+<pre>
+```
+code
+```
+(similar for `html` where needed)
+- always use `|>` pipe, not `%>%`
+- anywhere we specify a grid of tuning values, don't just do `grid = 10`; actually specify the values using `seq` or `c(...)`
+- do not end code blocks with `head(dataframe)`; just use `dataframe` to print
+- `set.seed` once at the beginning of each chapter
+- use `"double quotes"` for strings, not `'single quotes'`
+- make sure all lines of code are at most 80 characters (for LaTeX PDF output typesetting)
+- pass code blocks through `styler` (although must obey the 80ch limit)
+- use `slice`, `slice_min`, `slice_max` (not `top_n`)
+- just `pull(colname)`, don't `select` first
+
+#### Section headings
+- All (sub)section headings should be sentence case ("Loading a tabular data set", not "Loading a Tabular Data Set")
+- Make sure that subsections occur in 1-step hierarchies (no subsubsection directly below subsection, for example)
+- Make sure that `{-}` is used wherever unnumbered headings are required
+
+Choose an appropriate table of contents depth via (example has depth 2 below, which is a good default)
 ```
-docker run --rm -it -p 8888:8888 -v $PWD:/home/rstudio/introduction-to-datascience ubcdsci/intro-to-ds jupyter notebook --ip=0.0.0.0 --allow-root
+bookdown::gitbook:
+toc_depth: 2
 ```

-## Style Guide
+#### Learning objectives
+- when saying that students will do things in code, always say "in R"
+- "you will be able to" (not "students will be able to", "the reader will be able to")
+
+#### Equations
+- make sure all equations get capitalized labels ("Equation \\@ref(blah)", not "equation below" or "equation above")
+
+#### Figures
+- make sure all figures get (capitalized) labels ("Figure \\@ref(blah)", not "figure below" or "figure above")
+- make sure all figures get captions
+- specify image widths in terms of linewidth percent (e.g. `out.width="70%"`)
+- center align all images
+- make sure we have permission for every figure/logo that we use
+- Make sure all figures follow the visualization principles in Chapter 4
+- Make sure axes are set appropriately to not inflate/deflate differences artificially *where it does not compromise clarity* (e.g. in the classification
+chapter there are a few examples where zoomed-in accuracy axes are better than using the full range 0 to 1)
+
+#### Tables
+- make sure all tables get capitalized labels ("Table \\@ref(blah)", not "table below" or "table above")
+- make sure all tables get captions
+
+#### Note boxes
+- note boxes should be typeset as quote boxes using `>` and start with **Note:**
+
+#### Bibliography
+- do not put "et al" or "and others"; always use the full list of authors, BibTeX will choose how to abbreviate
+
+#### Naming conventions
+- K-means (not $K$-\*, K means, Kmeans)
+- K-nearest neighbors (not $K$-\*, K nearest neighbors, K nearest neighbor, use US spelling neighbor not neighbour). Note that "K-nearest neighbor" is not the singular form; "K-nearest neighbors" is
+- K-NN (not $K$-\*, KNN, K NN, $K$NN, K-nn)
+- local repository (not local computer)
+- package (not library, meta package, meta-package)
+- data science (not Data Science)
+- data frame (not dataframe)
+- data set (not dataset)
+- scatter plot (not scatterplot)
+- capitalize all initialisms and acronyms (URL not url, API not api, $K$-NN not $k$-nn)
+- response variable (not target, output, label)
+- predictor variable (not explanatory, feature)
+- numerical variable (not quantitative variable)
+- categorical variable (not class variable)
+
+#### Punctuation
+- emdashes should have no surrounding spaces. `This kind of typesetting&mdash;which is awesome&mdash;is correct!` and `Typesetting with spaces around em-dashes &mdash; which is bad &mdash; is not correct`
+- make sure `\index` commands don't break punctuation spacing. E.g. `This is an item \index{item}; it is good` will typeset with an erroneous space after item, i.e. `This is an item ; it is good`
+
+#### Common typos to check for
+- RMPSE: should be RMSPE
+- boostrap: should be bootstrap
+
+#### Use American spelling
+Generally the book uses American spelling. Some common British vs American and Canadian vs American gotchas:
+- o vs ou: neighbor and color (not neighbour and colour)
+- single vs double ell: labeling and labeled (not labelling and labelled)
+- z vs s: summarize (not summarise)
+- c vs s: defense (not defence)
+- er vs re: center (not centre)
+
+#### PDF Output
+These are absolute last steps when rendering the PDF output:
+- Look for and fix bad line breaks (e.g. with only one word on the next line, orphans, and widows)
+- Look for and fix bad line wraps in code and text
+- Look for and fix bad figure placement (falling off page, going over the side)
+- Look for `??` in the PDF (broken refs)
+- Look in the index for near-duplicates, and merge if needed
+- Make sure the 3D figures (and the text around them that refers to clicking and dragging) are properly modified for the PDF output
+- Make sure all markdown label-replaced URLs (of the form `[blah](url)`) will make
+sense in the hardcopy book version (i.e. nothing like "click this"). Many links appear in the additional resources: make sure the
+text-replacement of the URL contains enough information for someone to find the resource (without being able to click the link)
+
+#### HTML Output
+- Look for broken references (I *think* these end up as `??`)
+- Look for uncentered images
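
Several of the style guide rules in this README change are mechanical enough to check with a script. The following is a minimal sketch, not a tool that ships with the repository: `lint_rmd` is a hypothetical helper name, and it scans whole files, so it will also flag a `%>%` or a long line that appears in prose rather than in a code block.

```shell
#!/bin/sh
# Hypothetical style check for .Rmd files; not part of the repository.
# Prints one "file:line: message" per problem and returns non-zero if any were found.
lint_rmd() {
  awk '
    function report(msg) { printf "%s:%d: %s\n", FILENAME, FNR, msg; bad = 1 }
    length($0) > 80 { report("line is longer than 80 characters") }
    /%>%/           { report("use the |> pipe, not %>%") }
    /top_n\(/       { report("use slice, slice_min, or slice_max, not top_n") }
    /RMPSE/         { report("typo: RMPSE should be RMSPE") }
    /boostrap/      { report("typo: boostrap should be bootstrap") }
    END             { exit bad + 0 }
  ' "$@"
}

# Example run on a throwaway file with three deliberate problems.
sample=$(mktemp)
printf '%s\n' \
  'data %>% top_n(3)' \
  'The RMPSE is low.' \
  'data |> slice_max(x, n = 3)' > "$sample"
lint_rmd "$sample" || echo "style problems found"
rm -f "$sample"
```

Rules that need judgment, such as sentence-case headings or figure captions, are left out on purpose; a check like this only narrows down where to look.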
+
+## Updating the textbook data
+Data sets are collected and curated by `data/retrieve_data.ipynb`. To run that notebook in the Docker container type the following in the terminal:

-- For R code block labels, use the format `##-[name with only alphanumeric + hyphens]` where the `##` is the 2-digit chapter number, e.g. `03-test-name` for a label `test-name` in chapter 3
+```
+docker run --rm -it -p 8888:8888 -v $PWD:/home/rstudio/introduction-to-datascience ubcdsci/intro-to-ds jupyter notebook --ip=0.0.0.0 --allow-root
+```

 ## Repository Organization / Important Files
 - The files `index.Rmd` and `##-name.Rmd` are [R-markdown](https://rmarkdown.rstudio.com/) chapter contents to be parsed by [Bookdown](https://bookdown.org/)

_bookdown.yml

Lines changed: 1 addition & 1 deletion
@@ -3,4 +3,4 @@ language:
 ui:
 chapter_name: "Chapter "
 delete_merged_file: true
-rmd_files: ["index.Rmd", "intro.Rmd", "reading.Rmd", "wrangling.Rmd", "viz.Rmd", "classification1.Rmd", "classification2.Rmd", "regression1.Rmd", "regression2.Rmd", "clustering.Rmd", "inference.Rmd", "jupyter.Rmd", "version-control.Rmd", "setup.Rmd", "references.Rmd"]
+rmd_files: ["index.Rmd", "intro.Rmd", "reading.Rmd", "wrangling.Rmd", "viz.Rmd", "classification1.Rmd", "classification2.Rmd", "regression1.Rmd", "regression2.Rmd", "clustering.Rmd", "inference.Rmd", "jupyter.Rmd", "version-control.Rmd", "setup.Rmd", "appendixA.Rmd", "references.Rmd"]
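
Because `rmd_files` is maintained by hand, a newly listed file such as `appendixA.Rmd` is easy to misspell or forget to commit. A rough sketch of a check follows. The helper names are made up for illustration, and the parsing assumes the list sits on a single line that begins with `rmd_files:` and uses double quotes, as in this change; it does not handle YAML in general.

```shell
#!/bin/sh
# Hypothetical helpers; they assume rmd_files is a one-line, double-quoted list.

# Print each entry of rmd_files on its own line.
list_rmd_files() {
  grep '^[[:space:]]*rmd_files:' "$1" | grep -o '"[^"]*"' | tr -d '"'
}

# Print "missing: NAME" for every listed file that is absent from directory $2.
missing_rmd_files() {
  list_rmd_files "$1" | while IFS= read -r name; do
    [ -e "$2/$name" ] || echo "missing: $name"
  done
}

# Example with a throwaway directory that contains only index.Rmd.
demo=$(mktemp -d)
printf '%s\n' 'delete_merged_file: true' \
  'rmd_files: ["index.Rmd", "appendixA.Rmd"]' > "$demo/_bookdown.yml"
touch "$demo/index.Rmd"
missing_rmd_files "$demo/_bookdown.yml" "$demo"   # prints "missing: appendixA.Rmd"
rm -rf "$demo"
```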

_pdfbuild.R

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+bookdown::render_book('index.Rmd', output_format='bookdown::pdf_book')

appendixA.Rmd

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+# (APPENDIX) Appendix {-}
+
+# Downloading files from JupyterHub {#appendixA}
+
+This section will help you
+save your work from a JupyterHub web-based platform to your own computer.
+Let's say you want to download everything inside a folder called `your_folder`
+in your home directory.
+First open a terminal \index{JupyterHub!file download} by clicking "terminal" in the Launcher tab.
+Next, type the following in the terminal to create a
+compressed `.zip` archive for the work you are interested in downloading:
+
+```
+zip -r hub_folder.zip your_folder
+```
+
+After the compressing process is complete, right-click on `hub_folder.zip`
+in the JupyterHub file browser
+and click "Download". After the download is complete, you should be
+able to find the `hub_folder.zip` file on your own computer,
+and unzip the file (typically by double-clicking on it).
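
Before downloading, it can be worth confirming that the archive contains what you expect. The snippet below is an illustrative sketch rather than part of the appendix: it builds a small stand-in for `your_folder` in a temporary directory, archives it with the same `zip -r` command, and lists the result with `unzip -l`. The helper name `zip_and_list` and the sample file are placeholders, and the demonstration is skipped if `zip` or `unzip` is not installed.

```shell
#!/bin/sh
# Illustrative only: paths and file names here are placeholders.

# Create hub_folder.zip from a folder and list the archive's contents.
#   $1: folder to archive (relative to the current directory)
zip_and_list() {
  zip -r -q hub_folder.zip "$1" && unzip -l hub_folder.zip
}

if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
  work=$(mktemp -d)
  mkdir -p "$work/your_folder"
  echo "notes" > "$work/your_folder/analysis.txt"
  (cd "$work" && zip_and_list your_folder)
  rm -rf "$work"
else
  echo "zip or unzip is not installed; skipping the demonstration"
fi
```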
