UBC-DSCI
diff --git a/README.md b/README.md
Lines changed: 136 additions & 9 deletions
diff --git a/_bookdown.yml b/_bookdown.yml
Lines changed: 1 addition & 1 deletion
diff --git a/_pdfbuild.R b/_pdfbuild.R
Lines changed: 1 addition & 0 deletions
diff --git a/appendixA.Rmd b/appendixA.Rmd
Lines changed: 21 additions & 0 deletions
@@ -18,13 +18,22 @@ We provide instructions for both methods here.
 
 ### Without RStudio
 
-Once you are done editing, navigate to the repository root folder and run
+To build the **HTML version** of the book, navigate to the repository root folder and run
 ```
 ./build.sh
 ``` 
 from the command line. This command automatically spawns a docker container
 with the `ubcdsci/intro-to-ds` image, runs the script `build.R` from within the container,
-and then stops the container.
+and then stops the container. It may ask you for a password; this is the password for the
+`sudo` command on your computer. Typically this is just your usual computer user account password.
+But if your setup doesn't require you to use `sudo` to start a docker container, you can just
+open `build.sh` and delete the word `sudo` at the start of the script.
+
+To build the **PDF version** of the book, instead run
+```
+./pdfbuild.sh
+```
+The same comments regarding passwords and `sudo` as above apply here.
 
 ### With RStudio
 
@@ -48,23 +57,141 @@ and then stops the container.
     - for the username enter `rstudio` 
     - for the password enter `password` (or whatever you may have changed it to in the `docker run` command above)
     
-    > Note, if you prefer not to use RStudio, but a plain text editor instead (i.e., vim) the see [these docs](#usage-without-rstudio) below.
-
 3. Finally, you can render the book by running the following R code in the R console:
     ```
     bookdown::render_book('index.Rmd', 'bookdown::gitbook')
     ```
 
-### Updating the textbook data
-Data sets are collected and curated by `data/retrieve_data.ipynb`. To run that notebook in the Docker container type the following in the terminal:
+## Style Guide
 
+#### General
+- **80 character line limit!** This is necessary to make git diffs useful
+- numbers in text should be English words ("four common mistakes" not "4 common mistakes") unless there are units (40km, not forty km)
+- use Oxford commas ("a, b, and c" not "a, b and c")
+- "subset" should not be used as a verb
+- functions in text should not have parentheses (`read_csv` not `read_csv()`)
+- remove all references to "course" and "student"; replace with "reader" or "you" where necessary
+- make sure we have permission to use all external resources that we use
+- remove all references to "clicking on things" in the HTML version of the book (e.g. "click this link to ...")
+- When we introduce a new term, use `**bolding**` to typeset it (but only the first introduction of the term)
+
+#### Code blocks
+- Use the knitr label format `##-[name with only alphanumeric + hyphens]` where 
+  the `##` is the 2-digit chapter number, e.g. `03-test-name` for a label `test-name` in chapter 3
+- Make sure to get syntax highlighting by specifying the language in each code block:
+  <pre>
+  ```r
+     code
+  ```
+  </pre>
+  not
+  <pre>
+  ```
+    code
+  ```
+  </pre>
+  (similar for `html` where needed)
+- always use `|>` pipe, not `%>%`
+- anywhere we specify a grid of tuning values, don't just do `grid = 10`; actually specify the values using `seq` or `c(...)`
+- do not end code blocks with `head(dataframe)`; just use `dataframe` to print
+- `set.seed` once at the beginning of each chapter
+- use `"double quotes"` for strings, not `'single quotes'`
+- make sure all lines of code are at most 80 characters (for LaTeX PDF output typesetting)
+- pass code blocks through `styler` (the result must still obey the 80-character limit)
+- use `slice`, `slice_min`, `slice_max` (not `top_n`)
+- just `pull(colname)`, don't `select` first
+
+#### Section headings
+- All (sub)section headings should be sentence case ("Loading a tabular data set", not "Loading a Tabular Data Set")
+- Make sure that subsections occur in 1-step hierarchies (no subsubsection directly below subsection, for example)
+- Make sure that `{-}` is used wherever unnumbered headings are required
+
+Choose an appropriate table of contents depth via `toc_depth` (the example below uses depth 2, which is a good default):
 ```
-docker run --rm -it -p 8888:8888 -v $PWD:/home/rstudio/introduction-to-datascience ubcdsci/intro-to-ds jupyter notebook --ip=0.0.0.0 --allow-root
+bookdown::gitbook:
+    toc_depth: 2
 ```
 
-## Style Guide
+#### Learning objectives
+- when saying that students will do things in code, always say "in R"
+- "you will be able to" (not "students will be able to", "the reader will be able to")
+
+#### Equations
+- make sure all equations get capitalized labels ("Equation \\@ref(blah)", not "equation below" or "equation above")
+
+#### Figures
+- make sure all figures get (capitalized) labels ("Figure \\@ref(blah)", not "figure below" or "figure above")
+- make sure all figures get captions
+- specify image widths in terms of linewidth percent (e.g. `out.width="70%"`)
+- center align all images
+- make sure we have permission for every figure/logo that we use
+- Make sure all figures follow the visualization principles in Chapter 4
+- Make sure axes are set appropriately to not inflate/deflate differences artificially *where it does not compromise clarity* (e.g. in the classification
+  chapter there are a few examples where zoomed-in accuracy axes are better than using the full range 0 to 1)
+
+#### Tables
+- make sure all tables get capitalized labels ("Table \\@ref(blah)", not "table below" or "table above")
+- make sure all tables get captions
+
+#### Note boxes
+- note boxes should be typeset as quote boxes using `>` and start with **Note:**
+
+#### Bibliography
+- do not put "et al" or "and others"; always use the full list of authors, BibTeX will choose how to abbreviate
+
+#### Naming conventions
+- K-means (not $K$-\*, K means, Kmeans)
+- K-nearest neighbors (not $K$-\*, K nearest neighbors, K nearest neighbor, use US spelling neighbor not neighbour). Note that "K-nearest neighbor" is not the singular form; "K-nearest neighbors" is
+- K-NN (not $K$-\*, KNN, K NN, $K$NN, K-nn)
+- local repository (not local computer)
+- package (not library, meta package, meta-package)
+- data science (not Data Science)
+- data frame (not dataframe)
+- data set (not dataset)
+- scatter plot (not scatterplot)
+- capitalize all initialisms and acronyms (URL not url, API not api, K-NN not k-nn)
+- response variable (not target, output, label)
+- predictor variable (not explanatory, feature)
+- numerical variable (not quantitative variable)
+- categorical variable (not class variable)
+
+#### Punctuation
+- em-dashes should have no surrounding spaces. `This kind of typesetting&mdash;which is awesome&mdash;is correct!` and `Typesetting with spaces around em-dashes &mdash; which is bad &mdash; is not correct`
+- make sure `\index` commands don't break punctuation spacing. E.g. `This is an item \index{item}; it is good` will typeset with an erroneous space after item, i.e. `This is an item ; it is good`
+
+#### Common typos to check for
+- RMPSE: should be RMSPE
+- boostrap: should be bootstrap
+
+#### Use American spelling
+Generally the book uses American spelling. Some common British vs American and Canadian vs American gotchas:
+- o vs ou: neighbor and color (not neighbour and colour)
+- single vs double ell: labeling and labeled (not labelling and labelled)
+- z vs s: summarize (not summarise)
+- c vs s: defense (not defence)
+- er vs re: center (not centre)
+
+#### PDF Output
+These are absolute last steps when rendering the PDF output:
+- Look for and fix bad line breaks (e.g. with only one word on the next line, orphans, and widows)
+- Look for and fix bad line wraps in code and text
+- Look for and fix bad figure placement (falling off page, going over the side)
+- Look for `??` in the PDF (broken refs)
+- Look in the index for near-duplicates, and merge if needed
+- Make sure the 3D figures (and the text around them that refers to clicking and dragging) are properly modified for the PDF output
+- Make sure all markdown label-replaced URLs (of the form `[blah](url)`) will make 
+  sense in the hardcopy book version (i.e. nothing like "click this"). Many links appear in the additional resources: make sure the 
+  text-replacement of the URL contains enough information for someone to find the resource (without being able to click the link)
+
+#### HTML Output
+- Look for broken references (I *think* these end up as `??`)
+- Look for uncentered images
+
+## Updating the textbook data
+Data sets are collected and curated by `data/retrieve_data.ipynb`. To run that notebook in the Docker container, type the following in the terminal:
 
-- For R code block labels, use the format `##-[name with only alphanumeric + hyphens]` where the `##` is the 2-digit chapter number, e.g. `03-test-name` for a label `test-name` in chapter 3
+```
+docker run --rm -it -p 8888:8888 -v $PWD:/home/rstudio/introduction-to-datascience ubcdsci/intro-to-ds jupyter notebook --ip=0.0.0.0 --allow-root
+```
 
 ## Repository Organization / Important Files
 - The files `index.Rmd` and `##-name.Rmd` are [R-markdown](https://rmarkdown.rstudio.com/) chapter contents to be parsed by [Bookdown](https://bookdown.org/)
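
The mechanical rules in the style guide above (the 80-character line limit and the `|>` pipe rule) lend themselves to an automated check. The following is a minimal shell sketch and is not part of this change: the function names `check_line_length` and `check_pipes` are hypothetical, and it assumes a `grep` that supports `-H` and `-E`, as GNU and BSD `grep` do.

```shell
#!/bin/sh
# Sketch of a style check for .Rmd files (hypothetical helpers, not in the repo).

# Succeeds only if no line exceeds 80 characters; offending lines are
# printed as file:line:text.
check_line_length() {
  # grep exits 0 when it finds a match, so negate it: success means "clean".
  ! grep -nH -E '.{81,}' "$@"
}

# Succeeds only if the magrittr pipe %>% does not appear anywhere.
check_pipes() {
  ! grep -nH -F '%>%' "$@"
}
```

Run from the repository root, `check_line_length *.Rmd && check_pipes *.Rmd` would report violations across all chapters. The pipe check matches prose as well as code, so some hits may be legitimate mentions rather than violations.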
 
@@ -3,4 +3,4 @@ language:
   ui:
     chapter_name: "Chapter "
 delete_merged_file: true
-rmd_files: ["index.Rmd", "intro.Rmd", "reading.Rmd", "wrangling.Rmd", "viz.Rmd", "classification1.Rmd", "classification2.Rmd", "regression1.Rmd", "regression2.Rmd", "clustering.Rmd", "inference.Rmd", "jupyter.Rmd", "version-control.Rmd", "setup.Rmd", "references.Rmd"]
+rmd_files: ["index.Rmd", "intro.Rmd", "reading.Rmd", "wrangling.Rmd", "viz.Rmd", "classification1.Rmd", "classification2.Rmd", "regression1.Rmd", "regression2.Rmd", "clustering.Rmd", "inference.Rmd", "jupyter.Rmd", "version-control.Rmd", "setup.Rmd", "appendixA.Rmd", "references.Rmd"]
@@ -0,0 +1 @@
+bookdown::render_book('index.Rmd', output_format='bookdown::pdf_book')
@@ -0,0 +1,21 @@
+# (APPENDIX) Appendix {-}
+
+# Downloading files from JupyterHub {#appendixA}
+
+This section will help you
+save your work from a JupyterHub web-based platform to your own computer. 
+Let's say you want to download everything inside a folder called `your_folder`
+in your home directory.
+First open a terminal \index{JupyterHub!file download} by clicking "terminal" in the Launcher tab. 
+Next, type the following in the terminal to create a 
+compressed `.zip` archive for the work you are interested in downloading:
+
+```
+zip -r hub_folder.zip your_folder
+```
+
+After the compression is complete, right-click on `hub_folder.zip`
+in the JupyterHub file browser
+and click "Download". After the download is complete, you should be 
+able to find the `hub_folder.zip` file on your own computer,
+and unzip the file (typically by double-clicking on it).
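
Before downloading, it can help to confirm that the archive contains what you expect. The following is a minimal sketch, not part of the appendix text: the helper name `archive_folder` is hypothetical, and it assumes both `zip` and `unzip` are installed on the hub.

```shell
#!/bin/sh
# Sketch: build the archive, then list its contents before downloading.
# archive_folder is a hypothetical helper; $1 is the folder, $2 the archive.
archive_folder() {
  # -r recurses into subdirectories; -q suppresses per-file progress output.
  zip -r -q "$2" "$1" || return 1
  # List the archived paths so you can confirm nothing is missing.
  unzip -l "$2"
}
```

With the placeholder names used above, the call would be `archive_folder your_folder hub_folder.zip`, run from your home directory.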