Commit dd54b5b

Merge pull request #340 from UBC-DSCI/dev
Dev
2 parents 0f9ce4d + b9e8a1e commit dd54b5b

168 files changed: +5850 -3186 lines changed

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -7,3 +7,6 @@
 _bookdown_files
 **.ipynb_checkpoints
 .rstudio/*
+pdf/_book
+pdf/_bookdown_files
+pdf/*.log

LICENSE.md

Lines changed: 8 additions & 10 deletions
@@ -1,23 +1,21 @@
-**CC BY 2.5 CA**
+# License
 
-An Introduction to Data Science is
-made available under the **Attribution-NonCommercial-ShareAlike 4.0 International** ([CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)).
+This textbook is made available under the **Attribution-NonCommercial-ShareAlike 4.0 International** ([CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)).
 
 This is a human-readable summary of (and not a substitute for) the [license](https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
 
 ## You are free to:
-**Share** — copy and redistribute the material in any medium or format
-**Adapt** — remix, transform, and build upon the material for any purpose, even commercially.
+
+- **Share** — copy and redistribute the material in any medium or format
+- **Adapt** — remix, transform, and build upon the material
 
 The licensor cannot revoke these freedoms as long as you follow the license terms.
 
 ## Under the following terms:
 
-**Attribution** — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
-
-**NonCommercial** — You may not use the material for commercial purposes.
-
-**ShareAlike** — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
+- **Attribution** — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
+- **NonCommercial** — You may not use the material for commercial purposes.
+- **ShareAlike** — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
 
 **No additional restrictions** — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
 

README.md

Lines changed: 35 additions & 15 deletions
@@ -1,5 +1,11 @@
-## Introduction to Data Science
-This is the source for the Introduction to Data Science textbook.
+## Data Science: A First Introduction
+This is the source for the *Data Science: A First Introduction* textbook.
+
+## License Information
+
+This textbook is offered under
+the [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License](https://creativecommons.org/licenses/by-nc-sa/4.0/).
+See [the license file](LICENSE.md) for more information.
 
 ## Setup and Build
 
@@ -20,22 +26,19 @@ We provide instructions for both methods here.
 
 To build the **html version** of the book, navigate to the repository root folder and run
 ```
-./build.sh
+./build_html.sh
 ```
 from the command line. This command automatically spawns a docker container
-with the `ubcdsci/intro-to-ds` image, runs the script `build.R` from within the container,
-and then stops the container. It may ask you for a password; this is the password for the
-`sudo` command on your computer. Typically this is just your usual computer user account password.
-But if your setup doesn't require you to use `sudo` to start a docker container, you can just
-open `build.sh` and delete the word `sudo` at the start of the script.
+with the `ubcdsci/intro-to-ds` image, runs the script `_build_html.r` from within the container,
+and then stops the container.
 
 To build the **PDF version** of the book, instead run
 ```
-./pdfbuild.sh
+./build_pdf.sh
 ```
-The same comments regarding passwords and `sudo` as above apply here.
+This command again spawns a docker container and runs `pdf/_build_pdf.r` inside the container.
 
-### With RStudio
+### With RStudio (HTML only)
 
 1. Run RStudio inside the `ubcdsci/intro-to-ds` docker container:
 - in terminal, navigate to the root of this project repo
@@ -120,23 +123,39 @@ bookdown::gitbook:
 - when saying that students will do things in code, always say "in R"
 - "you will be able to" (not "students will be able to", "the reader will be able to")
 
+#### Captions
+- captions should be sentence formatted and end with a period
+- If you have special characters (particularly underscores, quotation marks, plus signs, other LaTeX math symbols) make sure to separate
+the caption out of the code chunk like so
+```
+(ref:blah)
+
+\`\`\`
+{r blah, other_options}
+code here
+\`\`\`
+```
+
 #### Equations
 - make sure all equations get capitalized labels ("Equation \\@ref(blah)", not "equation below" or "equation above")
 
 #### Figures
 - make sure all figures get (capitalized) labels ("Figure \\@ref(blah)", not "figure below" or "figure above")
 - make sure all figures get captions
 - specify image widths in terms of linewidth percent (e.g. `out.width="70%"`)
-- center align all images
+- center align all images via `fig.align = "center"`
 - make sure we have permission for every figure/logo that we use
 - Make sure all figures follow the visualization principles in Chapter 4
 - Make sure axes are set appropriately to not inflate/deflate differences artificially *where it does not compromise clarity* (e.g. in the classification
 chapter there are a few examples where zoomed-in accuracy axes are better than using the full range 0 to 1)
+-
 
 #### Tables
 - make sure all tables get capitalized labels ("Table \\@ref(blah)", not "table below" or "table above")
 - make sure all tables get captions
 - make sure the row + column spacing is reasonable
+- Do not put links in table captions, it breaks pdf rendering
+- Do not put underscores in table captions, it breaks pdf rendering
 
 #### Note boxes
 - note boxes should be typeset as quote boxes using `>` and start with **Note:**
@@ -178,6 +197,10 @@ Generally the book uses American spelling. Some common British vs American and C
 - c vs s: defense (not defence)
 - er vs re: center (not centre)
 
+#### Whitespace
+We need a line of whitespace before and after code fences (code surrounded by three backticks above and below). This is for readability,
+and it is essential for figure captions.
+
 #### PDF Output
 These are absolute last steps when rendering the PDF output:
 - Look for and fix bad line breaks (e.g. with only one word on the next line, orphans, and widows)
@@ -211,6 +234,3 @@ docker run --rm -it -p 8888:8888 -v $PWD:/home/rstudio/introduction-to-datascien
 - `data/` stores datasets processed during compile
 - `docs/.nojekyll` tells github's static site builder not to run [Jekyll](https://jekyllrb.com/). This avoids Jekyll deleting the folder `docs/_main_files` (as it starts with an underscore)
 
-## License Information
-
-[Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/)
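
Review note on the new "Whitespace" rule in the README: the requirement of a blank line before and after every code fence could be checked mechanically. The following is a minimal sketch, not part of this commit. The function name `check_fences` and the message wording are assumptions made for illustration, and the check treats any line that starts with three backticks (after optional indentation) as a fence, alternating between opening and closing.

```shell
#!/bin/sh
# Sketch only: flag code fences that lack a blank line before or after them.
set -eu

check_fences() {
  # Prints one message per offending fence in file $1; returns 1 if any is found.
  awk '
    function blank(s) { return s ~ /^[ \t]*$/ }
    { line[NR] = $0 }                      # buffer the file for look-behind/ahead
    END {
      bad = 0
      inside = 0
      for (i = 1; i <= NR; i++) {
        # A fence is a line starting with three backticks (written as
        # bracket expressions so the pattern has no literal fence in it).
        if (line[i] !~ /^[ \t]*[`][`][`]/) continue
        if (!inside) {
          # Opening fence: the previous line must be blank (or start of file).
          if (i > 1 && !blank(line[i - 1])) {
            printf "line %d: no blank line before fence\n", i
            bad = 1
          }
          inside = 1
        } else {
          # Closing fence: the next line must be blank (or end of file).
          if (i < NR && !blank(line[i + 1])) {
            printf "line %d: no blank line after fence\n", i
            bad = 1
          }
          inside = 0
        }
      }
      exit bad
    }
  ' "$1"
}
```

It could be called as `check_fences intro.Rmd` for a single chapter. Because the script uses `set -e`, a caller that wants to keep going after a failed check should wrap the call in an `if`.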

_build.R renamed to _build_html.r

File renamed without changes.

_pdfbuild.R

Lines changed: 0 additions & 1 deletion
This file was deleted.

acknowledgements.Rmd

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+# Acknowledgments {-}
+

authors.Rmd

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+# About the authors {-}
+
+Tiffany Timbers is an Assistant Professor of Teaching in the Department of Statistics and Co-Director for the Master of Data Science program (Vancouver Option) at the University of British Columbia. In these roles she teaches and develops curriculum around the responsible application of Data Science to solve real-world problems. One of her favorite courses she teaches is a graduate course on collaborative software development, which focuses on teaching how to create R and Python packages using modern tools and workflows.
+
+
+Trevor Campbell is an Assistant Professor in the Department of Statistics at the University of British Columbia. His research focuses on automated, scalable Bayesian inference algorithms, Bayesian nonparametrics, streaming data, and Bayesian theory. He was previously a postdoctoral associate advised by Tamara Broderick in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and Institute for Data, Systems, and Society (IDSS) at MIT, a Ph.D. candidate under Jonathan How in the Laboratory for Information and Decision Systems (LIDS) at MIT, and before that he was in the Engineering Science program at the University of Toronto.
+
+
+Melissa Lee is an Assistant Professor of Teaching in the Department of Statistics at the University of British Columbia. She teaches and develops curriculum for undergraduate statistics and data science courses. Her work focuses on student-centered approaches to teaching, developing and assessing open educational resources, and promoting equity, diversity, and inclusion initiatives.

build.sh

Lines changed: 0 additions & 1 deletion
This file was deleted.

build_html.sh

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+# Script to generate HTML book
+docker run --rm -m 5g -v $(pwd):/home/rstudio/introduction-to-datascience ubcdsci/intro-to-ds:v0.12.0 /bin/bash -c "cd /home/rstudio/introduction-to-datascience; Rscript _build_html.r"
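
Review note: the committed `build_html.sh` passes `$(pwd)` to `-v` unquoted, so a checkout path containing spaces would be split into several arguments. A slightly more defensive variant might look like the sketch below. It is an illustration only and not part of this commit: the `IMAGE` and `DRY_RUN` variables and the `build_html` function name are assumptions introduced here, while the docker flags, image tag, and container path are copied from the committed script.

```shell
#!/bin/sh
# Sketch only: a more defensive take on build_html.sh (not part of the commit).
set -eu

# Hypothetical knobs; the default image tag matches the one in the commit.
IMAGE="${IMAGE:-ubcdsci/intro-to-ds:v0.12.0}"
WORKDIR="/home/rstudio/introduction-to-datascience"

build_html() {
  # Build the docker invocation as positional parameters so that every
  # argument, including a mount path with spaces, stays intact.
  set -- docker run --rm -m 5g \
    -v "$(pwd):${WORKDIR}" \
    "${IMAGE}" \
    /bin/bash -c "cd ${WORKDIR}; Rscript _build_html.r"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it (useful without docker).
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Calling `build_html` at the end of the script runs the build; `DRY_RUN=1 build_html` only prints the command that would be executed.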

build_pdf.sh

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+# Script to generate PDF book
+
+# Copy files
+cp references.bib pdf/
+cp preface-text.Rmd pdf/
+cp intro.Rmd pdf/
+cp reading.Rmd pdf/
+cp wrangling.Rmd pdf/
+cp viz.Rmd pdf/
+cp classification1.Rmd pdf/
+cp classification2.Rmd pdf/
+cp regression1.Rmd pdf/
+cp regression2.Rmd pdf/
+cp clustering.Rmd pdf/
+cp inference.Rmd pdf/
+cp jupyter.Rmd pdf/
+cp version-control.Rmd pdf/
+cp setup.Rmd pdf/
+cp references.Rmd pdf/
+cp printindex.tex pdf/
+cp -r data/ pdf/data
+cp -r img/ pdf/img
+
+# Build the book with bookdown
+docker run --rm -m 5g -v $(pwd):/home/rstudio/introduction-to-datascience ubcdsci/intro-to-ds:v0.12.0 /bin/bash -c "cd /home/rstudio/introduction-to-datascience/pdf; Rscript _build_pdf.r"
+
+# clean files in pdf dir
+rm -rf pdf/references.bib
+rm -rf pdf/preface-text.Rmd
+rm -rf pdf/intro.Rmd
+rm -rf pdf/reading.Rmd
+rm -rf pdf/wrangling.Rmd
+rm -rf pdf/viz.Rmd
+rm -rf pdf/classification1.Rmd
+rm -rf pdf/classification2.Rmd
+rm -rf pdf/regression1.Rmd
+rm -rf pdf/regression2.Rmd
+rm -rf pdf/clustering.Rmd
+rm -rf pdf/inference.Rmd
+rm -rf pdf/jupyter.Rmd
+rm -rf pdf/version-control.Rmd
+rm -rf pdf/setup.Rmd
+rm -rf pdf/references.Rmd
+rm -rf pdf/printindex.tex
+rm -rf pdf/data pdf/img
+
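
Review note: `build_pdf.sh` lists every staged file twice, once for `cp` and once for `rm`, so the two lists can drift apart when a chapter is added. One possible refactoring keeps a single list and loops over it, as in the sketch below. This is an illustration rather than the committed script: the names `FILES`, `DIRS`, `stage_files`, and `clean_files` are hypothetical, and the file names are taken from the script above.

```shell
#!/bin/sh
# Sketch only: build_pdf.sh staging with a single file list (not part of the commit).
set -eu

# Files and directories staged into pdf/, copied from the committed script.
FILES="references.bib preface-text.Rmd intro.Rmd reading.Rmd wrangling.Rmd
viz.Rmd classification1.Rmd classification2.Rmd regression1.Rmd
regression2.Rmd clustering.Rmd inference.Rmd jupyter.Rmd
version-control.Rmd setup.Rmd references.Rmd printindex.tex"
DIRS="data img"

stage_files() {
  # Copy every source file and directory into pdf/ before the build.
  for f in $FILES; do cp "$f" pdf/; done
  for d in $DIRS; do cp -r "$d" "pdf/$d"; done
}

clean_files() {
  # Remove the staged copies; harmless if staging was only partial.
  for f in $FILES; do rm -f "pdf/$f"; done
  for d in $DIRS; do rm -rf "pdf/$d"; done
}
```

A full script would call `stage_files`, run the same `docker run` command as in the commit, and then call `clean_files`. Since `set -e` makes the script stop at the first failing command, registering `trap clean_files EXIT` before staging would still remove the copies when the build aborts early.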