
Commit 940f870

Use new bootstrap 4 style and update images
1 parent eb95f33 commit 940f870


68 files changed, +22155 -125 lines changed

Makefile

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ publish-production: hugo
 netlify deploy --prod --dir www/public
 
 book/2e/%.utf8.md: book/2e/%.Rmd
-cd book/2e && Rscript --vanilla -e 'bookdown::render_book("$*.Rmd", encoding = "UTF-8", preview = TRUE, clean = FALSE)'
+cd book/2e && Rscript --vanilla -e 'bookdown::render_book("$*.Rmd", encoding = "UTF-8", preview = TRUE, clean = FALSE, new_session = TRUE)'
 
 foreword: book/2e/foreword.utf8.md
 preface: book/2e/preface.utf8.md

README.md

Lines changed: 6 additions & 4 deletions
@@ -1,8 +1,10 @@
-Data Science at the Command Line
-================================
+# Data Science at the Command Line
 
 [![License: CC BY-ND 4.0](https://img.shields.io/badge/License-CC%20BY--ND%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nd/4.0/)
 
-<img src="https://www.datascienceatthecommandline.com/img/cover-readme.png" width="302px" />
+<a href="https://datascienceatthecommandline.com/">
+<img src="https://datascienceatthecommandline.com/2e/images/cover-small.png" width="224px" /></a>
 
-This repository contains the full text, data, scripts, and custom command-line tools used in the book [Data Science at the Command Line](http://datascienceatthecommandline.com). The book is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License.
+This repository contains the full text, data, and scripts used in the second edition of the book *Data Science at the Command Line*. The book is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License.
+
+You can read the book for free at https://datascienceatthecommandline.com.

book/2e/01.Rmd

Lines changed: 2 additions & 2 deletions
@@ -48,8 +48,8 @@ Although the five steps are discussed in a linear and incremental fashion, in pr
 \@ref(fig:diagram-osemn) illustrates that doing data science is an iterative and non-linear process.
 For example, once you have modeled your data, and you look at the results, you may decide to go back to the scrubbing step to the adjust the features of the dataset.
 
-```{r diagram-osemn, echo=FALSE, fig.cap="Doing data science is an iterative and non-linear process", fig.align="center"}
-knitr::include_graphics("images/diagram_osemn.png")
+```{r diagram-osemn, echo=FALSE, fig.cap="Doing data science is an iterative and non-linear process", fig.align="center", out.width="90%"}
+knitr::include_graphics("images/dscl_0101.png")
 ```
 
 Below I explain what each step entails.

book/2e/02.Rmd

Lines changed: 9 additions & 10 deletions
@@ -237,7 +237,7 @@ Under the hood, each command-line tool is one of the following five types:
 - An alias
 
 ```{r umbrella, echo=FALSE, fig.cap="I use the term command-line tool as an umbrella term", fig.align="center"}
-knitr::include_graphics("images/diagram_umbrella.png")
+knitr::include_graphics("images/dscl_0201.png")
 ```
 
 It’s good to know the difference between the types.
@@ -338,7 +338,7 @@ Try typing a sentence and press **`Enter`**.
 You can stop sending input by pressing **`Ctrl-D`** after which `rev` will stop.
 
 ```{r diagram-essential-streams, echo=FALSE, fig.cap="Every tool has three standard streams: standard input (*`stdin`*), standard output (*`stdout`*), and standard error (*`stderr`*)", fig.align="center"}
-knitr::include_graphics("images/diagram_essential_streams.png")
+knitr::include_graphics("images/dscl_0202.png")
 ```
 
 In practice, you'll not use the keyboard as a source of input, but the output generated by other tools and the contents of files.
@@ -347,7 +347,7 @@ For example, with `curl` we can download the book *Alice’s Adventures in Wonde
 This is done using the pipe operator (`|`).
 
 ```{r diagram-essential-pipe, echo=FALSE, fig.cap="The output from a tool can be piped to another tool", fig.align="center"}
-knitr::include_graphics("images/diagram_essential_pipe.png")
+knitr::include_graphics("images/dscl_0203.png")
 ```
 
 We can *pipe* the output of `curl` to `grep` to filter lines on a pattern.
@@ -389,7 +389,7 @@ If this file already exists, its contents are overwritten.
 Note that the standard error is still redirected to the terminal.
 
 ```{r diagram-essential-redirect-stdout, echo=FALSE, fig.cap="The output from a tool can be redirected to a file", fig.align="center"}
-knitr::include_graphics("images/diagram_essential_redirect_stdout.png")
+knitr::include_graphics("images/dscl_0204.png")
 ```
 
 You can also append the output to a file with `>>`, meaning the output is added after the original contents:
@@ -422,7 +422,7 @@ This way, you are directly passing the file to the standard input of `wc` withou
 Again, the final output is the same.
 
 ```{r diagram-essential-stdin-cat, echo=FALSE, fig.cap="Two ways to use the contents of a file as input", fig.align="center"}
-knitr::include_graphics("images/diagram_essential_stdin_cat.png")
+knitr::include_graphics("images/dscl_0205.png")
 ```
 
 Like many command-line tools, `wc` allows one or more filenames to be specified as arguments.
@@ -450,19 +450,18 @@ cat movies.txt 404.txt 2> /dev/null
 ```
 <1> The *`2`* refers to standard error.
 
-```{r diagram-essential-redirect-devnull, echo=FALSE, fig.cap="Redirecting *`stderr`* to */dev/null*", fig.align="center"}
-knitr::include_graphics("images/diagram_essential_redirect_devnull.png")
+```{r diagram-essential-redirect-devnull, echo=FALSE, fig.cap="Redirecting *`stderr`* to */dev/null*", fig.align="center", out.width="50%"}
+knitr::include_graphics("images/dscl_0206.png")
 ```
 
-
 Be careful not to read from and write to the same file.
 If you do, you'll end up with an empty file.
 That's because the tool of which the output is redirected, immediately opens that file for writing, and thereby emptying it.
 There are two workarounds for this: (1) write to a different file and rename it afterwards with `mv` or (2) use `sponge` [@sponge], which soaks up all its input before writing to a file.
 \@ref(fig:diagram-essential-sponge) illustrates how this works.
 
 ```{r diagram-essential-sponge, echo=FALSE, fig.cap="Unless you use `sponge`, you cannot read from and write to the same file in one pipeline", fig.align="center"}
-knitr::include_graphics("images/diagram_essential_sponge.png")
+knitr::include_graphics("images/dscl_0207.png")
 ```
 
 For example, imagine you have used `dseq`[@dseq] to generate a file *dates.txt* and now you'd like to add line numbers using `nl`[@nl].
@@ -619,7 +618,7 @@ seq 0 2 100 | tee even.txt | trim 5
 ```
 
 ```{r diagram-essential-tee, echo=FALSE, fig.cap="With `tee`, you can write intermediate output to a file", fig.align="center"}
-knitr::include_graphics("images/diagram_essential_tee.png")
+knitr::include_graphics("images/dscl_0208.png")
 ```
 
 Lastly, to insert images that have been generated by command-line tools (so every image except screenshots and diagrams) I use `display`.

book/2e/03.Rmd

Lines changed: 6 additions & 0 deletions
@@ -12,6 +12,12 @@ console_start()
 rm -f $HISTFILE
 ```
 
+```{console install_pup_arm64, include=FALSE}
+curl -sL https://github.com/ericchiang/pup/releases/download/v0.4.0/pup_v0.4.0_linux_arm64.zip -o pup.zip
+unzip pup.zip
+sudo mv pup /usr/bin/
+rm pup.zip
+```
 
 # Obtaining Data {#chapter-3-obtaining-data}
 

book/2e/05.Rmd

Lines changed: 7 additions & 0 deletions
@@ -12,6 +12,13 @@ console_start()
 rm -f $HISTFILE
 ```
 
+```{console install_pup_arm64, include=FALSE}
+curl -sL https://github.com/ericchiang/pup/releases/download/v0.4.0/pup_v0.4.0_linux_arm64.zip -o pup.zip
+unzip pup.zip
+sudo mv pup /usr/bin/
+rm pup.zip
+```
+
 # Scrubbing Data {#chapter-5-scrubbing-data}
 
 Two chapters ago, in the first step of the OSEMN model for data science, we looked at *obtaining* data from a variety of sources.

book/2e/06.Rmd

Lines changed: 1 addition & 1 deletion
@@ -274,7 +274,7 @@ There are five targets: *`all`*, *`data`*, *`data/starwars.csv`*, *`top10`*, and
 \@ref(fig:starwars-image) provides an overview of these targets and the dependencies between them.
 
 ```{r dependencies, echo=FALSE, fig.cap="Dependencies between targets", fig.align="center"}
-knitr::include_graphics("images/diagram_dependencies.png")
+knitr::include_graphics("images/dscl_0602.png")
 ```
 
 Let's discuss each target in turn:

book/2e/07.Rmd

Lines changed: 1 addition & 1 deletion
@@ -462,7 +462,7 @@ Once you've plotted something, you can visit *localhost:8000* in your browser an
 The default port is 8000, but you can change this by specifying it as an argument to `servewd`:
 
 ```{console}
-servewd 9999 > display#!enter=FALSE
+servewd 9999#!enter=FALSE
 C-C#!literal=FALSE
 ```
 

book/2e/08.Rmd

Lines changed: 4 additions & 4 deletions
@@ -200,8 +200,8 @@ API calls may be limited to a certain number, or some commands can only have one
 
 \@ref(fig:diagram-parallel-processing) illustrates, on a conceptual level, the difference between serial processing, naive parallel processing, and parallel processing with GNU Parallel in terms of the number of concurrent processes and the total amount of time it takes to run everything.
 
-```{r diagram-parallel-processing, echo=FALSE, fig.cap="Serial processing, naive parallel processing, and parallel processing with GNU Parallel", fig.align="center"}
-knitr::include_graphics("images/diagram_parallel_processing.png")
+```{r diagram-parallel-processing, echo=FALSE, fig.cap="Serial processing, naive parallel processing, and parallel processing with GNU Parallel", fig.align="center", out.width="60%"}
+knitr::include_graphics("images/dscl_0801.png")
 ```
 
 There are two problems with this naive approach.
@@ -234,7 +234,7 @@ This is `parallel` in its simplest form: the items to loop over are passed via s
 See \@ref(fig:diagram-parallel-output) for an illustration of how `parallel` concurrently distributes input among processes and collects their outputs.
 
 ```{r diagram-parallel-output, echo=FALSE, fig.cap="GNU Parallel concurrently distributes input among processes and collects their outputs", fig.align="center"}
-knitr::include_graphics("images/diagram_parallel_output.png")
+knitr::include_graphics("images/dscl_0802.png")
 ```
 
 As you can see it basically acts as a for loop.
@@ -373,7 +373,7 @@ tree outdir | trim
 See \@ref(fig:diagram-parallel-results) for a pictorial overview of how the `--results` option works.
 
 ```{r diagram-parallel-results, echo=FALSE, fig.cap="GNU Parallel stores output in separate files with the `--results` option", fig.align="center"}
-knitr::include_graphics("images/diagram_parallel_results.png")
+knitr::include_graphics("images/dscl_0803.png")
 ```
 
 When you're running multiple jobs in parallel, the order in which the jobs are run may not correspond to the order of the input.

book/2e/12-references.Rmd

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+`r if (knitr:::is_html_output()) '# References {-}'`
