This repository contains the full text, data, scripts, and custom command-line tools used in the book [Data Science at the Command Line](http://datascienceatthecommandline.com). The book is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License.
+
This repository contains the full text, data, and scripts used in the second edition of the book *Data Science at the Command Line*. The book is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License.
+
+
You can read the book for free at https://datascienceatthecommandline.com.
book/2e/01.Rmd
2 additions & 2 deletions
@@ -48,8 +48,8 @@ Although the five steps are discussed in a linear and incremental fashion, in pr
\@ref(fig:diagram-osemn) illustrates that doing data science is an iterative and non-linear process.
For example, once you have modeled your data and looked at the results, you may decide to go back to the scrubbing step to adjust the features of the dataset.
-
```{r diagram-osemn, echo=FALSE, fig.cap="Doing data science is an iterative and non-linear process", fig.align="center"}
It’s good to know the difference between the types.
@@ -338,7 +338,7 @@ Try typing a sentence and press **`Enter`**.
You can stop sending input by pressing **`Ctrl-D`**, after which `rev` will stop.
```{r diagram-essential-streams, echo=FALSE, fig.cap="Every tool has three standard streams: standard input (*`stdin`*), standard output (*`stdout`*), and standard error (*`stderr`*)", fig.align="center"}
```{r diagram-essential-redirect-devnull, echo=FALSE, fig.cap="Redirecting *`stderr`* to */dev/null*", fig.align="center", out.width="50%"}
+
knitr::include_graphics("images/dscl_0206.png")
```
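As a minimal sketch of the redirection shown in the figure above, the following shell snippet discards *`stderr`* while keeping *`stdout`*. The function name `noisy` is hypothetical and exists only for this illustration.

```shell
# Hypothetical helper that writes one line to stdout and one line to stderr.
noisy() {
  echo "result"
  echo "warning: something went wrong" >&2
}

# 2> redirects file descriptor 2 (stderr).
# Anything written to /dev/null is discarded, so only stdout remains visible.
noisy 2>/dev/null
```

Without the `2>/dev/null` part, both lines would appear in the terminal.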
-
Be careful not to read from and write to the same file.
If you do, you'll end up with an empty file.
That's because the shell opens the output file for writing, thereby emptying it, before the tool has read anything from it.
There are two workarounds for this: (1) write to a different file and rename it afterwards with `mv` or (2) use `sponge` [@sponge], which soaks up all its input before writing to a file.
\@ref(fig:diagram-essential-sponge) illustrates how this works.
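The pitfall and both workarounds can be sketched in the shell as follows. The file names are hypothetical, and the `sponge` step is guarded because `sponge` (part of moreutils) may not be installed on your system.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# The problem: the output file is emptied before sort gets to read it.
printf 'b\na\nc\n' > broken.txt
sort broken.txt > broken.txt
wc -c < broken.txt    # the file is now empty

# Workaround 1: write to a different file, then rename it with mv.
printf 'b\na\nc\n' > letters.txt
sort letters.txt > letters.tmp && mv letters.tmp letters.txt
cat letters.txt

# Workaround 2: sponge soaks up all of its input before it opens the file.
if command -v sponge >/dev/null 2>&1; then
  printf 'b\na\nc\n' > soaked.txt
  sort soaked.txt | sponge soaked.txt
  cat soaked.txt
fi
```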
```{r diagram-essential-sponge, echo=FALSE, fig.cap="Unless you use `sponge`, you cannot read from and write to the same file in one pipeline", fig.align="center"}
book/2e/08.Rmd
4 additions & 4 deletions
@@ -200,8 +200,8 @@ API calls may be limited to a certain number, or some commands can only have one
\@ref(fig:diagram-parallel-processing) illustrates, on a conceptual level, the difference between serial processing, naive parallel processing, and parallel processing with GNU Parallel in terms of the number of concurrent processes and the total amount of time it takes to run everything.
-
```{r diagram-parallel-processing, echo=FALSE, fig.cap="Serial processing, naive parallel processing, and parallel processing with GNU Parallel", fig.align="center"}
```{r diagram-parallel-processing, echo=FALSE, fig.cap="Serial processing, naive parallel processing, and parallel processing with GNU Parallel", fig.align="center", out.width="60%"}
+
knitr::include_graphics("images/dscl_0801.png")
```
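To make the first two approaches in the figure concrete, here is a rough shell sketch of serial and naive parallel processing. The function `slow_square` is a hypothetical stand-in for any slow command, and GNU Parallel itself is not used here.

```shell
# Hypothetical stand-in for a slow command: wait a second, then print the square.
slow_square() {
  sleep 1
  echo "$(( $1 * $1 ))"
}

# Serial processing: one process at a time, output in input order.
serial=$(for i in 1 2 3; do slow_square "$i"; done)

# Naive parallel processing: start every job in the background at once,
# then wait for all of them. The output order is no longer guaranteed.
naive=$(for i in 1 2 3; do slow_square "$i" & done; wait)

echo "$serial"
echo "$naive" | sort -n
```

The serial loop takes about three seconds in total, while the naive parallel loop takes about one second because all three jobs run at the same time.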
There are two problems with this naive approach.
@@ -234,7 +234,7 @@ This is `parallel` in its simplest form: the items to loop over are passed via s
See \@ref(fig:diagram-parallel-output) for an illustration of how `parallel` concurrently distributes input among processes and collects their outputs.
```{r diagram-parallel-output, echo=FALSE, fig.cap="GNU Parallel concurrently distributes input among processes and collects their outputs", fig.align="center"}