Note that this applies only to Jupyter Notebooks. While Quarto uses the Jupyter engine, rendering a Quarto document always runs code from scratch by default.
:::
### Pipelines {#sec-pipelines}
Pipeline tools are software that manage the execution of code. They are practical for research projects because they track the relationships between the components of your project (so they automatically know what order to run things in) and rerun components only when they are out of date (so you don't necessarily need to rerun your entire project because you updated one part of the code). They also make reproducing results easy, because running the entire pipeline takes only a command or two.
Pipeline tools are helpful for projects of any size, but they are particularly suited to complex or computationally intensive projects.
::: panel-tabset
## R
The best pipeline tool in R is the targets package. targets is a native R tool, making it easy to work with R objects. It works particularly well with Quarto and R Markdown, allowing you to reduce the amount of code in a report while managing it reproducibly.
targets has [excellent documentation and tutorials](https://books.ropensci.org/targets/), so we point you there for guidance.
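
As a rough sketch of what this looks like (the file names, packages, and targets below are hypothetical, not from this book), a `_targets.R` file at the project root defines the pipeline as a list of targets:

```r
# _targets.R -- a minimal, hypothetical targets pipeline
library(targets)

# Packages that the targets below need
tar_option_set(packages = c("readr", "dplyr"))

list(
  # Track the raw data file itself, so edits to it invalidate downstream targets
  tar_target(raw_file, "data/raw.csv", format = "file"),
  tar_target(raw_data, readr::read_csv(raw_file)),
  tar_target(clean_data, dplyr::filter(raw_data, !is.na(value))),
  tar_target(model, lm(value ~ group, data = clean_data))
)
```

Running `targets::tar_make()` executes the pipeline, skipping any targets that are already up to date, and `targets::tar_read(model)` retrieves a stored result.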
It's also possible to use tools like Make (see the Python tab), among others, with R, although we recommend targets for projects that are mostly R. For projects that mix languages, Make may be a better fit.
## Python
Python has several pipeline tools that come from data engineering, where they are sometimes called *orchestration* tools. That said, many of them are far more complex than a single research project needs.
For research projects, we recommend [GNU Make](https://www.gnu.org/software/make/). Make is one of the oldest and most popular pipeline tools: it is over 40 years old. It shows its age in some ways, but it is also battle-tested. See [this tutorial](https://third-bit.com/py-rse/automate.html) for an example of running an analysis with Make.
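
As a minimal sketch (the scripts and file names here are placeholders, not from this book), a `Makefile` declares each output, the files it depends on, and the command that builds it:

```make
# Makefile -- hypothetical two-step analysis pipeline
# Note: recipe lines must be indented with a tab character, not spaces.
.PHONY: all clean

all: results/figure.png

data/clean.csv: scripts/clean.py data/raw.csv
	python scripts/clean.py data/raw.csv data/clean.csv

results/figure.png: scripts/plot.py data/clean.csv
	python scripts/plot.py data/clean.csv results/figure.png

clean:
	rm -f data/clean.csv results/figure.png
```

Running `make` rebuilds only the targets whose dependencies have changed since the last run, and `make clean` removes the generated files.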
:::
### Provide Guidance on How to Run Your Code
Your `README` should include guidance on how to run your code. For instance, if there is a command to run the entire project, include information about that process (this is usually related to pipeline-managed code as discussed in @sec-pipelines). If you intend the user to run scripts in a particular order, describe how, but prefer using a pipeline tool to manage this instead.
## Lock your Package Versions {#sec-pkg-env}
See the [documentation](https://docs.astral.sh/uv/getting-started/features/) for more details.
Opt-in workflows are practices we do not require for a project but for which we offer guidance. They also let the team experiment with new approaches and see what works for which projects and when.
### Testing {#sec-tests}
In scientific work, two types of code tests are useful: code expectations and data expectations. Code should *behave* the way you expect, and data should *exist* the way you expect. If that is not the case, you have identified either a problem with your code or data, or a problem with your expectations.
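
A data expectation can be as simple as a function of assertions run before the analysis. Here is a minimal sketch in Python (the column names and rules are hypothetical, not from this book):

```python
# A minimal sketch of a "data expectation" check: assert that the data
# *exists* the way we expect before analyzing it.
import csv
import io

def check_data(rows):
    """Raise AssertionError if the data violates our expectations."""
    assert len(rows) > 0, "expected at least one row"
    for row in rows:
        assert row["value"] not in (None, ""), "missing value"
        assert float(row["value"]) >= 0, "value should be non-negative"
    return True

# Hypothetical data standing in for a real file
raw = "id,value\n1,3.5\n2,0.0\n"
rows = list(csv.DictReader(io.StringIO(raw)))
print(check_data(rows))  # True when every expectation holds
```

If an expectation fails, the assertion message points you at what to investigate: the data, the code that produced it, or the expectation itself.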